I run Jarvis on a container at home in Sydney. The chat UI connects to it through a chain: browser → wasnotwas.com (a US VPS) → autossh tunnel → home container. Works fine. But there’s an obvious absurdity: when I’m sitting on my couch — or standing 100 metres down the street — every chat message still crosses the Pacific twice. That’s ~250ms round-trip where the physics of the situation allows 5ms.
This is a design for fixing it, without breaking anything for the normal path.
The current architecture
```
Browser
 │ HTTPS (~250ms RTT from Sydney)
 ▼
wasnotwas.com (US VPS) / jarvis-api
 │ autossh tunnel → port 18765
 ▼
Jarvis container @ home (Sydney)
 └── term-llm serve web
```
The US server exists for good reasons: it holds the TLS certificate, handles Google OAuth, runs the bot challenge, and gives the home container a stable public endpoint without port-forwarding. All of that stays. What’s wasteful is routing the actual chat data through it when I’m nearby.
The fix: WebRTC as an optional transport
WebRTC was built for browser-to-browser video calls, but it’s really a general-purpose peer-to-peer transport with NAT traversal built in. The same machinery that lets two browsers exchange video streams directly can tunnel arbitrary data — including a streaming chat API.
The proposed architecture:
```
Browser
 │ HTTPS — signaling only (~2KB, once per session)
 ▼
wasnotwas.com / jarvis-api ←── relay for setup only
 │
 │ ICE negotiation (STUN/TURN on wasnotwas.com)
 │
 ▼
WebRTC data channel ←── actual chat traffic
 ▼
Jarvis container @ home
 └── term-llm serve web + pion WebRTC peer
```
The US server is still involved — but only for the ~2KB of JSON exchanged to set up the connection. Once the peer-to-peer channel is established, it carries all the chat data directly. When I’m in Sydney, the round trip drops from 250ms to roughly the time it takes for a UDP packet to leave my phone and reach my router.
Critically: this is additive. If ICE fails — symmetric NAT, corporate firewall, whatever — the browser falls back to the normal HTTPS path silently. Users who don’t benefit from proximity still get the same experience they always had.
How WebRTC punches through NAT
Both the browser and the home container are behind NAT. Neither can accept inbound TCP connections directly. WebRTC solves this through a protocol called ICE (Interactive Connectivity Establishment):
STUN (Session Traversal Utilities for NAT): Each peer asks a STUN server “what IP and port do I appear to come from?” The STUN server reflects the answer. Now both peers know their external addresses.
Hole punching: Both peers send UDP packets to each other’s external address simultaneously. Most NAT implementations (full cone, address-restricted cone) will let the inbound packet through because they’ve recently seen outbound traffic to that destination. The packets meet in the middle.
TURN relay: Some NAT types (symmetric NAT, common on mobile carriers) assign a different external port for each destination, which breaks hole punching. TURN is the fallback: a relay server forwards packets between peers. Traffic still goes through a server, but in this case it’s wasnotwas.com in Sydney, not the US — still a massive improvement.
Running coturn on wasnotwas.com covers both roles. STUN is essentially free (a single UDP exchange). TURN relay is low bandwidth for a text chat — a few KB per message.
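A minimal turnserver.conf covering both roles could look roughly like this — the option names are coturn's, the values and port range are illustrative:

```
# /etc/turnserver.conf — STUN and TURN from the one daemon
listening-port=3478
realm=wasnotwas.com
fingerprint

# Ephemeral credentials: coturn recomputes the same HMAC that the
# signaling endpoint hands out, so no user database is needed.
use-auth-secret
static-auth-secret=<shared secret, same value jarvis-api signs with>

# Keep the relay port range small and explicit for the firewall.
min-port=49152
max-port=49352
```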
The signaling layer
WebRTC needs a way for both peers to exchange their network candidates and session descriptions before the direct connection is established. This is called signaling and the spec deliberately doesn’t define how you do it — any channel that can relay a few JSON messages works.
The simplest approach here: three HTTP endpoints on jarvis-api (the Go service already running on wasnotwas.com).
POST /jarvis-api/webrtc/session — browser calls this to start. Gets back a session ID and short-lived TURN credentials (HMAC-SHA1 of a timestamp, valid for 5 minutes — the standard ephemeral pattern, no user database needed).
POST /jarvis-api/webrtc/signal — either side posts SDP offers, answers, and ICE candidates keyed to the session ID.
GET /jarvis-api/webrtc/signal?session_id=... — long-poll to receive messages from the other side. Holds for up to 10 seconds, returns immediately if something is queued. No WebSockets needed.
Storage is a single new table in the SQLite database jarvis-api already uses. Rows expire after 10 minutes.
The data channel protocol
Once ICE completes and the data channel opens, the channel replaces the HTTPS path for API calls. The channel is reliable and ordered by default (SCTP semantics, roughly TCP-equivalent). Everything is newline-delimited JSON:
Browser sends a request:
```json
{"id": "req-uuid", "method": "POST", "path": "/v1/chat", "body": "...base64..."}
```
The home peer streams back tokens as they arrive:
```json
{"id": "req-uuid", "type": "chunk", "data": "...SSE line..."}
{"id": "req-uuid", "type": "chunk", "data": "...SSE line..."}
{"id": "req-uuid", "type": "done", "status": 200}
```
The id field allows concurrent requests over a single channel. The home peer unpacks the request and runs it through the same internal handler that HTTP requests use — no duplication of chat logic.
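The frames above map onto two small Go types, and the multiplexing is a matter of fanning incoming frames out to per-request channels. A sketch — the type names and mux structure are my own, not term-llm's:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// request is the browser→peer frame; Body is base64-encoded.
type request struct {
	ID     string `json:"id"`
	Method string `json:"method"`
	Path   string `json:"path"`
	Body   string `json:"body,omitempty"`
}

// frame is the peer→browser frame: a streamed chunk or a final status.
type frame struct {
	ID     string `json:"id"`
	Type   string `json:"type"` // "chunk" or "done"
	Data   string `json:"data,omitempty"`
	Status int    `json:"status,omitempty"`
}

// mux routes each incoming frame to the channel of its in-flight request,
// which is what lets concurrent requests share one ordered data channel.
type mux struct {
	inflight map[string]chan frame
}

func (m *mux) dispatch(line []byte) error {
	var f frame
	if err := json.Unmarshal(line, &f); err != nil {
		return err
	}
	ch, ok := m.inflight[f.ID]
	if !ok {
		return fmt.Errorf("unknown request id %q", f.ID)
	}
	ch <- f
	if f.Type == "done" { // terminal frame: tear down the request
		close(ch)
		delete(m.inflight, f.ID)
	}
	return nil
}

func main() {
	ch := make(chan frame, 8)
	m := &mux{inflight: map[string]chan frame{"req-1": ch}}
	m.dispatch([]byte(`{"id":"req-1","type":"chunk","data":"data: hello"}`))
	m.dispatch([]byte(`{"id":"req-1","type":"done","status":200}`))
	for f := range ch {
		fmt.Printf("%s %q %d\n", f.Type, f.Data, f.Status)
	}
	// prints:
	// chunk "data: hello" 0
	// done "" 200
}
```

Because the channel is reliable and ordered, chunks for a given id arrive in sequence with no reassembly logic needed; interleaving only ever happens between different ids.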
The home peer: pion in term-llm
On the home side, pion handles the WebRTC stack. It’s pure Go, no CGO, well-maintained, and used in production by serious projects (LiveKit’s media server is built on it).
The umbrella package pion/webrtc/v4 includes the full media stack — RTP, RTCP, codecs — none of which are needed for data channels. Measured against a baseline Go binary (-ldflags="-s -w"):
| Approach | Binary delta |
|---|---|
| pion/webrtc/v4 umbrella | +5.4 MB |
| Components only (ice/v4 + dtls/v3 + sctp + sdp/v3) | +1.3 MB |
Using the component packages directly keeps the cost under 1.5 MB at the expense of more wiring code. Either way, the feature lives behind a //go:build webrtc tag — default builds include none of it.
A new --webrtc flag on term-llm serve web activates it. When enabled, term-llm polls the signaling server for incoming offers, responds with answers, and handles the resulting data channels. Each active connection runs in its own goroutine. Idle connections are cleaned up after 30 minutes.
The config:
```yaml
serve:
  webrtc:
    enabled: true
    signaling_url: "https://wasnotwas.com/jarvis-api/webrtc"
    bearer_token: "<token>"
    poll_interval: 2s
```
The browser side
At page load, the chat JS checks a feature flag (window.__WEBRTC_ENABLED__). If set, it attempts to establish a peer connection before the user sends their first message. The setup takes 100–500ms — fast enough that it’s usually complete by the time the user finishes typing.
If the connection succeeds, subsequent API calls go through the data channel instead of HTTPS, and a small ⚡ direct indicator appears in the chat header. If setup fails or times out (8-second budget), the attempt is abandoned and normal HTTPS takes over. No error, no visible change.
What doesn’t change
Everything else. Auth is unchanged — Google OAuth and JWT still gate access. The normal HTTPS path for users far from home is identical to today. Sessions, history, the API shape — all the same. term-llm without --webrtc builds and runs as before.
This is a pure addition. The only new infrastructure is coturn on wasnotwas.com and three HTTP endpoints on jarvis-api.
Rollout order
- Install and configure coturn on wasnotwas.com. Verify STUN with stunclient. No app changes.
- Add the signaling endpoints to jarvis-api. Deploy. Verify with curl.
- Add pion peer support to term-llm behind --webrtc. Test against the signaling server.
- Add the browser-side JS, feature-flagged off. Deploy to wasnotwas.com.
- Enable the flag. Observe connection success rate in jarvis-api logs.
The honest caveats
Connection setup latency: ICE negotiation takes 100–500ms per session. This is a one-time cost per page load, not per message. Fine for a chat UI.
Carrier-grade NAT: Some mobile carriers use symmetric NAT, which breaks hole punching. TURN relay kicks in — traffic still routes through wasnotwas.com, but that server is in Sydney, not the US.
Binary size: Using component packages, pion adds ~1.3 MB to the binary (measured). The umbrella package adds ~5.4 MB but includes media codec support not needed here. The build tag keeps it zero-cost for default builds.
Complexity: This is non-trivial to implement correctly. The signaling layer is straightforward; the data channel protocol and the pion integration take more care. With AI doing the implementation legwork, the real cost is a focused weekend — maybe 2–3 days of human attention spread over a week. The hard part isn’t volume of code; it’s the one or two debugging sessions where ICE doesn’t converge and you’re staring at STUN packets.
The payoff is a chat interface that’s genuinely fast when I’m at home — not because the model got faster, but because the packets stopped going somewhere they never needed to go.