This guide is not a quickstart tutorial. It is a structured examination of how LiveKit thinks about media, how its SFU makes decisions, and how the platform handles the complexity of scaling across distributed infrastructure. Whether you are at the beginning of your LiveKit development journey or deepening your LiveKit integration for production workloads, this breakdown will give you the architectural grounding to make better decisions.
What Is an SFU and Why Does LiveKit Use One?
Before examining LiveKit specifically, it helps to understand the spectrum of architectures available for real-time media. There are three primary models: Mesh, MCU (Multipoint Control Unit), and SFU (Selective Forwarding Unit). Each makes different tradeoffs between server load, client load, latency, and video quality.
| Architecture | How It Works | Server Load | Client Load | Scales Well? |
|---|---|---|---|---|
| Mesh (P2P) | Each peer sends directly to every other peer | Very Low | Very High | No |
| MCU | Server decodes, mixes, and re-encodes all streams | Extremely High | Low | No |
| SFU | Server routes packets selectively without re-encoding | Moderate | Moderate | Yes |
In a mesh architecture with 10 participants, each participant uploads nine separate copies of its video stream, one per peer. This quickly exhausts bandwidth and CPU on the client side. MCUs solve the upload problem by mixing streams server-side, but they require a full decode-encode cycle for every stream, making them CPU-hungry and expensive to scale.
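To make the tradeoff concrete, here is a quick back-of-the-envelope sketch (illustrative arithmetic only, not LiveKit code) comparing per-client upload counts in mesh and SFU topologies:

```typescript
// In a full mesh, every participant uploads one copy of its stream to each peer.
function meshUploadsPerClient(participants: number): number {
  return participants - 1;
}

// Total streams traversing the network in a full mesh: n * (n - 1).
function meshTotalStreams(participants: number): number {
  return participants * (participants - 1);
}

// With an SFU, each client uploads once; the server fans out to subscribers.
function sfuUploadsPerClient(): number {
  return 1;
}

const n = 10;
console.log(meshUploadsPerClient(n)); // 9 uplinks per client in a mesh
console.log(meshTotalStreams(n));     // 90 streams total
console.log(sfuUploadsPerClient());   // 1 uplink per client with an SFU
```

The quadratic growth of the mesh total is exactly why it stops working beyond a handful of participants.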
An SFU takes a different approach. It receives encoded media from each sender and forwards the right packets to the right receivers, without ever decoding or re-encoding. The server is acting more like an intelligent router than a processing unit. This is why SFUs are the architecture of choice for large-scale real-time applications, and it is precisely what LiveKit implements.
LiveKit’s SFU never touches the actual content of your media. It works at the packet level, forwarding RTP (Real-time Transport Protocol) packets based on routing decisions made in real time. This means CPU usage stays proportional to the number of streams being forwarded, not the quality or complexity of the video content itself.
LiveKit’s Core Architecture: Room, Track, and Participant Model
To understand how LiveKit routes media, you first need to understand its data model. Everything in LiveKit is organized around three primitives: Rooms, Participants, and Tracks.
1. Room: A logical unit representing a single session. All participants in a room share the same media graph. Rooms are ephemeral and exist only while participants are connected, unless explicitly configured otherwise.
2. Participant: Either a LocalParticipant (the current client) or a RemoteParticipant. Each participant has a unique identity and carries its own set of tracks. Participants can be human users or backend agents running server-side SDKs.
3. Track: A single media stream, audio or video, published by a participant. Tracks can be muted, replaced, or subscribed to independently. LiveKit separates the concept of a track being published from a track being subscribed to.
When a participant publishes a video track, they are not broadcasting to every subscriber immediately. Instead, LiveKit registers the track’s metadata with the server, and subscribers receive a notification that a new track is available. Subscribing to that track is a separate, explicit step. This subscribe-based model is foundational to how the SFU selectively routes traffic.
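The separation between publishing and subscribing can be modeled with a small sketch. This is a simplified illustration of the data model, not the actual server implementation: publishing only registers metadata, and media flows to a subscriber only after an explicit subscribe step.

```typescript
// Simplified model of LiveKit's publish/subscribe separation (illustrative).
type TrackInfo = { sid: string; kind: 'audio' | 'video'; publisher: string };

class RoomModel {
  private published = new Map<string, TrackInfo>();
  private subscriptions = new Map<string, Set<string>>(); // trackSid -> subscribers

  // Publishing only registers metadata; nothing is forwarded yet.
  publish(track: TrackInfo): void {
    this.published.set(track.sid, track);
    this.subscriptions.set(track.sid, new Set());
  }

  // Subscribing is an explicit, separate step per subscriber.
  subscribe(trackSid: string, subscriber: string): boolean {
    const subs = this.subscriptions.get(trackSid);
    if (!subs) return false; // unknown track
    subs.add(subscriber);
    return true;
  }

  // The SFU forwards a track only to its explicit subscribers.
  receiversOf(trackSid: string): string[] {
    return [...(this.subscriptions.get(trackSid) ?? [])];
  }
}

const room = new RoomModel();
room.publish({ sid: 'TR_cam1', kind: 'video', publisher: 'alice' });
console.log(room.receiversOf('TR_cam1')); // [] published, but no subscribers yet
room.subscribe('TR_cam1', 'bob');
console.log(room.receiversOf('TR_cam1')); // ['bob']
```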
Media Routing in Depth: How LiveKit Decides What Goes Where
The most sophisticated part of any SFU is not the forwarding itself but the routing logic. LiveKit’s routing engine makes per-packet decisions based on a combination of subscription state, network conditions, and quality layer selection.
Simulcast and Layered Encoding
When a client publishes a video track in LiveKit, it is not publishing a single bitrate stream. By default, LiveKit’s client SDKs publish using simulcast, meaning the publisher sends the same video at multiple quality levels simultaneously. A typical configuration might include a high-resolution 720p stream, a medium 360p stream, and a low-resolution 180p thumbnail stream.
The SFU then decides which layer to forward to each subscriber based on that subscriber’s available bandwidth, as estimated through REMB (Receiver Estimated Maximum Bitrate) and TWCC (Transport-wide Congestion Control) signals. This means two subscribers watching the same video track might receive different quality layers at the same moment, depending on their network conditions.
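The layer-selection logic can be approximated as follows. This is a hedged sketch with assumed bitrates; the real SFU's bandwidth estimator is considerably more sophisticated, but the core idea is to pick the highest simulcast layer that fits within the subscriber's estimated bandwidth.

```typescript
// Illustrative simulcast layer selection by estimated bandwidth.
// Bitrates here are rough assumed values, not LiveKit's actual defaults.
interface SimulcastLayer { name: string; width: number; bitrateKbps: number }

// Layers sorted from lowest to highest quality.
const layers: SimulcastLayer[] = [
  { name: 'h180', width: 320,  bitrateKbps: 150 },
  { name: 'h360', width: 640,  bitrateKbps: 500 },
  { name: 'h720', width: 1280, bitrateKbps: 1700 },
];

// Choose the highest layer that fits the estimate; fall back to the lowest.
function selectLayer(estimatedKbps: number, available: SimulcastLayer[]): SimulcastLayer {
  const fitting = available.filter((l) => l.bitrateKbps <= estimatedKbps);
  return fitting.length > 0 ? fitting[fitting.length - 1] : available[0];
}

console.log(selectLayer(2500, layers).name); // 'h720' on a fast connection
console.log(selectLayer(600, layers).name);  // 'h360' on a constrained one
```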
```typescript
import { Room, VideoPresets, createLocalVideoTrack } from 'livekit-client';

const room = new Room({
  adaptiveStream: true,
  dynacast: true,
});

await room.connect(url, token);

// Publish the camera with explicit simulcast layers:
// the captured 720p stream is the top layer, with two lower layers below it.
const track = await createLocalVideoTrack({
  resolution: VideoPresets.h720.resolution, // capture at 1280x720
});
await room.localParticipant.publishTrack(track, {
  simulcast: true,
  videoSimulcastLayers: [
    VideoPresets.h180, // low quality (320x180)
    VideoPresets.h360, // medium quality (640x360)
  ],
});
```
Dynacast: Pausing Streams No One Is Watching
One of LiveKit’s most operationally impactful routing features is called Dynacast. In a standard SFU, publishers keep uploading all simulcast layers regardless of whether any subscriber is actually watching at that quality level. This wastes both the publisher’s upload bandwidth and the server’s forwarding capacity.
Dynacast solves this by tracking which quality layers have active subscribers. If no subscriber is receiving the high-quality 720p layer because all subscribers are on slow connections, LiveKit signals the publisher to pause encoding and sending that layer entirely. When a subscriber later needs the high-quality layer, the signal is reversed and encoding resumes. This is a closed-loop system that keeps your infrastructure lean without any manual intervention.
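A minimal sketch of that closed loop (illustrative bookkeeping only; the real signaling runs over the SFU's control channel): the server counts subscribers per layer and tells the publisher to pause any layer nobody needs.

```typescript
// Illustrative Dynacast-style bookkeeping: pause layers with zero subscribers.
type Layer = 'low' | 'medium' | 'high';

class LayerTracker {
  private demand = new Map<Layer, number>([['low', 0], ['medium', 0], ['high', 0]]);

  // A subscriber moves from one layer to another (or joins on a layer).
  setSubscriberLayer(prev: Layer | null, next: Layer): void {
    if (prev !== null) {
      this.demand.set(prev, Math.max(0, (this.demand.get(prev) ?? 0) - 1));
    }
    this.demand.set(next, (this.demand.get(next) ?? 0) + 1);
  }

  // Layers the publisher should actually keep encoding and sending.
  activeLayers(): Layer[] {
    return [...this.demand.entries()]
      .filter(([, count]) => count > 0)
      .map(([layer]) => layer);
  }
}

const tracker = new LayerTracker();
tracker.setSubscriberLayer(null, 'low');
tracker.setSubscriberLayer(null, 'low');
console.log(tracker.activeLayers()); // ['low']: the 'high' encoding can be paused
tracker.setSubscriberLayer('low', 'high'); // one subscriber's bandwidth improves
console.log(tracker.activeLayers()); // ['low', 'high']: encoding resumes
```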
Adaptive Stream: Subscribing to What the Viewport Shows
Adaptive Stream is the subscriber-side complement to Dynacast. When enabled in LiveKit’s JavaScript SDK, it monitors the actual rendered size of each video element in the DOM. If a participant’s video is displayed at 160 pixels wide in a thumbnail grid, there is no point delivering a 1080p stream. Adaptive Stream automatically tells the server to forward only the quality layer that matches the visible size of that video element.
The combination of Dynacast and Adaptive Stream means LiveKit manages bandwidth holistically across both publishers and subscribers, reducing unnecessary media traffic without any application-level code.
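The subscriber-side selection can be sketched like this (assumed ladder widths; the SDK derives the rendered size from the DOM rather than taking it as a parameter): pick the smallest layer that still covers the rendered element.

```typescript
// Illustrative Adaptive Stream selection: match the layer to rendered size.
// Widths are assumed values for a typical simulcast ladder.
const ladder = [
  { name: 'h180', width: 320 },
  { name: 'h360', width: 640 },
  { name: 'h720', width: 1280 },
];

// Pick the smallest layer at least as wide as the rendered element.
function layerForViewport(renderedWidthPx: number): string {
  for (const layer of ladder) {
    if (layer.width >= renderedWidthPx) return layer.name;
  }
  return ladder[ladder.length - 1].name; // cap at the top layer
}

console.log(layerForViewport(160)); // 'h180' for a thumbnail tile
console.log(layerForViewport(900)); // 'h720' for a large main view
```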
The Signal Layer: WebSocket Control Plane
Media packets in LiveKit travel over WebRTC data channels and UDP, but all coordination happens through a WebSocket-based signal layer. This signal layer is responsible for room state synchronization, track publication and subscription events, participant join and leave events, SDP negotiation for WebRTC connection setup, ICE candidate exchange, and quality feedback signaling.
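As a rough illustration of that responsibility list, the control plane can be thought of as a stream of typed events over the WebSocket. These shapes are hypothetical and simplified; the real protocol uses protobuf messages defined in livekit/protocol.

```typescript
// Hypothetical, simplified shapes for signal-layer events (illustration only).
type SignalEvent =
  | { type: 'participant_joined'; identity: string }
  | { type: 'participant_left'; identity: string }
  | { type: 'track_published'; trackSid: string; publisher: string }
  | { type: 'sdp_answer'; sdp: string }
  | { type: 'ice_candidate'; candidate: string };

// Exhaustive handling of every control-plane event type.
function describe(event: SignalEvent): string {
  switch (event.type) {
    case 'participant_joined': return `${event.identity} joined`;
    case 'participant_left': return `${event.identity} left`;
    case 'track_published': return `${event.publisher} published ${event.trackSid}`;
    case 'sdp_answer': return 'received SDP answer';
    case 'ice_candidate': return 'received ICE candidate';
  }
}

console.log(describe({ type: 'participant_joined', identity: 'alice' })); // 'alice joined'
```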
The signal server is implemented in Go and is part of LiveKit’s open-source livekit-server binary. It runs as a stateful service because active rooms and participant connections live in memory. However, LiveKit decouples this stateful signaling from the media routing, which is where the distributed architecture story gets interesting.
```go
package main

import (
	"context"
	"fmt"

	livekit "github.com/livekit/protocol/livekit"
	lksdk "github.com/livekit/server-sdk-go/v2"
)

func createRoom(host, apiKey, apiSecret string) {
	client := lksdk.NewRoomServiceClient(host, apiKey, apiSecret)
	room, err := client.CreateRoom(context.Background(), &livekit.CreateRoomRequest{
		Name:            "production-call-001",
		EmptyTimeout:    300, // auto-close after 5 min empty
		MaxParticipants: 100,
		Metadata:        `{"tier":"enterprise"}`,
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("Room created: %s\n", room.Sid)
}
```
Scaling LiveKit: From Single Node to Distributed Fleet
A single LiveKit server node handles a substantial amount of traffic. In benchmarks on modern hardware, a single instance can route hundreds of simultaneous video streams. But real production systems require redundancy, geographic distribution, and the ability to handle unpredictable traffic spikes. This is where LiveKit’s distributed architecture becomes critical.
Redis as the Distributed State Layer
When you run multiple LiveKit nodes, they need to share state. Which rooms exist? Which participants are connected to which node? Which tracks are available? LiveKit uses Redis as its distributed coordination layer. All nodes publish their state to Redis and subscribe to state changes from other nodes. This means a participant connected to node A can receive tracks from a participant connected to node B, with the two SFU nodes coordinating the forwarding path through Redis.
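A toy model of that coordination (purely illustrative; LiveKit's actual node protocol is richer than a key-value lookup) is a shared registry mapping each participant to the node that owns their connection, so any node can resolve a forwarding path:

```typescript
// Toy model of Redis-backed participant-to-node routing (illustrative only).
class NodeRegistry {
  // In production this state lives in Redis, shared by all SFU nodes.
  private participantNode = new Map<string, string>();

  register(identity: string, nodeId: string): void {
    this.participantNode.set(identity, nodeId);
  }

  // Resolve which nodes must relay media between two participants.
  forwardingPath(from: string, to: string): [string, string] | null {
    const src = this.participantNode.get(from);
    const dst = this.participantNode.get(to);
    return src && dst ? [src, dst] : null;
  }
}

const registry = new NodeRegistry();
registry.register('alice', 'node-a');
registry.register('bob', 'node-b');
console.log(registry.forwardingPath('alice', 'bob')); // ['node-a', 'node-b']
```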
```yaml
# livekit.yaml: distributed deployment configuration
port: 7880
rtc:
  tcp_port: 7881
  port_range_start: 50000
  port_range_end: 60000
  use_external_ip: true
redis:
  address: redis-cluster:6379
  username: livekit
  password: ${REDIS_PASSWORD}
  db: 0
keys:
  API_KEY: ${LIVEKIT_API_SECRET}
room:
  enabled_codecs:
    - mime: video/vp8
    - mime: video/h264
    - mime: audio/opus
  max_participants: 500
# Enable node selection via region labels
region: us-east-1
node_selector:
  kind: any
  sort_by: random
```
Geographic Distribution and Edge Routing
Latency in real-time media is directly tied to physical distance. A participant in Mumbai connecting to a LiveKit server in Virginia will experience significantly higher round-trip times than one connecting to a node in Singapore. For global LiveKit integration scenarios, running region-aware deployments is not optional; it is a fundamental quality requirement.
LiveKit Cloud, the managed offering, handles this automatically through a global network of SFU nodes. For self-hosted deployments, you deploy LiveKit instances per region and implement client-side logic to direct participants to the nearest node based on their IP geolocation or by exposing a latency-testing endpoint.
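For self-hosted setups, the client-side selection logic can be as simple as this sketch. The endpoint names and RTT values are assumptions for illustration; in practice you would measure real round-trip times against your own regional deployments.

```typescript
// Pick the lowest-latency region from measured round-trip times (illustrative).
interface RegionProbe { region: string; url: string; rttMs: number }

function nearestRegion(probes: RegionProbe[]): RegionProbe {
  return probes.reduce((best, p) => (p.rttMs < best.rttMs ? p : best));
}

// Hypothetical measurements from a client in Mumbai:
const probes: RegionProbe[] = [
  { region: 'us-east-1',      url: 'wss://us.example.com', rttMs: 210 },
  { region: 'ap-southeast-1', url: 'wss://sg.example.com', rttMs: 55 },
  { region: 'eu-west-1',      url: 'wss://eu.example.com', rttMs: 140 },
];

console.log(nearestRegion(probes).region); // 'ap-southeast-1'
```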
- Multi-Region Nodes: Deploy SFU instances per region. Participants connect to the nearest node automatically.
- Load Balancing: Redis-backed coordination distributes rooms across nodes without single-node bottlenecks.
- TLS Everywhere: All signaling and media are encrypted: DTLS-SRTP for media, TLS for the WebSocket control plane.
- Congestion Control: TWCC and REMB algorithms adapt stream quality in real time to network conditions.
Egress and Recording: Extending the Media Pipeline
LiveKit’s architecture is not limited to real-time forwarding. The platform includes an Egress service that composites room media and outputs it as recordings, HLS streams, or RTMP feeds for platforms like YouTube Live or Twitch. The Egress service runs as a separate container and communicates with the LiveKit server through the same Redis coordination layer used by SFU nodes.
```typescript
import { EgressClient, EncodedFileOutput } from 'livekit-server-sdk';

const egress = new EgressClient(
  process.env.LIVEKIT_URL!,
  process.env.LIVEKIT_API_KEY!,
  process.env.LIVEKIT_API_SECRET!
);

// Record the entire room as an MP4 to S3
const info = await egress.startRoomCompositeEgress(
  'production-call-001',
  {
    file: new EncodedFileOutput({
      filepath: 's3://my-bucket/recordings/{room_name}-{time}.mp4',
    }),
  },
  {
    layout: 'speaker-dark',
    customBaseUrl: 'https://my-custom-layout.example.com',
  }
);

console.log('Recording started:', info.egressId);
```
Ingress: Bringing External Streams Into LiveKit Rooms
The counterpart to Egress is Ingress, which allows external media sources to enter a LiveKit room as participants. An RTMP stream from OBS, a WHIP-compatible encoder, or a pre-recorded video file can be published into a room and appear as a regular participant track to all subscribers.
This is particularly powerful for hybrid scenarios where a live event broadcaster wants to appear inside a LiveKit room alongside real-time WebRTC participants. The Ingress service handles transcoding from the external format into the RTP packets the SFU understands, then routes those packets through the standard forwarding path as if they came from a native WebRTC client.
Security Architecture: Tokens and Room-Level Access Control
Every connection to a LiveKit room is authorized through a JWT (JSON Web Token) signed with your API secret. These tokens are short-lived and carry granular permissions: can the holder publish video, publish audio, subscribe to tracks, or only listen? Can they record the room? Can they remove participants?
```typescript
import { AccessToken } from 'livekit-server-sdk';

// Note: toJwt() returns a Promise in recent versions of livekit-server-sdk
async function createParticipantToken(
  roomName: string,
  participantId: string,
  isHost: boolean
): Promise<string> {
  const token = new AccessToken(
    process.env.LIVEKIT_API_KEY!,
    process.env.LIVEKIT_API_SECRET!,
    {
      identity: participantId,
      ttl: '2h',
    }
  );
  token.addGrant({
    room: roomName,
    roomJoin: true,
    canPublish: true,
    canSubscribe: true,
    canPublishData: true,
    roomAdmin: isHost,  // only hosts can remove participants
    roomRecord: isHost, // only hosts can start recordings
  });
  return token.toJwt();
}
```
Token validation happens entirely server-side. Clients never see your API secret, and each token is scoped to a specific room and identity. This architecture means your backend controls access, and LiveKit enforces it at the protocol level without any application-level middleware needed.
LiveKit Agents: AI Participants and the Programmable Room
One of the most compelling directions in LiveKit development is the Agents framework. Rather than thinking about rooms as purely human-to-human communication spaces, LiveKit allows you to write server-side processes that join rooms as first-class participants, publish audio or video tracks, subscribe to other participants’ tracks, and respond in real time.
An agent built on the LiveKit Agents Python or Node.js SDK can integrate speech-to-text, large language models, and text-to-speech pipelines in a single coherent loop. The agent subscribes to a participant’s audio track, transcribes it, sends the text to an LLM, synthesizes the response, and publishes an audio track containing the AI’s reply, all within the same room infrastructure you use for human participants. This makes LiveKit integration with AI systems architecturally simple compared to building separate real-time pipelines.
```python
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, openai, silero


async def entrypoint(ctx: JobContext):
    # Connect to the room as a server-side agent participant
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    assistant = VoiceAssistant(
        vad=silero.VAD.load(),           # voice activity detection
        stt=deepgram.STT(),              # speech to text
        llm=openai.LLM(model="gpt-4o"),  # language model
        tts=openai.TTS(voice="nova"),    # text to speech
    )
    assistant.start(ctx.room)
    await assistant.say("Hello, how can I help you today?")


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
Observability and Production Monitoring
Running a real-time media infrastructure in production requires visibility beyond basic uptime checks. LiveKit exposes Prometheus-compatible metrics from each server node, covering active rooms, participant counts, track subscriptions, bitrate statistics, packet loss rates, and signal processing latency. Pairing these with Grafana dashboards gives your engineering team the ability to spot degradation before it becomes visible to end users.
Track livekit_room_participant_total for capacity planning, livekit_packet_loss_rate for network quality signals, and livekit_forwarded_rtp_total for a proxy of SFU load. Spikes in packet loss correlated with high forward rates are typically a sign that a node is approaching saturation.
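As one possible alerting heuristic built on that observation (thresholds here are placeholder assumptions to tune for your own fleet, not LiveKit recommendations), flag nodes where packet loss and forwarding load spike together:

```typescript
// Illustrative saturation heuristic combining packet loss and forward rate.
// Thresholds are placeholder assumptions, not LiveKit recommendations.
interface NodeSample { packetLossRate: number; forwardedRtpPerSec: number }

function isApproachingSaturation(sample: NodeSample): boolean {
  const LOSS_THRESHOLD = 0.05;     // 5% packet loss
  const FORWARD_THRESHOLD = 80000; // packets/sec, tune per instance size
  return sample.packetLossRate > LOSS_THRESHOLD &&
         sample.forwardedRtpPerSec > FORWARD_THRESHOLD;
}

console.log(isApproachingSaturation({ packetLossRate: 0.08, forwardedRtpPerSec: 95000 })); // true
console.log(isApproachingSaturation({ packetLossRate: 0.08, forwardedRtpPerSec: 20000 })); // false
```

Requiring both signals avoids paging on loss caused purely by a subscriber's last-mile network rather than node load.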
Final Thoughts
LiveKit is not just a convenience wrapper around WebRTC. Its SFU architecture, Dynacast and Adaptive Stream routing algorithms, Redis-backed distributed coordination, and expanding Egress, Ingress, and Agents ecosystem represent a coherent infrastructure platform for building real-time applications that are genuinely production-ready.
The decisions LiveKit makes at the architectural level (forwarding without re-encoding, decoupling signaling from media, expressing room access through short-lived JWTs, treating AI agents as first-class participants) reflect hard-won lessons from building at scale in a domain where milliseconds and dropped packets translate directly into user experience degradation.
Whether your team is starting fresh LiveKit development on a new voice application or deepening an existing LiveKit integration to serve global audiences, investing time in understanding these architectural foundations will save you from costly surprises in production and give you the vocabulary to make informed infrastructure tradeoffs as your platform grows.
