This guide is not a quickstart tutorial. It is a structured examination of how LiveKit thinks about media, how its SFU makes decisions, and how the platform handles the complexity of scaling across distributed infrastructure. Whether you are at the beginning of your LiveKit development journey or deepening your LiveKit integration for production workloads, this breakdown will give you the architectural grounding to make better decisions.
What Is an SFU and Why Does LiveKit Use One?
Before examining LiveKit specifically, it helps to understand the spectrum of architectures available for real-time media. There are three primary models: Mesh, MCU (Multipoint Control Unit), and SFU (Selective Forwarding Unit). Each makes different tradeoffs between server load, client load, latency, and video quality.
| Architecture | How It Works | Server Load | Client Load | Scales Well? |
|---|---|---|---|---|
| Mesh (P2P) | Each peer sends directly to every other peer | Very Low | Very High | No |
| MCU | Server decodes, mixes, and re-encodes all streams | Extremely High | Low | No |
| SFU | Server routes packets selectively without re-encoding | Moderate | Moderate | Yes |
In a mesh architecture with 10 participants, each participant uploads nine separate copies of its video stream, one per peer. This quickly exhausts bandwidth and CPU on the client side. MCUs solve the upload problem by mixing streams server-side, but they require a full decode-encode cycle for every stream, making them CPU-hungry and expensive to scale.
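To make the tradeoff concrete, here is a quick back-of-the-envelope sketch (illustrative arithmetic only, not LiveKit code) comparing per-client upload counts in mesh and SFU topologies:

```typescript
// In a full mesh, every participant uploads one copy of its stream to each peer.
function meshUploadsPerClient(participants: number): number {
  return participants - 1;
}

// Total streams traversing the network in a full mesh: n * (n - 1).
function meshTotalStreams(participants: number): number {
  return participants * (participants - 1);
}

// With an SFU, each client uploads once; the server fans out to subscribers.
function sfuUploadsPerClient(): number {
  return 1;
}

const n = 10;
console.log(meshUploadsPerClient(n)); // 9 uplinks per client in a mesh
console.log(meshTotalStreams(n));     // 90 streams total
console.log(sfuUploadsPerClient());   // 1 uplink per client with an SFU
```

The quadratic growth of the mesh total is exactly why it stops working beyond a handful of participants.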
An SFU takes a different approach. It receives encoded media from each sender and forwards the right packets to the right receivers, without ever decoding or re-encoding. The server is acting more like an intelligent router than a processing unit. This is why SFUs are the architecture of choice for large-scale real-time applications, and it is precisely what LiveKit implements.
LiveKit’s SFU never touches the actual content of your media. It works at the packet level, forwarding RTP (Real-time Transport Protocol) packets based on routing decisions made in real time. This means CPU usage stays proportional to the number of streams being forwarded, not the quality or complexity of the video content itself.
LiveKit’s Core Architecture: Room, Track, and Participant Model
To understand how LiveKit routes media, you first need to understand its data model. Everything in LiveKit is organized around three primitives: Rooms, Participants, and Tracks.
1. Room: A logical unit representing a single session. All participants in a room share the same media graph. Rooms are ephemeral and exist only while participants are connected, unless explicitly configured otherwise.
2. Participant: Either a LocalParticipant (the current client) or a RemoteParticipant. Each participant has a unique identity and carries its own set of tracks. Participants can be human users or backend agents running server-side SDKs.
3. Track: A single media stream, audio or video, published by a participant. Tracks can be muted, replaced, or subscribed to independently. LiveKit separates the concept of a track being published from a track being subscribed to.
When a participant publishes a video track, they are not broadcasting to every subscriber immediately. Instead, LiveKit registers the track’s metadata with the server, and subscribers receive a notification that a new track is available. Subscribing to that track is a separate, explicit step. This subscribe-based model is foundational to how the SFU selectively routes traffic.
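The separation between publishing and subscribing can be modeled with a small sketch. This is a simplified illustration of the data model, not the actual server implementation: publishing only registers metadata, and media flows to a subscriber only after an explicit subscribe step.

```typescript
// Simplified model of LiveKit's publish/subscribe separation (illustrative).
type TrackInfo = { sid: string; kind: 'audio' | 'video'; publisher: string };

class RoomModel {
  private published = new Map<string, TrackInfo>();
  private subscriptions = new Map<string, Set<string>>(); // trackSid -> subscribers

  // Publishing only registers metadata; nothing is forwarded yet.
  publish(track: TrackInfo): void {
    this.published.set(track.sid, track);
    this.subscriptions.set(track.sid, new Set());
  }

  // Subscribing is an explicit, separate step per subscriber.
  subscribe(trackSid: string, subscriber: string): boolean {
    const subs = this.subscriptions.get(trackSid);
    if (!subs) return false; // unknown track
    subs.add(subscriber);
    return true;
  }

  // The SFU forwards a track only to its explicit subscribers.
  receiversOf(trackSid: string): string[] {
    return [...(this.subscriptions.get(trackSid) ?? [])];
  }
}

const room = new RoomModel();
room.publish({ sid: 'TR_cam1', kind: 'video', publisher: 'alice' });
console.log(room.receiversOf('TR_cam1')); // [] published, but no subscribers yet
room.subscribe('TR_cam1', 'bob');
console.log(room.receiversOf('TR_cam1')); // ['bob']
```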
Media Routing in Depth: How LiveKit Decides What Goes Where
The most sophisticated part of any SFU is not the forwarding itself but the routing logic. LiveKit’s routing engine makes per-packet decisions based on a combination of subscription state, network conditions, and quality layer selection.
Simulcast and Layered Encoding
When a client publishes a video track in LiveKit, it is not publishing a single bitrate stream. By default, LiveKit’s client SDKs publish using simulcast, meaning the publisher sends the same video at multiple quality levels simultaneously. A typical configuration might include a high-resolution 720p stream, a medium 360p stream, and a low-resolution 180p thumbnail stream.
The SFU then decides which layer to forward to each subscriber based on that subscriber’s available bandwidth, as estimated through REMB (Receiver Estimated Maximum Bitrate) and TWCC (Transport-wide Congestion Control) signals. This means two subscribers watching the same video track might receive different quality layers at the same moment, depending on their network conditions.
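The layer-selection logic can be approximated as follows. This is a hedged sketch with assumed bitrates; the real SFU's bandwidth estimator is considerably more sophisticated, but the core idea is to pick the highest simulcast layer that fits within the subscriber's estimated bandwidth.

```typescript
// Illustrative simulcast layer selection by estimated bandwidth.
// Bitrates here are rough assumed values, not LiveKit's actual defaults.
interface SimulcastLayer { name: string; width: number; bitrateKbps: number }

// Layers sorted from lowest to highest quality.
const layers: SimulcastLayer[] = [
  { name: 'h180', width: 320,  bitrateKbps: 150 },
  { name: 'h360', width: 640,  bitrateKbps: 500 },
  { name: 'h720', width: 1280, bitrateKbps: 1700 },
];

// Choose the highest layer that fits the estimate; fall back to the lowest.
function selectLayer(estimatedKbps: number, available: SimulcastLayer[]): SimulcastLayer {
  const fitting = available.filter((l) => l.bitrateKbps <= estimatedKbps);
  return fitting.length > 0 ? fitting[fitting.length - 1] : available[0];
}

console.log(selectLayer(2500, layers).name); // 'h720' on a fast connection
console.log(selectLayer(600, layers).name);  // 'h360' on a constrained one
```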
```typescript
import { Room, VideoPresets, createLocalVideoTrack } from 'livekit-client';

const room = new Room({
  adaptiveStream: true,
  dynacast: true,
});

await room.connect(url, token);

// Publish the camera with explicit simulcast layers:
// the captured 720p stream is the top layer, with two lower layers below it.
const track = await createLocalVideoTrack({
  resolution: VideoPresets.h720.resolution, // capture at 1280x720
});
await room.localParticipant.publishTrack(track, {
  simulcast: true,
  videoSimulcastLayers: [
    VideoPresets.h180, // low quality (320x180)
    VideoPresets.h360, // medium quality (640x360)
  ],
});
```
Dynacast: Pausing Streams No One Is Watching
One of LiveKit’s most operationally impactful routing features is called Dynacast. In a standard SFU, publishers keep uploading all simulcast layers regardless of whether any subscriber is actually watching at that quality level. This wastes both the publisher’s upload bandwidth and the server’s forwarding capacity.
Dynacast solves this by tracking which quality layers have active subscribers. If no subscriber is receiving the high-quality 720p layer because all subscribers are on slow connections, LiveKit signals the publisher to pause encoding and sending that layer entirely. When a subscriber later needs the high-quality layer, the signal is reversed and encoding resumes. This is a closed-loop system that keeps your infrastructure lean without any manual intervention.
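A minimal sketch of that closed loop (illustrative bookkeeping only; the real signaling runs over the SFU's control channel): the server counts subscribers per layer and tells the publisher to pause any layer nobody needs.

```typescript
// Illustrative Dynacast-style bookkeeping: pause layers with zero subscribers.
type Layer = 'low' | 'medium' | 'high';

class LayerTracker {
  private demand = new Map<Layer, number>([['low', 0], ['medium', 0], ['high', 0]]);

  // A subscriber moves from one layer to another (or joins on a layer).
  setSubscriberLayer(prev: Layer | null, next: Layer): void {
    if (prev !== null) {
      this.demand.set(prev, Math.max(0, (this.demand.get(prev) ?? 0) - 1));
    }
    this.demand.set(next, (this.demand.get(next) ?? 0) + 1);
  }

  // Layers the publisher should actually keep encoding and sending.
  activeLayers(): Layer[] {
    return [...this.demand.entries()]
      .filter(([, count]) => count > 0)
      .map(([layer]) => layer);
  }
}

const tracker = new LayerTracker();
tracker.setSubscriberLayer(null, 'low');
tracker.setSubscriberLayer(null, 'low');
console.log(tracker.activeLayers()); // ['low']: the 'high' encoding can be paused
tracker.setSubscriberLayer('low', 'high'); // one subscriber's bandwidth improves
console.log(tracker.activeLayers()); // ['low', 'high']: encoding resumes
```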
Adaptive Stream: Subscribing to What the Viewport Shows
Adaptive Stream is the subscriber-side complement to Dynacast. When enabled in LiveKit’s JavaScript SDK, it monitors the actual rendered size of each video element in the DOM. If a participant’s video is displayed at 160 pixels wide in a thumbnail grid, there is no point delivering a 1080p stream. Adaptive Stream automatically tells the server to forward only the quality layer that matches the visible size of that video element.
The combination of Dynacast and Adaptive Stream means LiveKit manages bandwidth holistically across both publishers and subscribers, reducing unnecessary media traffic without any application-level code.
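The subscriber-side selection can be sketched like this (assumed ladder widths; the SDK derives the rendered size from the DOM rather than taking it as a parameter): pick the smallest layer that still covers the rendered element.

```typescript
// Illustrative Adaptive Stream selection: match the layer to rendered size.
// Widths are assumed values for a typical simulcast ladder.
const ladder = [
  { name: 'h180', width: 320 },
  { name: 'h360', width: 640 },
  { name: 'h720', width: 1280 },
];

// Pick the smallest layer at least as wide as the rendered element.
function layerForViewport(renderedWidthPx: number): string {
  for (const layer of ladder) {
    if (layer.width >= renderedWidthPx) return layer.name;
  }
  return ladder[ladder.length - 1].name; // cap at the top layer
}

console.log(layerForViewport(160)); // 'h180' for a thumbnail tile
console.log(layerForViewport(900)); // 'h720' for a large main view
```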
The Signal Layer: WebSocket Control Plane
Media packets in LiveKit travel over WebRTC data channels and UDP, but all coordination happens through a WebSocket-based signal layer. This signal layer is responsible for room state synchronization, track publication and subscription events, participant join and leave events, SDP negotiation for WebRTC connection setup, ICE candidate exchange, and quality feedback signaling.
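As a rough illustration of that responsibility list, the control plane can be thought of as a stream of typed events over the WebSocket. These shapes are hypothetical and simplified; the real protocol uses protobuf messages defined in livekit/protocol.

```typescript
// Hypothetical, simplified shapes for signal-layer events (illustration only).
type SignalEvent =
  | { type: 'participant_joined'; identity: string }
  | { type: 'participant_left'; identity: string }
  | { type: 'track_published'; trackSid: string; publisher: string }
  | { type: 'sdp_answer'; sdp: string }
  | { type: 'ice_candidate'; candidate: string };

// Exhaustive handling of every control-plane event type.
function describe(event: SignalEvent): string {
  switch (event.type) {
    case 'participant_joined': return `${event.identity} joined`;
    case 'participant_left': return `${event.identity} left`;
    case 'track_published': return `${event.publisher} published ${event.trackSid}`;
    case 'sdp_answer': return 'received SDP answer';
    case 'ice_candidate': return 'received ICE candidate';
  }
}

console.log(describe({ type: 'participant_joined', identity: 'alice' })); // 'alice joined'
```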
The signal server is implemented in Go and is part of LiveKit’s open-source livekit-server binary. It runs as a stateful service because active rooms and participant connections live in memory. However, LiveKit decouples this stateful signaling from the media routing, which is where the distributed architecture story gets interesting.
```go
package main

import (
	"context"
	"fmt"

	livekit "github.com/livekit/protocol/livekit"
	lksdk "github.com/livekit/server-sdk-go/v2"
)

func createRoom(host, apiKey, apiSecret string) {
	client := lksdk.NewRoomServiceClient(host, apiKey, apiSecret)
	room, err := client.CreateRoom(context.Background(), &livekit.CreateRoomRequest{
		Name:            "production-call-001",
		EmptyTimeout:    300, // auto-close after 5 min empty
		MaxParticipants: 100,
		Metadata:        `{"tier":"enterprise"}`,
	})
	if err != nil {
		panic(err)
	}
	fmt.Printf("Room created: %s\n", room.Sid)
}
```
Scaling LiveKit: From Single Node to Distributed Fleet
A single LiveKit server node handles a substantial amount of traffic. In benchmarks on modern hardware, a single instance can route hundreds of simultaneous video streams. But real production systems require redundancy, geographic distribution, and the ability to handle unpredictable traffic spikes. This is where LiveKit’s distributed architecture becomes critical.
Redis as the Distributed State Layer
When you run multiple LiveKit nodes, they need to share state. Which rooms exist? Which participants are connected to which node? Which tracks are available? LiveKit uses Redis as its distributed coordination layer. All nodes publish their state to Redis and subscribe to state changes from other nodes. This means a participant connected to node A can receive tracks from a participant connected to node B, with the two SFU nodes coordinating the forwarding path through Redis.
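A toy model of that coordination (purely illustrative; LiveKit's actual node protocol is richer than a key-value lookup) is a shared registry mapping each participant to the node that owns their connection, so any node can resolve a forwarding path:

```typescript
// Toy model of Redis-backed participant-to-node routing (illustrative only).
class NodeRegistry {
  // In production this state lives in Redis, shared by all SFU nodes.
  private participantNode = new Map<string, string>();

  register(identity: string, nodeId: string): void {
    this.participantNode.set(identity, nodeId);
  }

  // Resolve which nodes must relay media between two participants.
  forwardingPath(from: string, to: string): [string, string] | null {
    const src = this.participantNode.get(from);
    const dst = this.participantNode.get(to);
    return src && dst ? [src, dst] : null;
  }
}

const registry = new NodeRegistry();
registry.register('alice', 'node-a');
registry.register('bob', 'node-b');
console.log(registry.forwardingPath('alice', 'bob')); // ['node-a', 'node-b']
```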
```yaml
# livekit.yaml: distributed deployment configuration
port: 7880
rtc:
  tcp_port: 7881
  port_range_start: 50000
  port_range_end: 60000
  use_external_ip: true
redis:
  address: redis-cluster:6379
  username: livekit
  password: ${REDIS_PASSWORD}
  db: 0
keys:
  API_KEY: ${LIVEKIT_API_SECRET}
room:
  enabled_codecs:
    - mime: video/vp8
    - mime: video/h264
    - mime: audio/opus
  max_participants: 500
# Enable node selection via region labels
region: us-east-1
node_selector:
  kind: any
  sort_by: random
```
Geographic Distribution and Edge Routing
Latency in real-time media is directly tied to physical distance. A participant in Mumbai connecting to a LiveKit server in Virginia will experience significantly higher round-trip times than one connecting to a node in Singapore. For global LiveKit integration scenarios, running region-aware deployments is not optional; it is a fundamental quality requirement.
LiveKit Cloud, the managed offering, handles this automatically through a global network of SFU nodes. For self-hosted deployments, you deploy LiveKit instances per region and implement client-side logic to direct participants to the nearest node based on their IP geolocation or by exposing a latency-testing endpoint.
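For self-hosted setups, the client-side selection logic can be as simple as this sketch. The endpoint names and RTT values are assumptions for illustration; in practice you would measure real round-trip times against your own regional deployments.

```typescript
// Pick the lowest-latency region from measured round-trip times (illustrative).
interface RegionProbe { region: string; url: string; rttMs: number }

function nearestRegion(probes: RegionProbe[]): RegionProbe {
  return probes.reduce((best, p) => (p.rttMs < best.rttMs ? p : best));
}

// Hypothetical measurements from a client in Mumbai:
const probes: RegionProbe[] = [
  { region: 'us-east-1',      url: 'wss://us.example.com', rttMs: 210 },
  { region: 'ap-southeast-1', url: 'wss://sg.example.com', rttMs: 55 },
  { region: 'eu-west-1',      url: 'wss://eu.example.com', rttMs: 140 },
];

console.log(nearestRegion(probes).region); // 'ap-southeast-1'
```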
- Multi-Region Nodes: Deploy SFU instances per region. Participants connect to the nearest node automatically.
- Load Balancing: Redis-backed coordination distributes rooms across nodes without single-node bottlenecks.
- TLS Everywhere: All signaling and media are encrypted: DTLS-SRTP for media, TLS for the WebSocket control plane.
- Congestion Control: TWCC and REMB algorithms adapt stream quality in real time to network conditions.
Egress and Recording: Extending the Media Pipeline
LiveKit’s architecture is not limited to real-time forwarding. The platform includes an Egress service that composites room media and outputs it as recordings, HLS streams, or RTMP feeds for platforms like YouTube Live or Twitch. The Egress service runs as a separate container and communicates with the LiveKit server through the same Redis coordination layer used by SFU nodes.
```typescript
import { EgressClient, EncodedFileOutput } from 'livekit-server-sdk';

const egress = new EgressClient(
  process.env.LIVEKIT_URL!,
  process.env.LIVEKIT_API_KEY!,
  process.env.LIVEKIT_API_SECRET!
);

// Record the entire room as an MP4 to S3
const info = await egress.startRoomCompositeEgress(
  'production-call-001',
  {
    file: new EncodedFileOutput({
      filepath: 's3://my-bucket/recordings/{room_name}-{time}.mp4',
    }),
  },
  {
    layout: 'speaker-dark',
    customBaseUrl: 'https://my-custom-layout.example.com',
  }
);

console.log('Recording started:', info.egressId);
```
Ingress: Bringing External Streams Into LiveKit Rooms
The counterpart to Egress is Ingress, which allows external media sources to enter a LiveKit room as participants. An RTMP stream from OBS, a WHIP-compatible encoder, or a pre-recorded video file can be published into a room and appear as a regular participant track to all subscribers.
This is particularly powerful for hybrid scenarios where a live event broadcaster wants to appear inside a LiveKit room alongside real-time WebRTC participants. The Ingress service handles transcoding from the external format into the RTP packets the SFU understands, then routes those packets through the standard forwarding path as if they came from a native WebRTC client.
Security Architecture: Tokens and Room-Level Access Control
Every connection to a LiveKit room is authorized through a JWT (JSON Web Token) signed with your API secret. These tokens are short-lived and carry granular permissions: can the holder publish video, publish audio, subscribe to tracks, or only listen? Can they record the room? Can they remove participants?
```typescript
import { AccessToken } from 'livekit-server-sdk';

// Note: toJwt() returns a Promise in recent versions of livekit-server-sdk
async function createParticipantToken(
  roomName: string,
  participantId: string,
  isHost: boolean
): Promise<string> {
  const token = new AccessToken(
    process.env.LIVEKIT_API_KEY!,
    process.env.LIVEKIT_API_SECRET!,
    {
      identity: participantId,
      ttl: '2h',
    }
  );
  token.addGrant({
    room: roomName,
    roomJoin: true,
    canPublish: true,
    canSubscribe: true,
    canPublishData: true,
    roomAdmin: isHost,  // only hosts can remove participants
    roomRecord: isHost, // only hosts can start recordings
  });
  return token.toJwt();
}
```
Token validation happens entirely server-side. Clients never see your API secret, and each token is scoped to a specific room and identity. This architecture means your backend controls access, and LiveKit enforces it at the protocol level without any application-level middleware needed.
LiveKit Agents: AI Participants and the Programmable Room
One of the most compelling directions in LiveKit development is the Agents framework. Rather than thinking about rooms as purely human-to-human communication spaces, LiveKit allows you to write server-side processes that join rooms as first-class participants, publish audio or video tracks, subscribe to other participants’ tracks, and respond in real time.
An agent built on the LiveKit Agents Python or Node.js SDK can integrate speech-to-text, large language models, and text-to-speech pipelines in a single coherent loop. The agent subscribes to a participant’s audio track, transcribes it, sends the text to an LLM, synthesizes the response, and publishes an audio track containing the AI’s reply, all within the same room infrastructure you use for human participants. This makes LiveKit integration with AI systems architecturally simple compared to building separate real-time pipelines.
```python
from livekit.agents import AutoSubscribe, JobContext, WorkerOptions, cli
from livekit.agents.voice_assistant import VoiceAssistant
from livekit.plugins import deepgram, openai, silero


async def entrypoint(ctx: JobContext):
    # Connect to the room as a server-side agent participant
    await ctx.connect(auto_subscribe=AutoSubscribe.AUDIO_ONLY)

    assistant = VoiceAssistant(
        vad=silero.VAD.load(),           # voice activity detection
        stt=deepgram.STT(),              # speech to text
        llm=openai.LLM(model="gpt-4o"),  # language model
        tts=openai.TTS(voice="nova"),    # text to speech
    )
    assistant.start(ctx.room)
    await assistant.say("Hello, how can I help you today?")


if __name__ == "__main__":
    cli.run_app(WorkerOptions(entrypoint_fnc=entrypoint))
```
Observability and Production Monitoring
Running a real-time media infrastructure in production requires visibility beyond basic uptime checks. LiveKit exposes Prometheus-compatible metrics from each server node, covering active rooms, participant counts, track subscriptions, bitrate statistics, packet loss rates, and signal processing latency. Pairing these with Grafana dashboards gives your engineering team the ability to spot degradation before it becomes visible to end users.
Track livekit_room_participant_total for capacity planning, livekit_packet_loss_rate for network quality signals, and livekit_forwarded_rtp_total for a proxy of SFU load. Spikes in packet loss correlated with high forward rates are typically a sign that a node is approaching saturation.
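As one possible alerting heuristic built on that observation (thresholds here are placeholder assumptions to tune for your own fleet, not LiveKit recommendations), flag nodes where packet loss and forwarding load spike together:

```typescript
// Illustrative saturation heuristic combining packet loss and forward rate.
// Thresholds are placeholder assumptions, not LiveKit recommendations.
interface NodeSample { packetLossRate: number; forwardedRtpPerSec: number }

function isApproachingSaturation(sample: NodeSample): boolean {
  const LOSS_THRESHOLD = 0.05;     // 5% packet loss
  const FORWARD_THRESHOLD = 80000; // packets/sec, tune per instance size
  return sample.packetLossRate > LOSS_THRESHOLD &&
         sample.forwardedRtpPerSec > FORWARD_THRESHOLD;
}

console.log(isApproachingSaturation({ packetLossRate: 0.08, forwardedRtpPerSec: 95000 })); // true
console.log(isApproachingSaturation({ packetLossRate: 0.08, forwardedRtpPerSec: 20000 })); // false
```

Requiring both signals avoids paging on loss caused purely by a subscriber's last-mile network rather than node load.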
Final Thoughts
LiveKit is not just a convenience wrapper around WebRTC. Its SFU architecture, Dynacast and Adaptive Stream routing algorithms, Redis-backed distributed coordination, and expanding Egress, Ingress, and Agents ecosystem represent a coherent infrastructure platform for building real-time applications that are genuinely production-ready.
The decisions LiveKit makes at the architectural level (forwarding without re-encoding, decoupling signaling from media, expressing room access through short-lived JWTs, treating AI agents as first-class participants) reflect hard-won lessons from building at scale in a domain where milliseconds and dropped packets translate directly into user experience degradation.
Whether your team is starting fresh LiveKit development on a new voice application or deepening an existing LiveKit integration to serve global audiences, investing time in understanding these architectural foundations will save you from costly surprises in production and give you the vocabulary to make informed infrastructure tradeoffs as your platform grows.
