Room-Level Media Recording

Muxes audio and video from multiple channels in a room into a single MP4 file. Unlike channel-level recorders (WAV, PyAV video) that capture a single stream, room-level recording combines all media tracks into one output — the production path for recording conversations with both voice and video.

Installation

pip install roomkit[video]          # av + numpy (PyAV muxer)
pip install roomkit[local-audio]    # sounddevice (mic capture)
pip install roomkit[local-video]    # opencv (webcam capture)

Quick start

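from roomkit import RoomKit, VideoChannel, VoiceChannel
from roomkit.recorder import MediaRecordingConfig, RoomRecorderBinding
from roomkit.recorder.pyav import PyAVMediaRecorder

# Assumes `kit` is an existing RoomKit instance and that audio_backend,
# video_backend, and pipeline are already configured. Run inside an async function.

# 1. Create recorder + config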
recorder = PyAVMediaRecorder()
config = MediaRecordingConfig(storage="./recordings", video_codec="auto")

# 2. Create channels — recording is automatic when the room has recorders
voice = VoiceChannel("voice", backend=audio_backend, pipeline=pipeline)
video = VideoChannel("video", backend=video_backend)

# 3. Create room with recorder binding
room = await kit.create_room(
    room_id="my-room",
    recorders=[RoomRecorderBinding(recorder=recorder, config=config, name="main")],
)

# 4. Join participants — recording starts automatically
# Previously connect_voice() / connect_video(), now unified as join()
voice_session = await kit.join(room.id, "voice", participant_id="user-1")
video_session = await kit.join(room.id, "video", participant_id="user-1")

Recording starts once every registered track (audio + video) has received its first frame. It stops when the room is closed, for example by calling close_room().
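The start condition can be pictured as a small gate that opens once every registered track has delivered a frame. The sketch below is illustrative only: `FirstFrameGate` and its method names are hypothetical and are not part of the RoomKit API, and the real recorder may track this state differently.

```python
from collections.abc import Iterable


class FirstFrameGate:
    """Hypothetical helper: signals start once all tracks have produced a frame."""

    def __init__(self, track_ids: Iterable[str]) -> None:
        # Tracks that have not yet delivered their first frame.
        self._pending = set(track_ids)
        self.started = False

    def on_frame(self, track_id: str) -> bool:
        """Record a frame arrival. Returns True exactly once, when recording should start."""
        if self.started:
            return False
        self._pending.discard(track_id)
        if not self._pending:
            self.started = True
            return True
        return False


gate = FirstFrameGate(["audio", "video"])
gate.on_frame("audio")          # video has not arrived yet, so no start
start_now = gate.on_frame("video")  # last pending track arrived
```

Waiting for every track before starting avoids a file whose first seconds contain audio but no video (or the reverse).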

Recording layers

RoomKit has three independent recording layers:

| Layer | Recorder | Purpose | Output |
|---|---|---|---|
| Audio pipeline | WavFileRecorder | Debug raw audio | .wav per session |
| Video pipeline | PyAVVideoRecorder | Debug raw video | .mp4 per session |
| Room | MediaRecorder | Production A/V | Single .mp4 per room |

All three can run simultaneously without interference.

Configuration

MediaRecordingConfig

Controls the output file format and encoding:

from roomkit.recorder import MediaRecordingConfig

config = MediaRecordingConfig(
    storage="./recordings",    # Output directory (created automatically)
    video_codec="auto",        # auto, libx264, h264_nvenc, libx265
    video_fps=30,              # Stream frame rate (PTS resolution)
    audio_codec="aac",         # Audio codec
    audio_sample_rate=16000,   # Audio sample rate (Hz)
    format="mp4",              # Container format
)

| Field | Default | Description |
|---|---|---|
| storage | ./recordings | Output directory path |
| video_codec | auto | Tries NVENC first, falls back to libx264 |
| video_fps | 30 | Video stream rate for PTS resolution |
| audio_codec | aac | Audio codec (AAC recommended for MP4) |
| audio_sample_rate | 16000 | Audio sample rate in Hz |
| format | mp4 | Container format |
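The `auto` behaviour of video_codec amounts to trying encoders in preference order. The following is a minimal sketch of that fallback, not the recorder's actual implementation: `resolve_video_codec` and the `can_open` probe are hypothetical names, and how PyAVMediaRecorder really tests for NVENC is not described in this document.

```python
from collections.abc import Callable, Sequence


def resolve_video_codec(
    requested: str,
    can_open: Callable[[str], bool],
    preference: Sequence[str] = ("h264_nvenc", "libx264"),
) -> str:
    """Return a concrete encoder name for a video_codec setting (hypothetical helper)."""
    if requested != "auto":
        # An explicit codec name is passed through unchanged.
        return requested
    for name in preference:
        # can_open is an assumed probe, e.g. "try to open the encoder and see if it fails".
        if can_open(name):
            return name
    raise RuntimeError(f"no usable encoder among {list(preference)}")


# Example: a machine without an NVIDIA GPU, where only libx264 opens successfully.
chosen = resolve_video_codec("auto", can_open=lambda name: name == "libx264")
```

Passing the probe in as a function keeps the selection logic testable without a GPU or an FFmpeg build present.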

ChannelRecordingConfig

When a room has recorders, all channels record automatically — no per-channel configuration is needed. ChannelRecordingConfig is only required to opt out of recording specific media types on a channel:

from roomkit.recorder import ChannelRecordingConfig

# Exclude video from this channel (audio still recorded)
voice = VoiceChannel("voice", ..., recording=ChannelRecordingConfig(video=False))

# Exclude screen share from recording
video = VideoChannel("video", ..., recording=ChannelRecordingConfig(screen_share=False))

A/V sync

Audio and video PTS are both derived from time.monotonic() at frame acquisition time, referenced to a shared origin set after all codec streams are initialized. This ensures:

  • Audio and video stay aligned regardless of pipeline latency differences
  • Playback speed matches real time regardless of configured FPS vs actual capture rate
  • NVENC initialization delay (which can block for 200-500 ms) does not introduce an offset between tracks
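The timestamp scheme above can be sketched as a shared clock that converts monotonic capture times into per-stream PTS values. This is an assumption-laden illustration: `SharedClock` is a hypothetical class, and the time bases used (1/30 for video, 1/16000 for audio) are chosen to match the default video_fps and audio_sample_rate rather than taken from the recorder's source.

```python
import time
from fractions import Fraction
from typing import Optional


class SharedClock:
    """Hypothetical helper: one origin shared by all tracks in a recording."""

    def __init__(self) -> None:
        self._origin: Optional[float] = None

    def start(self, now: Optional[float] = None) -> None:
        # Call once, after all codec streams are initialized, so that slow
        # encoder setup does not shift one track relative to another.
        self._origin = time.monotonic() if now is None else now

    def pts(self, time_base: Fraction, captured_at: float) -> int:
        """Convert a monotonic capture time into ticks of the stream's time base."""
        if self._origin is None:
            raise RuntimeError("clock not started")
        # Frames captured before the origin are clamped to PTS 0.
        elapsed = max(0.0, captured_at - self._origin)
        return round(elapsed * time_base.denominator / time_base.numerator)


clock = SharedClock()
clock.start(now=100.0)
video_pts = clock.pts(Fraction(1, 30), captured_at=100.5)     # 0.5 s at 30 ticks/s
audio_pts = clock.pts(Fraction(1, 16000), captured_at=100.5)  # 0.5 s at 16000 ticks/s
```

Because both values come from the same elapsed wall time, a frame and an audio chunk captured at the same instant map to the same playback position even though their tick counts differ.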

Custom recorder

Implement the MediaRecorder ABC to write to a custom backend (cloud storage, streaming server, etc.):

from roomkit.recorder.base import (
    MediaRecorder,
    MediaRecordingConfig,
    MediaRecordingHandle,
    MediaRecordingResult,
    RecordingTrack,
)

class MyCloudRecorder(MediaRecorder):
    @property
    def name(self) -> str:
        return "cloud"

    def on_recording_start(self, config: MediaRecordingConfig) -> MediaRecordingHandle:
        # Initialize upload session
        ...

    def on_recording_stop(self, handle: MediaRecordingHandle) -> MediaRecordingResult:
        # Finalize and return URL
        ...

    def on_track_added(self, handle: MediaRecordingHandle, track: RecordingTrack) -> None:
        ...

    def on_track_removed(self, handle: MediaRecordingHandle, track: RecordingTrack) -> None:
        ...

    def on_data(
        self,
        handle: MediaRecordingHandle,
        track: RecordingTrack,
        data: bytes,
        timestamp_ms: float | None,
    ) -> None:
        # Stream audio/video chunks to cloud
        ...

File naming

Output files are named room_{handle_id}_{timestamp}.mp4 where handle_id is a random 12-character hex string and timestamp is YYYYMMDDTHHMMSS in UTC.
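For tooling that needs to produce or match names in this pattern, the scheme can be reproduced with the standard library. This is a sketch of the documented format only: `recording_filename` and `FILENAME_RE` are hypothetical names, and the recorder may generate its handle ID by other means.

```python
import re
import secrets
from datetime import datetime, timezone
from typing import Optional

# Matches room_{12 hex chars}_{YYYYMMDDTHHMMSS}.mp4
FILENAME_RE = re.compile(r"room_([0-9a-f]{12})_(\d{8}T\d{6})\.mp4")


def recording_filename(now: Optional[datetime] = None) -> str:
    """Build a name in the documented pattern (hypothetical helper)."""
    handle_id = secrets.token_hex(6)  # 6 random bytes give 12 hex characters
    moment = now if now is not None else datetime.now(timezone.utc)
    # Expects a timezone-aware datetime; it is normalized to UTC before formatting.
    stamp = moment.astimezone(timezone.utc).strftime("%Y%m%dT%H%M%S")
    return f"room_{handle_id}_{stamp}.mp4"


name = recording_filename(datetime(2024, 1, 2, 3, 4, 5, tzinfo=timezone.utc))
match = FILENAME_RE.fullmatch(name)
```

The timestamp sorts lexicographically in chronological order, which makes plain directory listings useful for finding the latest recording.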

Testing

Use MockMediaRecorder for tests — it stores tracks and data chunks in memory:

from roomkit.recorder import MockMediaRecorder

recorder = MockMediaRecorder()
# ... run test ...
assert len(recorder.tracks) == 2      # audio + video
assert len(recorder.chunks) > 0       # data was received
assert recorder.results[0].size_bytes > 0

Example

See examples/room_media_recorder.py for a complete runnable example with mic + webcam recording.

uv run python examples/room_media_recorder.py
uv run python examples/room_media_recorder.py --duration 10 --fps 30
uv run python examples/room_media_recorder.py --output ./my_recordings --device 0