Built-in Channels

SMSChannel

SMSChannel(channel_id, *, provider=None, from_number=None)

Create an SMS transport channel.

RCSChannel

RCSChannel(channel_id, *, provider=None, fallback=True)

Create an RCS (Rich Communication Services) transport channel.

Parameters:

- channel_id (str, required): Unique identifier for this channel.
- provider (Any, default None): RCS provider instance (e.g., TwilioRCSProvider).
- fallback (bool, default True): If True, allow SMS fallback when RCS is unavailable.

Returns:

- TransportChannel: A TransportChannel configured for RCS messaging.

EmailChannel

EmailChannel(channel_id, *, provider=None, from_address=None)

Create an Email transport channel.

AIChannel

AIChannel(channel_id, provider, system_prompt=None, temperature=0.7, max_tokens=1024, max_context_events=50, tool_handler=None, tools=None, max_tool_rounds=200, tool_loop_timeout_seconds=300.0, tool_loop_warn_after=50, retry_policy=None, fallback_provider=None, skills=None, script_executor=None, sandbox=None, memory=None, tool_policy=None, thinking_budget=None, evict_threshold_tokens=5000, enable_planning=False)

Bases: AIStreamingMixin, AIGenerationMixin, AIToolsMixin, AIContextMixin, AIResilienceMixin, AIToolPolicyMixin, AISteeringMixin, AIEventsMixin, Channel

AI intelligence channel that generates responses using an AI provider.

tool_handler property writable

tool_handler

The current tool handler (may be wrapped by orchestration).

extra_tools property

extra_tools

All extra tools (user-provided + orchestration-injected).

on_event async

on_event(event, binding, context)

React to an event by generating an AI response.

Skips events from this channel to prevent self-loops. When the provider supports streaming or structured streaming:

- With tools: uses the streaming tool loop, which executes tool calls between generation rounds while yielding text deltas progressively.
- Without tools: returns a plain streaming response.

Otherwise falls back to the non-streaming generate path.
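The routing described above can be summarized as a small decision helper. This is a hypothetical sketch of the documented branching, not the real method; all names here are made up:

```python
def pick_generation_path(supports_streaming: bool, has_tools: bool,
                         event_channel: str, self_channel: str) -> str:
    """Illustrative sketch of AIChannel.on_event's routing (hypothetical names)."""
    if event_channel == self_channel:
        return "skip"  # self-loop prevention: ignore our own events
    if supports_streaming:
        # The streaming tool loop executes tool calls between generation
        # rounds while yielding text deltas progressively.
        return "streaming_tool_loop" if has_tools else "streaming"
    return "generate"  # non-streaming fallback path
```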

deliver async

deliver(event, binding, context)

Intelligence channels are not called via deliver by the router.

close async

close()

Close the channel, its provider, memory, and executors.

WebSocketChannel

WebSocketChannel(channel_id)

Bases: Channel

WebSocket transport channel with connection registry.

supports_streaming_delivery property

supports_streaming_delivery

Whether any connected client supports streaming text delivery.

register_connection

register_connection(connection_id, send_fn, *, stream_send_fn=None)

Register a WebSocket connection.

Parameters:

- connection_id (str, required): Unique connection identifier.
- send_fn (SendFn, required): Callback for delivering complete events.
- stream_send_fn (StreamSendFn | None, default None): Optional callback for delivering streaming messages. When provided, this connection receives progressive text delivery via the stream_start/stream_chunk/stream_end protocol.
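A minimal sketch of the two callback shapes, using plain coroutines that collect what they receive. The registration call is commented out because it needs a live WebSocketChannel, and the meaning of stream_send_fn's first str argument (taken here as a connection or stream id) is an assumption:

```python
import asyncio

received: list = []        # complete events delivered via send_fn
stream_frames: list = []   # (id, message) pairs delivered via stream_send_fn

async def send_fn(event) -> None:
    """Deliver a complete event to this client."""
    received.append(event)

async def stream_send_fn(connection_id: str, message) -> None:
    """Deliver one streaming-protocol message (stream_start/chunk/end)."""
    stream_frames.append((connection_id, message))

# channel.register_connection("conn-1", send_fn, stream_send_fn=stream_send_fn)

# Standalone demonstration of the callback signatures:
asyncio.run(send_fn({"type": "message", "text": "hello"}))
asyncio.run(stream_send_fn("conn-1", {"type": "stream_chunk", "delta": "he"}))
```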

unregister_connection

unregister_connection(connection_id)

Unregister a WebSocket connection.

deliver_stream async

deliver_stream(text_stream, event, binding, context)

Deliver a streaming text response to connected clients.

Streaming-capable connections receive stream_start, stream_chunk, and stream_end messages progressively. Non-streaming connections receive the final complete event via the regular send_fn.
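The fan-out rule above can be sketched with plain coroutines. This is a hypothetical mirror of the documented behavior (dict-based connections and message shapes are stand-ins), not the channel's actual implementation:

```python
import asyncio

async def fan_out(text_stream, connections):
    """Streaming-capable connections get progressive chunks; others get one final event."""
    chunks = []
    streamers = [c for c in connections if c.get("stream_send") is not None]
    for c in streamers:
        await c["stream_send"]({"type": "stream_start"})
    async for delta in text_stream:
        chunks.append(delta)
        for c in streamers:
            await c["stream_send"]({"type": "stream_chunk", "delta": delta})
    for c in streamers:
        await c["stream_send"]({"type": "stream_end"})
    # Non-streaming connections receive only the assembled final event.
    final_text = "".join(chunks)
    for c in connections:
        if c.get("stream_send") is None:
            await c["send"]({"type": "message", "text": final_text})
    return final_text
```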

VoiceChannel

VoiceChannel(channel_id, *, stt=None, tts=None, backend=None, pipeline=None, streaming=True, enable_barge_in=True, barge_in_threshold_ms=200, interruption=None, batch_mode=False, voice_map=None, max_audio_frames_per_second=None, tts_filter=None, bridge=None, recording=None)

Bases: VoiceSTTMixin, VoiceTTSMixin, VoiceHooksMixin, VoiceTurnMixin, VoicePipelineMixin, Channel

Real-time voice communication channel.

Supports three STT modes:

- VAD mode (default): VAD segments speech; streaming STT runs during speech, with batch fallback on SPEECH_END.
- Continuous mode: no VAD plus a streaming STT provider; all audio is streamed and the provider handles endpointing.
- Batch mode (batch_mode=True): no VAD; audio accumulates post-pipeline and the caller controls when to transcribe via :meth:`flush_stt`. Useful for dictation, voicemail, and audio-file transcription with offline models.

When a VoiceBackend and AudioPipelineConfig are configured, the channel:

- Registers for raw audio frames from the backend via on_audio_received
- Routes frames through the AudioPipeline inbound chain: [Resampler] -> [Recorder] -> [AEC] -> [AGC] -> [Denoiser] -> VAD -> [Diarization] + [DTMF]
- Fires hooks based on pipeline events (speech, silence, DTMF, recording, etc.)
- Transcribes speech using the STT provider
- Optionally evaluates turn completion via TurnDetector
- Synthesizes AI responses using TTS and streams them to the client

When no pipeline is configured, the channel operates without VAD — the backend must handle speech detection externally.
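The three STT modes can be summarized as a selection rule. This is a hypothetical helper mirroring the documented behavior; the real channel decides this internally from its constructor arguments:

```python
def select_stt_mode(batch_mode: bool, has_vad: bool, stt_streams: bool) -> str:
    """Illustrative mirror of the documented STT mode selection."""
    if batch_mode:
        return "batch"       # caller flushes explicitly via flush_stt()
    if not has_vad and stt_streams:
        return "continuous"  # provider handles endpointing
    return "vad"             # VAD segments speech; streaming STT with batch fallback
```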

backend property

backend

The voice backend (if configured).

supports_streaming_delivery property

supports_streaming_delivery

Whether this channel can accept streaming text delivery.

set_bridge_filter

set_bridge_filter(fn)

Set a synchronous filter for bridged audio frames.

The filter runs in the audio callback thread before each frame is forwarded. It receives (source_session, frame) and returns the frame (possibly modified) or None to drop it.

This is the synchronous equivalent of BEFORE_BRIDGE_AUDIO — use it for fast operations like per-session muting or gain.

Parameters:

- fn (BridgeFrameFilter | None, required): Filter function, or None to remove.
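For example, a per-session mute filter. The sketch assumes the session object exposes an `id` attribute (an assumption for illustration):

```python
MUTED = {"session-b"}

def mute_filter(source_session, frame):
    """Return None to drop frames from muted sessions, else pass the frame through."""
    session_id = getattr(source_session, "id", source_session)
    return None if session_id in MUTED else frame

# channel.set_bridge_filter(mute_filter)  # install on a VoiceChannel
# channel.set_bridge_filter(None)         # remove later
```

Keep the filter fast: it runs synchronously in the audio callback thread for every bridged frame.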

set_framework

set_framework(framework)

Set the framework reference for inbound routing.

Called automatically when the channel is registered with RoomKit.

on_trace

on_trace(callback, *, protocols=None)

Register a trace observer and bridge to the backend.

resolve_trace_room

resolve_trace_room(session_id)

Resolve room_id from voice session bindings.

bind_session

bind_session(session, room_id, binding, *, backend=None)

Bind a voice session to a room for message routing.

Parameters:

- session (VoiceSession, required): The voice session to bind.
- room_id (str, required): Target room ID.
- binding (ChannelBinding, required): Channel binding descriptor.
- backend (VoiceBackend | None, default None): Override backend for the bridge. When bridging sessions from different transports (e.g. SIP + WebRTC), pass the session's own backend so the bridge sends audio through the correct transport.

connect_session async

connect_session(session, room_id, binding)

Accept a voice session via process_inbound.

Delegates to :meth:`bind_session`, which handles pipeline activation and framework events.

disconnect_session async

disconnect_session(session, room_id)

Clean up a voice session on remote disconnect.

update_binding

update_binding(room_id, binding)

Update cached bindings for all sessions in a room.

Called by the framework after mute/unmute/set_access so the audio gate in _on_audio_received sees the new state.

add_media_tap

add_media_tap(callback)

Register a tap on processed inbound audio frames (for room recording).

Delegates to the pipeline's on_processed_frame callback list.

add_outbound_media_tap

add_outbound_media_tap(callback)

Register a tap on outbound TTS audio (for room recording).

The callback receives (session, pcm_data, sample_rate) for every outbound chunk after pipeline processing.

unbind_session

unbind_session(session)

Remove session binding.

update_voice_map

update_voice_map(entries)

Merge entries into the per-agent voice map.

Called by :meth:`ConversationPipeline.install` to auto-wire voice IDs from :class:`Agent` instances.

send_dtmf

send_dtmf(session, digit, duration_ms=160)

Send a DTMF digit to the remote party via the voice backend.

The digit is sent as an RFC 4733 telephone-event (out-of-band). Requires a backend with DTMF_SIGNALING capability (SIP, RTP).

Parameters:

- session (VoiceSession, required): The active voice session.
- digit (str, required): DTMF digit ('0'-'9', '*', '#', 'A'-'D').
- duration_ms (int, default 160): Tone duration in milliseconds.

Raises:

- RuntimeError: If no backend is configured or the session has ended.
- ValueError: If digit or duration_ms is invalid.
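The documented digit set can be checked up front before calling send_dtmf. This is a hypothetical pre-check mirroring the documented constraints; the channel raises ValueError itself either way:

```python
VALID_DTMF = set("0123456789*#ABCD")

def check_dtmf(digit: str, duration_ms: int = 160) -> None:
    """Raise ValueError for inputs send_dtmf would reject (illustrative pre-check)."""
    if len(digit) != 1 or digit not in VALID_DTMF:
        raise ValueError(f"invalid DTMF digit: {digit!r}")
    if duration_ms <= 0:
        raise ValueError(f"invalid duration_ms: {duration_ms}")

# check_dtmf("5")
# channel.send_dtmf(session, "5")  # then send via the backend
```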

interrupt async

interrupt(session, *, reason='explicit')

Interrupt ongoing TTS playback for a session.

interrupt_all async

interrupt_all(room_id, *, reason='task_delivery')

Interrupt all active TTS playback in a room.

Returns:

- int: Number of sessions that were interrupted.

wait_playback_done async

wait_playback_done(room_id, timeout=15.0)

Wait until active TTS playback finishes for all sessions in room_id.

Returns immediately if no playback is in progress. Uses per-session events that are set when send_audio() returns (before the echo drain delay), so callers don't wait for the 2-second drain window.

RealtimeVoiceChannel

RealtimeVoiceChannel(channel_id, *, provider, transport, system_prompt=None, voice=None, tools=None, temperature=None, input_sample_rate=16000, output_sample_rate=24000, transport_sample_rate=None, emit_transcription_events=True, tool_handler=None, mute_on_tool_call=False, tool_result_max_length=16384, pipeline=None, recording=None, skills=None, script_executor=None)

Bases: RealtimeToolsMixin, RealtimeTranscriptionMixin, RealtimeSpeechMixin, RealtimeAudioMixin, RealtimeResponseMixin, VoicePipelineMixin, Channel

Real-time voice channel using speech-to-speech AI providers.

Wraps APIs like OpenAI Realtime and Gemini Live as a first-class RoomKit channel. Audio flows directly between the user's browser and the provider; transcriptions are emitted into the Room so other channels (supervisor dashboards, logging) see the conversation.

Category is TRANSPORT so that:

- on_event() receives broadcasts (for text injection from supervisors)
- deliver() is called but returns empty (the customer is on voice)

Example

```python
from roomkit.voice.realtime.mock import MockRealtimeProvider, MockRealtimeTransport

provider = MockRealtimeProvider()
transport = MockRealtimeTransport()

channel = RealtimeVoiceChannel(
    "realtime-1",
    provider=provider,
    transport=transport,
    system_prompt="You are a helpful agent.",
)
kit.register_channel(channel)
```

Initialize realtime voice channel.

Parameters:

- channel_id (str, required): Unique channel identifier.
- provider (RealtimeVoiceProvider, required): The realtime voice provider (OpenAI, Gemini, etc.).
- transport (VoiceBackend, required): The audio transport (WebSocket, etc.).
- system_prompt (str | None, default None): Default system prompt for the AI.
- voice (str | None, default None): Default voice ID for audio output.
- tools (list[dict[str, Any] | Any] | None, default None): Tool definitions as dicts, or Tool objects with .definition and .handler. Tool objects have their handlers extracted and composed automatically.
- temperature (float | None, default None): Default sampling temperature.
- input_sample_rate (int, default 16000): Default input audio sample rate (Hz).
- output_sample_rate (int, default 24000): Default output audio sample rate (Hz).
- transport_sample_rate (int | None, default None): Sample rate of audio from the transport (Hz). When set and different from the provider rates, enables automatic resampling. When None (default), no resampling is performed, which keeps backwards compatibility with WebSocket transports.
- emit_transcription_events (bool, default True): If True, emit final transcriptions as RoomEvents so other channels see them.
- tool_handler (ToolHandler | None, default None): Async callable to execute tool calls. Signature: async (name, arguments) -> str. If not set, falls back to handlers extracted from Tool objects, or ON_TOOL_CALL hooks.
- mute_on_tool_call (bool, default False): If True, mute the transport microphone during tool execution to prevent barge-in that causes providers (e.g. Gemini) to silently drop the tool result. Use set_access() for fine-grained control.
- tool_result_max_length (int, default 16384): Maximum character length of tool results before truncation. Large results (e.g. SVG payloads) can overflow the provider's context window.
- pipeline (AudioPipelineConfig | None, default None): Optional AudioPipelineConfig for local audio processing (AEC, VAD, denoiser, etc.). When set, mic audio is processed through the pipeline before being forwarded to the provider, and pipeline VAD drives speech detection instead of server-side VAD.
- recording (Any | None, default None): Optional ChannelRecordingConfig to enable room-level audio recording from this channel. Records both input (mic) and output (AI) audio tracks.
- skills (SkillRegistry | None, default None): Optional SkillRegistry with discovered skills. When provided, skill infrastructure tools are injected and the skills preamble is appended to the system prompt.
- script_executor (ScriptExecutor | None, default None): Optional ScriptExecutor for running skill scripts. Ignored when skills is None.

provider property

provider

The underlying realtime voice provider.

session_rooms property

session_rooms

Mapping of session_id to room_id.

tool_handler property writable

tool_handler

The current tool handler for realtime tool calls.

get_room_sessions

get_room_sessions(room_id)

Get all active sessions for a room.

wait_idle async

wait_idle(room_id, timeout=15.0)

Wait until all sessions in the room are idle (not speaking).

An idle session has finished its last response and all audio has been forwarded to the transport.

set_framework

set_framework(framework)

Set the framework reference for event routing.

Called automatically when the channel is registered with RoomKit.

on_trace

on_trace(callback, *, protocols=None)

Register a trace observer and bridge to the transport.

resolve_trace_room

resolve_trace_room(session_id)

Resolve room_id from realtime session mappings.

configure

configure(*, system_prompt=None, voice=None, tools=None)

Update channel defaults for future sessions.

Active sessions are not affected — use reconfigure_session for those.

inject_text async

inject_text(session, text, *, role='user', silent=False)

Inject a text turn into the provider session.

Parameters:

- session (VoiceSession, required): The active voice session.
- text (str, required): Text to inject.
- role (str, default 'user'): Role for the text ('user' or 'system').
- silent (bool, default False): If True, add to conversation context without requesting a response. The agent sees the text on its next turn but does not react immediately.

start_session async

start_session(room_id, participant_id, connection, *, metadata=None)

Start a new realtime voice session.

Connects both the transport (client audio) and the provider (AI service), then fires a framework event.

Parameters:

- room_id (str, required): The room to join.
- participant_id (str, required): The participant's ID.
- connection (Any, required): Protocol-specific connection (e.g. WebSocket).
- metadata (dict[str, Any] | None, default None): Optional session metadata. May include overrides for system_prompt, voice, tools, and temperature.

Returns:

- VoiceSession: The created VoiceSession.

end_session async

end_session(session)

End a realtime voice session.

Disconnects both provider and transport, fires framework event.

Parameters:

- session (VoiceSession, required): The session to end.

reconfigure_session async

reconfigure_session(session, *, system_prompt=None, voice=None, tools=None, temperature=None, provider_config=None)

Reconfigure an active session with new agent parameters.

Used during agent handoff to switch the AI personality, voice, and tools. Providers with session resumption (e.g. Gemini Live) preserve conversation history across the reconfiguration.

Parameters:

- session (VoiceSession, required): The active session to reconfigure.
- system_prompt (str | None, default None): New system instructions for the AI.
- voice (str | None, default None): New voice ID for audio output.
- tools (list[dict[str, Any]] | None, default None): New tool/function definitions.
- temperature (float | None, default None): New sampling temperature.
- provider_config (dict[str, Any] | None, default None): Provider-specific configuration overrides.

connect_session async

connect_session(session, room_id, binding)

Accept a realtime voice session via process_inbound.

Delegates to :meth:`start_session`, which handles provider/transport connection, resampling, and framework events.

disconnect_session async

disconnect_session(session, room_id)

Clean up realtime sessions on remote disconnect.

update_binding

update_binding(room_id, binding)

Update cached bindings for all sessions in a room.

Called by the framework after mute/unmute/set_access so the audio gate in _pipeline_on_audio_received (pipeline path) or _forward_client_audio (direct path) sees the new state.

handle_inbound async

handle_inbound(message, context)

Not used directly — audio flows via start_session.

on_event async

on_event(event, binding, context)

React to events from other channels — TEXT INJECTION.

When a supervisor or other channel sends a message, extract the text and inject it into the provider session so the AI incorporates it. Skips events from this channel (self-loop prevention).

deliver async

deliver(event, binding, context)

No-op delivery — customer is on voice, can't see text.

close async

close()

End all sessions and close provider + transport.

WhatsAppChannel

WhatsAppChannel(channel_id, *, provider=None)

Create a WhatsApp transport channel.

MessengerChannel

MessengerChannel(channel_id, *, provider=None)

Create a Facebook Messenger transport channel.

TeamsChannel

TeamsChannel(channel_id, *, provider=None)

Create a Microsoft Teams transport channel.

HTTPChannel

HTTPChannel(channel_id, *, provider=None)

Create an HTTP webhook transport channel.

TelegramChannel

TelegramChannel(channel_id, *, provider=None)

Create a Telegram Bot transport channel.

WhatsAppPersonalChannel

WhatsAppPersonalChannel(channel_id, *, provider=None)

Create a WhatsApp Personal transport channel (neonize).

TransportChannel

TransportChannel(channel_id, channel_type, *, provider=None, capabilities=None, recipient_key='recipient_id', defaults=None)

Bases: Channel

Generic transport channel driven by configuration rather than subclassing.

All transport channels (SMS, Email, WhatsApp, Messenger, HTTP) share the same inbound/deliver logic. The only differences are data: which ChannelType, which ChannelCapabilities, which metadata key holds the recipient address, and which extra kwargs to pass to the provider's send() method.

Use the factory functions (SMSChannel, EmailChannel, …) in roomkit.channels for convenient construction.

Initialise a transport channel.

Parameters:

- channel_id (str, required): Unique identifier for this channel instance.
- channel_type (ChannelType, required): The channel type (SMS, email, etc.).
- provider (Any, default None): Provider that handles external delivery (e.g. ElasticEmailProvider).
- capabilities (ChannelCapabilities | None, default None): Media and feature capabilities for this channel.
- recipient_key (str, default 'recipient_id'): Binding metadata key that holds the recipient address.
- defaults (dict[str, Any] | None, default None): Default kwargs passed to provider.send(). If a default value is None, the actual value is read from the binding metadata at delivery time.

info property

info

Return non-None default values as channel info metadata.

capabilities

capabilities()

Return the channel's media and feature capabilities.

handle_inbound async

handle_inbound(message, context)

Convert an inbound message into a room event.

deliver async

deliver(event, binding, context)

Deliver an event to the external recipient via the provider.

The recipient address is read from binding.metadata[recipient_key]. Extra kwargs are built from defaults: fixed values are passed as-is, None defaults are resolved from binding metadata at delivery time.
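The defaults-resolution rule can be sketched as a small helper. This is a hypothetical mirror of the documented behavior, not TransportChannel's actual code:

```python
def build_send_kwargs(defaults, binding_metadata):
    """Fixed defaults pass through as-is; None defaults are read from binding metadata."""
    kwargs = {}
    for key, value in (defaults or {}).items():
        kwargs[key] = binding_metadata.get(key) if value is None else value
    return kwargs
```

For instance, constructing a channel with defaults={"from_number": None} would let each binding supply its own sender address in metadata, while a fixed value applies to every delivery.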

WebSocket Streaming

StreamStart

Bases: BaseModel

Sent when a streaming response begins.

StreamChunk

Bases: BaseModel

Sent for each text delta during streaming.

StreamEnd

Bases: BaseModel

Sent when a streaming response completes.

StreamMessage module-attribute

StreamMessage = StreamStart | StreamChunk | StreamEnd | StreamError

StreamSendFn module-attribute

StreamSendFn = Callable[[str, StreamMessage], Coroutine[Any, Any, None]]
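A sketch of a StreamSendFn consumer that reassembles the stream protocol into final text. Plain dicts stand in for the pydantic models, and the field names `type` and `delta` plus the str argument's role as a stream id are assumptions for illustration:

```python
import asyncio

buffers: dict[str, list[str]] = {}
finished: dict[str, str] = {}

async def stream_send_fn(stream_id: str, message: dict) -> None:
    """Accumulate stream_chunk deltas per stream; finalize on stream_end."""
    kind = message["type"]
    if kind == "stream_start":
        buffers[stream_id] = []
    elif kind == "stream_chunk":
        buffers[stream_id].append(message["delta"])
    elif kind == "stream_end":
        finished[stream_id] = "".join(buffers.pop(stream_id, []))

async def demo() -> str:
    await stream_send_fn("s1", {"type": "stream_start"})
    for delta in ("Hel", "lo"):
        await stream_send_fn("s1", {"type": "stream_chunk", "delta": delta})
    await stream_send_fn("s1", {"type": "stream_end"})
    return finished["s1"]

print(asyncio.run(demo()))  # Hello
```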