Built-in Channels¶
SMSChannel ¶
Create an SMS transport channel.
RCSChannel ¶
Create an RCS (Rich Communication Services) transport channel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `channel_id` | `str` | Unique identifier for this channel. | *required* |
| `provider` | `Any` | RCS provider instance (e.g. `TwilioRCSProvider`). | `None` |
| `fallback` | `bool` | If `True` (default), allow SMS fallback when RCS is unavailable. | `True` |
Returns:

| Type | Description |
|---|---|
| `TransportChannel` | A `TransportChannel` configured for RCS messaging. |
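The fallback behaviour described above can be sketched as a simple routing decision. This is an illustrative stand-in, not RoomKit code; the `rcs_available` flag and channel names are assumptions.

```python
# Illustrative sketch of the SMS-fallback decision: prefer RCS, degrade to
# SMS only when fallback is enabled. Not the actual RoomKit implementation.

def pick_transport(rcs_available: bool, fallback: bool) -> str:
    """Choose the transport for an outbound message."""
    if rcs_available:
        return "rcs"
    if fallback:
        return "sms"  # fallback=True (default): degrade to SMS
    raise RuntimeError("RCS unavailable and SMS fallback disabled")
```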
EmailChannel ¶
Create an Email transport channel.
AIChannel ¶
`AIChannel(channel_id, provider, system_prompt=None, temperature=0.7, max_tokens=1024, max_context_events=50, tool_handler=None, tools=None, max_tool_rounds=200, tool_loop_timeout_seconds=300.0, tool_loop_warn_after=50, retry_policy=None, fallback_provider=None, skills=None, script_executor=None, sandbox=None, memory=None, tool_policy=None, thinking_budget=None, evict_threshold_tokens=5000, enable_planning=False)`
Bases: AIStreamingMixin, AIGenerationMixin, AIToolsMixin, AIContextMixin, AIResilienceMixin, AIToolPolicyMixin, AISteeringMixin, AIEventsMixin, Channel
AI intelligence channel that generates responses using an AI provider.
tool_handler (property, writable) ¶
The current tool handler (may be wrapped by orchestration).
on_event (async) ¶
React to an event by generating an AI response.
Skips events from this channel to prevent self-loops. When the provider supports streaming or structured streaming:

- With tools: uses the streaming tool loop that executes tool calls between generation rounds while yielding text deltas progressively.
- Without tools: returns a plain streaming response.

Otherwise falls back to the non-streaming generate path.
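The dispatch rule above can be distilled into a small selection function. This is a sketch of the documented behaviour, not RoomKit's actual code; the path names are illustrative labels.

```python
# Sketch of AIChannel.on_event's generation-path dispatch, as described in
# the docs: streaming tool loop > plain streaming > non-streaming generate.

def choose_path(supports_streaming: bool, has_tools: bool) -> str:
    if supports_streaming:
        # Tools present: run the streaming tool loop between rounds.
        return "streaming_tool_loop" if has_tools else "streaming"
    return "generate"  # non-streaming fallback path
```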
deliver (async) ¶
Intelligence channels are not called via deliver by the router.
WebSocketChannel ¶
Bases: Channel
WebSocket transport channel with connection registry.
supports_streaming_delivery (property) ¶
Whether any connected client supports streaming text delivery.
register_connection ¶
Register a WebSocket connection.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `connection_id` | `str` | Unique connection identifier. | *required* |
| `send_fn` | `SendFn` | Callback for delivering complete events. | *required* |
| `stream_send_fn` | `StreamSendFn \| None` | Optional callback for delivering streaming messages. When provided, this connection receives progressive text delivery via `deliver_stream`. | `None` |
deliver_stream (async) ¶
Deliver a streaming text response to connected clients.
Streaming-capable connections receive stream_start, stream_chunk,
and stream_end messages progressively. Non-streaming connections
receive the final complete event via the regular send_fn.
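The fan-out above can be sketched as follows. The connection-registry shape (dicts holding `send_fn`/`stream_send_fn`) is an assumption for illustration; only the streaming-vs-complete split mirrors the documented behaviour.

```python
import asyncio

# Sketch of deliver_stream's fan-out: streaming-capable connections get
# start/chunk/end messages progressively, others get one complete event.

async def deliver_stream(connections: list[dict], chunks: list[str]) -> None:
    final_text = "".join(chunks)
    for conn in connections:
        if conn.get("stream_send_fn"):
            await conn["stream_send_fn"]("stream_start", None)
            for chunk in chunks:
                await conn["stream_send_fn"]("stream_chunk", chunk)
            await conn["stream_send_fn"]("stream_end", final_text)
        else:
            # Non-streaming connection: only the final complete event.
            await conn["send_fn"](final_text)
```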
VoiceChannel ¶
`VoiceChannel(channel_id, *, stt=None, tts=None, backend=None, pipeline=None, streaming=True, enable_barge_in=True, barge_in_threshold_ms=200, interruption=None, batch_mode=False, voice_map=None, max_audio_frames_per_second=None, tts_filter=None, bridge=None, recording=None)`
Bases: VoiceSTTMixin, VoiceTTSMixin, VoiceHooksMixin, VoiceTurnMixin, VoicePipelineMixin, Channel
Real-time voice communication channel.
Supports three STT modes:
- VAD mode (default): VAD segments speech, streaming STT during speech
with batch fallback on SPEECH_END.
- Continuous mode: No VAD + streaming STT provider — all audio streamed,
provider handles endpointing.
- Batch mode (batch_mode=True): No VAD, audio accumulates post-pipeline. Caller controls when to transcribe via `flush_stt`. Useful for dictation, voicemail, and audio-file transcription with offline models.
When a VoiceBackend and AudioPipelineConfig are configured, the channel:

- Registers for raw audio frames from the backend via on_audio_received
- Routes frames through the AudioPipeline inbound chain: [Resampler] -> [Recorder] -> [AEC] -> [AGC] -> [Denoiser] -> VAD -> [Diarization] + [DTMF]
- Fires hooks based on pipeline events (speech, silence, DTMF, recording, etc.)
- Transcribes speech using the STT provider
- Optionally evaluates turn completion via TurnDetector
- Synthesizes AI responses using TTS and streams to the client
When no pipeline is configured, the channel operates without VAD — the backend must handle speech detection externally.
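The three STT modes above reduce to a small selection rule. This is an assumption distilled from the documentation, not RoomKit's implementation; the flag names mirror the constructor parameters.

```python
# Sketch of VoiceChannel's STT-mode selection, per the documented modes:
# batch_mode wins, then continuous (no VAD + streaming STT), else VAD.

def stt_mode(has_vad: bool, streaming_stt: bool, batch_mode: bool) -> str:
    if batch_mode:
        return "batch"       # caller flushes via flush_stt
    if not has_vad and streaming_stt:
        return "continuous"  # provider handles endpointing
    return "vad"             # default: VAD segments speech
```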
supports_streaming_delivery (property) ¶
Whether this channel can accept streaming text delivery.
set_bridge_filter ¶
Set a synchronous filter for bridged audio frames.
The filter runs in the audio callback thread before each frame
is forwarded. It receives (source_session, frame) and
returns the frame (possibly modified) or None to drop it.
This is the synchronous equivalent of BEFORE_BRIDGE_AUDIO
— use it for fast operations like per-session muting or gain.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `fn` | `BridgeFrameFilter \| None` | Filter function, or `None` to remove the current filter. | *required* |
set_framework ¶
Set the framework reference for inbound routing.
Called automatically when the channel is registered with RoomKit.
on_trace ¶
Register a trace observer and bridge to the backend.
bind_session ¶
Bind a voice session to a room for message routing.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The voice session to bind. | *required* |
| `room_id` | `str` | Target room ID. | *required* |
| `binding` | `ChannelBinding` | Channel binding descriptor. | *required* |
| `backend` | `VoiceBackend \| None` | Override backend for the bridge. When bridging sessions from different transports (e.g. SIP + WebRTC), pass the session's own backend so the bridge sends audio through the correct transport. | `None` |
connect_session (async) ¶
Accept a voice session via process_inbound.
Delegates to `bind_session`, which handles pipeline activation and framework events.
disconnect_session (async) ¶
Clean up a voice session on remote disconnect.
update_binding ¶
Update cached bindings for all sessions in a room.
Called by the framework after mute/unmute/set_access so the
audio gate in _on_audio_received sees the new state.
add_media_tap ¶
Register a tap on processed inbound audio frames (for room recording).
Delegates to the pipeline's on_processed_frame callback list.
add_outbound_media_tap ¶
Register a tap on outbound TTS audio (for room recording).
The callback receives (session, pcm_data, sample_rate) for
every outbound chunk after pipeline processing.
update_voice_map ¶
Merge entries into the per-agent voice map.
Called by `ConversationPipeline.install` to auto-wire voice IDs from `Agent` instances.
send_dtmf ¶
Send a DTMF digit to the remote party via the voice backend.
The digit is sent as an RFC 4733 telephone-event (out-of-band).
Requires a backend with DTMF_SIGNALING capability (SIP, RTP).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The active voice session. | *required* |
| `digit` | `str` | DTMF digit (`'0'`-`'9'`, `'*'`, `'#'`, `'A'`-`'D'`). | *required* |
| `duration_ms` | `int` | Tone duration in milliseconds (default 160). | `160` |

Raises:

| Type | Description |
|---|---|
| `RuntimeError` | If no backend is configured or the session has ended. |
| `ValueError` | If `digit` or `duration_ms` is invalid. |
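The `ValueError` cases above can be sketched as a validation helper. The valid digit set comes from the parameter table; the function itself is illustrative, not RoomKit's code.

```python
# Sketch of send_dtmf's argument validation, per the Raises table above.

VALID_DTMF = set("0123456789*#ABCD")

def validate_dtmf(digit: str, duration_ms: int = 160) -> None:
    """Raise ValueError for an invalid digit or non-positive duration."""
    if digit not in VALID_DTMF:
        raise ValueError(f"invalid DTMF digit: {digit!r}")
    if duration_ms <= 0:
        raise ValueError(f"invalid duration_ms: {duration_ms}")
```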
interrupt (async) ¶
Interrupt ongoing TTS playback for a session.
interrupt_all (async) ¶
Interrupt all active TTS playback in a room.
Returns:

| Type | Description |
|---|---|
| `int` | Number of sessions that were interrupted. |
wait_playback_done (async) ¶
Wait until active TTS playback finishes for all sessions in room_id.
Returns immediately if no playback is in progress. Uses per-session
events that are set when send_audio() returns (before the echo
drain delay), so callers don't wait for the 2-second drain window.
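The per-session events described above can be sketched with `asyncio.Event`: each session sets its event when `send_audio()` returns, and the waiter gathers only the events that are not yet set. Names and the registry shape are illustrative assumptions.

```python
import asyncio

# Sketch of wait_playback_done: return immediately when no playback is in
# progress, otherwise wait for every session's "done" event.

async def wait_playback_done(done_events: dict[str, asyncio.Event]) -> None:
    pending = [ev.wait() for ev in done_events.values() if not ev.is_set()]
    if not pending:
        return  # nothing playing: return immediately
    await asyncio.gather(*pending)
```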
RealtimeVoiceChannel ¶
`RealtimeVoiceChannel(channel_id, *, provider, transport, system_prompt=None, voice=None, tools=None, temperature=None, input_sample_rate=16000, output_sample_rate=24000, transport_sample_rate=None, emit_transcription_events=True, tool_handler=None, mute_on_tool_call=False, tool_result_max_length=16384, pipeline=None, recording=None, skills=None, script_executor=None)`
Bases: RealtimeToolsMixin, RealtimeTranscriptionMixin, RealtimeSpeechMixin, RealtimeAudioMixin, RealtimeResponseMixin, VoicePipelineMixin, Channel
Real-time voice channel using speech-to-speech AI providers.
Wraps APIs like OpenAI Realtime and Gemini Live as a first-class RoomKit channel. Audio flows directly between the user's browser and the provider; transcriptions are emitted into the Room so other channels (supervisor dashboards, logging) see the conversation.
Category is TRANSPORT so that:
- on_event() receives broadcasts (for text injection from supervisors)
- deliver() is called but returns empty (customer is on voice)
Example:

```python
from roomkit.voice.realtime.mock import MockRealtimeProvider, MockRealtimeTransport

provider = MockRealtimeProvider()
transport = MockRealtimeTransport()

channel = RealtimeVoiceChannel(
    "realtime-1",
    provider=provider,
    transport=transport,
    system_prompt="You are a helpful agent.",
)
kit.register_channel(channel)
```
Initialize realtime voice channel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `channel_id` | `str` | Unique channel identifier. | *required* |
| `provider` | `RealtimeVoiceProvider` | The realtime voice provider (OpenAI, Gemini, etc.). | *required* |
| `transport` | `VoiceBackend` | The audio transport (WebSocket, etc.). | *required* |
| `system_prompt` | `str \| None` | Default system prompt for the AI. | `None` |
| `voice` | `str \| None` | Default voice ID for audio output. | `None` |
| `tools` | `list[dict[str, Any] \| Any] \| None` | Tool definitions as dicts, or `Tool` objects. | `None` |
| `temperature` | `float \| None` | Default sampling temperature. | `None` |
| `input_sample_rate` | `int` | Default input audio sample rate (Hz). | `16000` |
| `output_sample_rate` | `int` | Default output audio sample rate (Hz). | `24000` |
| `transport_sample_rate` | `int \| None` | Sample rate of audio from the transport (Hz). When set and different from the provider rates, enables automatic resampling. | `None` |
| `emit_transcription_events` | `bool` | If `True`, emit final transcriptions as RoomEvents so other channels see them. | `True` |
| `tool_handler` | `ToolHandler \| None` | Async callable that executes tool calls. | `None` |
| `mute_on_tool_call` | `bool` | If `True`, mute the transport microphone during tool execution to prevent barge-in that causes providers (e.g. Gemini) to silently drop the tool result. | `False` |
| `tool_result_max_length` | `int` | Maximum character length of tool results before truncation. Large results (e.g. SVG payloads) can overflow the provider's context window. | `16384` |
| `pipeline` | `AudioPipelineConfig \| None` | Optional audio pipeline configuration. | `None` |
| `recording` | `Any \| None` | Optional recording configuration. | `None` |
| `skills` | `SkillRegistry \| None` | Optional skill registry. | `None` |
| `script_executor` | `ScriptExecutor \| None` | Optional script executor. | `None` |
wait_idle (async) ¶
Wait until all sessions in the room are idle (not speaking).
An idle session has finished its last response and all audio has been forwarded to the transport.
set_framework ¶
Set the framework reference for event routing.
Called automatically when the channel is registered with RoomKit.
on_trace ¶
Register a trace observer and bridge to the transport.
configure ¶
Update channel defaults for future sessions.
Active sessions are not affected — use reconfigure_session
for those.
inject_text (async) ¶
Inject a text turn into the provider session.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The active voice session. | *required* |
| `text` | `str` | Text to inject. | *required* |
| `role` | `str` | Role for the text (`'user'` or `'system'`). | `'user'` |
| `silent` | `bool` | If `True`, add to the conversation context without requesting a response. The agent sees the text on its next turn but does not react immediately. | `False` |
start_session (async) ¶
Start a new realtime voice session.
Connects both the transport (client audio) and the provider (AI service), then fires a framework event.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `room_id` | `str` | The room to join. | *required* |
| `participant_id` | `str` | The participant's ID. | *required* |
| `connection` | `Any` | Protocol-specific connection (e.g. WebSocket). | *required* |
| `metadata` | `dict[str, Any] \| None` | Optional session metadata. May include overrides for `system_prompt`, `voice`, `tools`, `temperature`. | `None` |

Returns:

| Type | Description |
|---|---|
| `VoiceSession` | The created `VoiceSession`. |
end_session (async) ¶
End a realtime voice session.
Disconnects both provider and transport, fires framework event.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The session to end. | *required* |
reconfigure_session (async) ¶
`reconfigure_session(session, *, system_prompt=None, voice=None, tools=None, temperature=None, provider_config=None)`
Reconfigure an active session with new agent parameters.
Used during agent handoff to switch the AI personality, voice, and tools. Providers with session resumption (e.g. Gemini Live) preserve conversation history across the reconfiguration.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The active session to reconfigure. | *required* |
| `system_prompt` | `str \| None` | New system instructions for the AI. | `None` |
| `voice` | `str \| None` | New voice ID for audio output. | `None` |
| `tools` | `list[dict[str, Any]] \| None` | New tool/function definitions. | `None` |
| `temperature` | `float \| None` | New sampling temperature. | `None` |
| `provider_config` | `dict[str, Any] \| None` | Provider-specific configuration overrides. | `None` |
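Since every override above defaults to `None`, a handoff plausibly merges only the parameters that were actually supplied. The config-dict shape below is an assumption for illustration, not RoomKit's internal state.

```python
# Sketch of override merging for reconfigure_session: only non-None
# parameters replace existing session values, per the table above.

def merge_overrides(config: dict, **overrides) -> dict:
    merged = dict(config)  # leave the original untouched
    for key, value in overrides.items():
        if value is not None:
            merged[key] = value
    return merged
```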
connect_session (async) ¶
Accept a realtime voice session via process_inbound.
Delegates to `start_session`, which handles provider/transport connection, resampling, and framework events.
disconnect_session (async) ¶
Clean up realtime sessions on remote disconnect.
update_binding ¶
Update cached bindings for all sessions in a room.
Called by the framework after mute/unmute/set_access so the
audio gate in _pipeline_on_audio_received (pipeline path)
or _forward_client_audio (direct path) sees the new state.
handle_inbound (async) ¶
Not used directly — audio flows via start_session.
on_event (async) ¶
React to events from other channels — TEXT INJECTION.
When a supervisor or other channel sends a message, extract the text and inject it into the provider session so the AI incorporates it. Skips events from this channel (self-loop prevention).
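The guard described above can be sketched as a predicate. The event shape (a source channel ID plus text) is a simplified stand-in for a RoomEvent, used only for illustration.

```python
# Sketch of the self-loop guard plus text extraction for injection:
# inject only text-bearing events that come from *other* channels.

def should_inject(event: dict, my_channel_id: str) -> bool:
    if event.get("channel_id") == my_channel_id:
        return False  # skip our own events to prevent self-loops
    return bool(event.get("text"))
```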
deliver (async) ¶
No-op delivery — customer is on voice, can't see text.
WhatsAppChannel ¶
Create a WhatsApp transport channel.
MessengerChannel ¶
Create a Facebook Messenger transport channel.
TeamsChannel ¶
Create a Microsoft Teams transport channel.
TelegramChannel ¶
Create a Telegram Bot transport channel.
WhatsAppPersonalChannel ¶
Create a WhatsApp Personal transport channel (neonize).
TransportChannel ¶
`TransportChannel(channel_id, channel_type, *, provider=None, capabilities=None, recipient_key='recipient_id', defaults=None)`
Bases: Channel
Generic transport channel driven by configuration rather than subclassing.
All transport channels (SMS, Email, WhatsApp, Messenger, HTTP) share the
same inbound/deliver logic. The only differences are data: which
ChannelType, which ChannelCapabilities, which metadata key holds the
recipient address, and which extra kwargs to pass to the provider's
send() method.
Use the factory functions (SMSChannel, EmailChannel, …) in
roomkit.channels for convenient construction.
Initialise a transport channel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `channel_id` | `str` | Unique identifier for this channel instance. | *required* |
| `channel_type` | `ChannelType` | The channel type (SMS, email, etc.). | *required* |
| `provider` | `Any` | Provider that handles external delivery (e.g. `ElasticEmailProvider`). | `None` |
| `capabilities` | `ChannelCapabilities \| None` | Media and feature capabilities for this channel. | `None` |
| `recipient_key` | `str` | Binding metadata key that holds the recipient address. | `'recipient_id'` |
| `defaults` | `dict[str, Any] \| None` | Default kwargs passed to the provider's `send()` method. | `None` |
|
handle_inbound (async) ¶
Convert an inbound message into a room event.
deliver (async) ¶
Deliver an event to the external recipient via the provider.
The recipient address is read from binding.metadata[recipient_key].
Extra kwargs are built from defaults: fixed values are passed as-is,
None defaults are resolved from binding metadata at delivery time.
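The kwargs-building rule above can be sketched directly: fixed defaults pass through as-is, and `None` defaults are looked up in binding metadata at delivery time. The function is illustrative, not RoomKit's implementation.

```python
# Sketch of deliver's extra-kwargs resolution, per the description above.

def build_send_kwargs(defaults: dict, binding_metadata: dict) -> dict:
    kwargs = {}
    for key, value in defaults.items():
        if value is None:
            if key in binding_metadata:
                kwargs[key] = binding_metadata[key]  # resolved at delivery
        else:
            kwargs[key] = value  # fixed value, passed as-is
    return kwargs
```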
WebSocket Streaming¶
StreamStart ¶
Bases: BaseModel
Sent when a streaming response begins.
StreamChunk ¶
Bases: BaseModel
Sent for each text delta during streaming.
StreamEnd ¶
Bases: BaseModel
Sent when a streaming response completes.
StreamMessage (module-attribute) ¶
`StreamMessage = StreamStart | StreamChunk | StreamEnd | StreamError`
StreamSendFn (module-attribute) ¶
`StreamSendFn = Callable[[str, StreamMessage], Coroutine[Any, Any, None]]`
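A callback matching the `StreamSendFn` shape is any coroutine taking `(connection_id, message)`. The message classes below are dataclass stand-ins for the real pydantic models (whose field names are not shown in this reference, so the fields here are assumptions).

```python
import asyncio
from dataclasses import dataclass

# Minimal stand-ins for the stream message types (the real ones are
# pydantic BaseModels); field names are illustrative assumptions.

@dataclass
class StreamStart:
    event_id: str

@dataclass
class StreamChunk:
    delta: str

@dataclass
class StreamEnd:
    text: str

received: list = []

async def send(connection_id: str, message) -> None:
    """A StreamSendFn-shaped coroutine: (connection_id, message) -> None."""
    received.append((connection_id, message))

async def main() -> None:
    # Typical message sequence for one streamed response.
    await send("conn-1", StreamStart(event_id="e1"))
    await send("conn-1", StreamChunk(delta="Hi"))
    await send("conn-1", StreamEnd(text="Hi"))

asyncio.run(main())
```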