Built-in Channels¶
SMSChannel ¶
Create an SMS transport channel.
RCSChannel ¶
Create an RCS (Rich Communication Services) transport channel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `channel_id` | `str` | Unique identifier for this channel. | *required* |
| `provider` | `Any` | RCS provider instance (e.g., TwilioRCSProvider). | `None` |
| `fallback` | `bool` | If True (default), allow SMS fallback when RCS unavailable. | `True` |

Returns:

| Type | Description |
|---|---|
| `TransportChannel` | A TransportChannel configured for RCS messaging. |
EmailChannel ¶
Create an Email transport channel.
AIChannel ¶
AIChannel(channel_id, provider, system_prompt=None, temperature=0.7, max_tokens=1024, max_context_events=50, tool_handler=None, max_tool_rounds=200, tool_loop_timeout_seconds=300.0, tool_loop_warn_after=50, retry_policy=None, fallback_provider=None, skills=None, script_executor=None, memory=None, tool_policy=None, thinking_budget=None)
Bases: Channel
AI intelligence channel that generates responses using an AI provider.
tool_handler property writable ¶
The current tool handler (may be wrapped by orchestration).
steer ¶
Enqueue a steering directive for the active tool loop.
Safe to call from any coroutine. Cancel directives also set the fast-path cancel event so the loop can exit without waiting for the next drain point.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `directive` | `SteeringDirective` | The steering directive to enqueue. | *required* |
| `loop_id` | `str \| None` | Optional loop ID to target. If … | `None` |
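The fast-path cancel behavior described above can be sketched with a queue plus an event. This is a simplified stand-in, not roomkit's implementation; `SteeringDirective` here is a plain dataclass rather than the real type:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class SteeringDirective:
    kind: str          # e.g. "inject" or "cancel" (illustrative values)
    payload: str = ""

class ToolLoopSteering:
    def __init__(self) -> None:
        self.queue: asyncio.Queue[SteeringDirective] = asyncio.Queue()
        self.cancel_event = asyncio.Event()

    def steer(self, directive: SteeringDirective) -> None:
        # Safe from any coroutine: put_nowait never blocks or yields.
        self.queue.put_nowait(directive)
        if directive.kind == "cancel":
            # Fast path: the tool loop can poll this event between awaits
            # instead of waiting for the next queue drain point.
            self.cancel_event.set()
```

The point of the separate event is latency: a long generation round can check `cancel_event` cheaply without draining the directive queue.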
on_event async ¶
React to an event by generating an AI response.
Skips events from this channel to prevent self-loops. When the provider supports streaming or structured streaming:

- With tools: uses the streaming tool loop that executes tool calls between generation rounds while yielding text deltas progressively.
- Without tools: returns a plain streaming response.

Otherwise falls back to the non-streaming generate path.
deliver async ¶
Intelligence channels are not called via deliver by the router.
WebSocketChannel ¶
Bases: Channel
WebSocket transport channel with connection registry.
supports_streaming_delivery property ¶
Whether any connected client supports streaming text delivery.
register_connection ¶
Register a WebSocket connection.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `connection_id` | `str` | Unique connection identifier. | *required* |
| `send_fn` | `SendFn` | Callback for delivering complete events. | *required* |
| `stream_send_fn` | `StreamSendFn \| None` | Optional callback for delivering streaming messages. When provided, this connection receives progressive text delivery via the … | `None` |
deliver_stream async ¶
Deliver a streaming text response to connected clients.
Streaming-capable connections receive `stream_start`, `stream_chunk`, and `stream_end` messages progressively. Non-streaming connections receive the final complete event via the regular `send_fn`.
VoiceChannel ¶
VoiceChannel(channel_id, *, stt=None, tts=None, backend=None, pipeline=None, streaming=True, enable_barge_in=True, barge_in_threshold_ms=200, interruption=None, batch_mode=False, voice_map=None, max_audio_frames_per_second=None, tts_filter=None)
Bases: VoiceSTTMixin, VoiceTTSMixin, VoiceHooksMixin, VoiceTurnMixin, Channel
Real-time voice communication channel.
Supports three STT modes:

- VAD mode (default): VAD segments speech, streaming STT during speech with batch fallback on SPEECH_END.
- Continuous mode: No VAD + streaming STT provider — all audio streamed, provider handles endpointing.
- Batch mode (`batch_mode=True`): No VAD, audio accumulates post-pipeline. Caller controls when to transcribe via `flush_stt`. Useful for dictation, voicemail, and audio-file transcription with offline models.
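The three modes above reduce to a small decision rule. This helper is purely illustrative, one way to read the mode table, and not a function the channel actually exposes:

```python
def select_stt_mode(has_vad: bool, streaming_stt: bool, batch_mode: bool) -> str:
    """Pick the STT mode implied by the channel configuration."""
    if batch_mode:
        return "batch"        # no VAD; caller flushes via flush_stt
    if not has_vad and streaming_stt:
        return "continuous"   # all audio streamed; provider endpoints
    return "vad"              # default: VAD segments speech
```

The key asymmetry is that `batch_mode=True` wins regardless of the other settings, since it explicitly hands endpointing control to the caller.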
When a VoiceBackend and AudioPipelineConfig are configured, the channel:

- Registers for raw audio frames from the backend via `on_audio_received`
- Routes frames through the AudioPipeline inbound chain: [Resampler] -> [Recorder] -> [AEC] -> [AGC] -> [Denoiser] -> VAD -> [Diarization] + [DTMF]
- Fires hooks based on pipeline events (speech, silence, DTMF, recording, etc.)
- Transcribes speech using the STT provider
- Optionally evaluates turn completion via TurnDetector
- Synthesizes AI responses using TTS and streams to the client
When no pipeline is configured, the channel operates without VAD — the backend must handle speech detection externally.
supports_streaming_delivery property ¶
Whether this channel can accept streaming text delivery.
set_framework ¶
Set the framework reference for inbound routing.
Called automatically when the channel is registered with RoomKit.
on_trace ¶
Register a trace observer and bridge to the backend.
bind_session ¶
Bind a voice session to a room for message routing.
connect_session async ¶
Accept a voice session via process_inbound.
Delegates to `bind_session`, which handles pipeline activation and framework events.
disconnect_session async ¶
Clean up a voice session on remote disconnect.
update_binding ¶
Update cached bindings for all sessions in a room.
Called by the framework after mute/unmute/set_access so the audio gate in `_on_audio_received` sees the new state.
update_voice_map ¶
Merge entries into the per-agent voice map.
Called by `ConversationPipeline.install` to auto-wire voice IDs from `Agent` instances.
interrupt async ¶
Interrupt ongoing TTS playback for a session.
interrupt_all async ¶
Interrupt all active TTS playback in a room.
Returns:

| Type | Description |
|---|---|
| `int` | Number of sessions that were interrupted. |
wait_playback_done async ¶
Wait until active TTS playback finishes for all sessions in `room_id`.
Returns immediately if no playback is in progress. Uses per-session events that are set when `send_audio()` returns (before the echo drain delay), so callers don't wait for the 2-second drain window.
RealtimeVoiceChannel ¶
RealtimeVoiceChannel(channel_id, *, provider, transport, system_prompt=None, voice=None, tools=None, temperature=None, input_sample_rate=16000, output_sample_rate=24000, transport_sample_rate=None, emit_transcription_events=True, tool_handler=None, mute_on_tool_call=False, tool_result_max_length=16384)
Bases: Channel
Real-time voice channel using speech-to-speech AI providers.
Wraps APIs like OpenAI Realtime and Gemini Live as a first-class RoomKit channel. Audio flows directly between the user's browser and the provider; transcriptions are emitted into the Room so other channels (supervisor dashboards, logging) see the conversation.
Category is TRANSPORT so that:
- on_event() receives broadcasts (for text injection from supervisors)
- deliver() is called but returns empty (customer is on voice)
Example

```python
from roomkit.voice.realtime.mock import MockRealtimeProvider, MockRealtimeTransport

provider = MockRealtimeProvider()
transport = MockRealtimeTransport()

channel = RealtimeVoiceChannel(
    "realtime-1",
    provider=provider,
    transport=transport,
    system_prompt="You are a helpful agent.",
)
kit.register_channel(channel)
```
Initialize realtime voice channel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `channel_id` | `str` | Unique channel identifier. | *required* |
| `provider` | `RealtimeVoiceProvider` | The realtime voice provider (OpenAI, Gemini, etc.). | *required* |
| `transport` | `VoiceBackend` | The audio transport (WebSocket, etc.). | *required* |
| `system_prompt` | `str \| None` | Default system prompt for the AI. | `None` |
| `voice` | `str \| None` | Default voice ID for audio output. | `None` |
| `tools` | `list[dict[str, Any]] \| None` | Default tool/function definitions. | `None` |
| `temperature` | `float \| None` | Default sampling temperature. | `None` |
| `input_sample_rate` | `int` | Default input audio sample rate (Hz). | `16000` |
| `output_sample_rate` | `int` | Default output audio sample rate (Hz). | `24000` |
| `transport_sample_rate` | `int \| None` | Sample rate of audio from the transport (Hz). When set and different from provider rates, enables automatic resampling. When … | `None` |
| `emit_transcription_events` | `bool` | If True, emit final transcriptions as RoomEvents so other channels see them. | `True` |
| `tool_handler` | `ToolHandler \| None` | Async callable to execute tool calls. Signature: … | `None` |
| `mute_on_tool_call` | `bool` | If True, mute the transport microphone during tool execution to prevent barge-in that causes providers (e.g. Gemini) to silently drop the tool result. Defaults to False — use … | `False` |
| `tool_result_max_length` | `int` | Maximum character length of tool results before truncation. Large results (e.g. SVG payloads) can overflow the provider's context window. Defaults to 16384. | `16384` |
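The `tool_result_max_length` guard amounts to a simple clamp applied before the tool result is sent back to the provider. A hedged sketch of that idea; the truncation-marker text is invented here and not necessarily what roomkit emits:

```python
def clamp_tool_result(result: str, max_length: int = 16384) -> str:
    """Truncate oversized tool results so they can't overflow the
    provider's context window (e.g. large SVG payloads)."""
    if len(result) <= max_length:
        return result
    marker = "…[truncated]"
    # Reserve room for the marker so the output never exceeds max_length.
    return result[: max_length - len(marker)] + marker
```

Keeping the clamped string exactly at `max_length` (rather than `max_length` plus a marker) is the safer choice when the limit reflects a hard provider constraint.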
set_framework ¶
Set the framework reference for event routing.
Called automatically when the channel is registered with RoomKit.
on_trace ¶
Register a trace observer and bridge to the transport.
inject_text async ¶
Inject a text turn into the provider session.
Useful for nudging the provider when its server-side VAD stalls (e.g. Gemini ignoring valid speech after turn_complete).
start_session async ¶
Start a new realtime voice session.
Connects both the transport (client audio) and the provider (AI service), then fires a framework event.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `room_id` | `str` | The room to join. | *required* |
| `participant_id` | `str` | The participant's ID. | *required* |
| `connection` | `Any` | Protocol-specific connection (e.g. WebSocket). | *required* |
| `metadata` | `dict[str, Any] \| None` | Optional session metadata. May include overrides for system_prompt, voice, tools, temperature. | `None` |
Returns:

| Type | Description |
|---|---|
| `VoiceSession` | The created VoiceSession. |
end_session async ¶
End a realtime voice session.
Disconnects both provider and transport, fires framework event.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The session to end. | *required* |
reconfigure_session async ¶
reconfigure_session(session, *, system_prompt=None, voice=None, tools=None, temperature=None, provider_config=None)
Reconfigure an active session with new agent parameters.
Used during agent handoff to switch the AI personality, voice, and tools. Providers with session resumption (e.g. Gemini Live) preserve conversation history across the reconfiguration.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The active session to reconfigure. | *required* |
| `system_prompt` | `str \| None` | New system instructions for the AI. | `None` |
| `voice` | `str \| None` | New voice ID for audio output. | `None` |
| `tools` | `list[dict[str, Any]] \| None` | New tool/function definitions. | `None` |
| `temperature` | `float \| None` | New sampling temperature. | `None` |
| `provider_config` | `dict[str, Any] \| None` | Provider-specific configuration overrides. | `None` |
connect_session async ¶
Accept a realtime voice session via process_inbound.
Delegates to `start_session`, which handles provider/transport connection, resampling, and framework events.
disconnect_session async ¶
Clean up realtime sessions on remote disconnect.
update_binding ¶
Update cached bindings for all sessions in a room.
Called by the framework after mute/unmute/set_access so the audio gate in `_forward_client_audio` sees the new state.
handle_inbound async ¶
Not used directly — audio flows via start_session.
on_event async ¶
React to events from other channels — TEXT INJECTION.
When a supervisor or other channel sends a message, extract the text and inject it into the provider session so the AI incorporates it. Skips events from this channel (self-loop prevention).
deliver async ¶
No-op delivery — customer is on voice, can't see text.
WhatsAppChannel ¶
Create a WhatsApp transport channel.
MessengerChannel ¶
Create a Facebook Messenger transport channel.
TeamsChannel ¶
Create a Microsoft Teams transport channel.
TelegramChannel ¶
Create a Telegram Bot transport channel.
WhatsAppPersonalChannel ¶
Create a WhatsApp Personal transport channel (neonize).
TransportChannel ¶
TransportChannel(channel_id, channel_type, *, provider=None, capabilities=None, recipient_key='recipient_id', defaults=None)
Bases: Channel
Generic transport channel driven by configuration rather than subclassing.
All transport channels (SMS, Email, WhatsApp, Messenger, HTTP) share the same inbound/deliver logic. The only differences are data: which `ChannelType`, which `ChannelCapabilities`, which metadata key holds the recipient address, and which extra kwargs to pass to the provider's `send()` method.
Use the factory functions (`SMSChannel`, `EmailChannel`, …) in `roomkit.channels` for convenient construction.
Initialise a transport channel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `channel_id` | `str` | Unique identifier for this channel instance. | *required* |
| `channel_type` | `ChannelType` | The channel type (SMS, email, etc.). | *required* |
| `provider` | `Any` | Provider that handles external delivery (e.g. ElasticEmailProvider). | `None` |
| `capabilities` | `ChannelCapabilities \| None` | Media and feature capabilities for this channel. | `None` |
| `recipient_key` | `str` | Binding metadata key that holds the recipient address. | `'recipient_id'` |
| `defaults` | `dict[str, Any] \| None` | Default kwargs passed to the provider's `send()` method. | `None` |
handle_inbound async ¶
Convert an inbound message into a room event.
deliver async ¶
Deliver an event to the external recipient via the provider.
The recipient address is read from `binding.metadata[recipient_key]`. Extra kwargs are built from `defaults`: fixed values are passed as-is, `None` defaults are resolved from binding metadata at delivery time.
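The resolution rule described here (fixed values pass through; `None` values resolve from binding metadata) can be sketched as a pure function. This is a hypothetical helper for illustration, not roomkit's actual code:

```python
from typing import Any

def build_send_kwargs(
    defaults: dict[str, Any],
    binding_metadata: dict[str, Any],
) -> dict[str, Any]:
    """Resolve provider send() kwargs at delivery time."""
    kwargs: dict[str, Any] = {}
    for key, value in defaults.items():
        if value is None:
            # None means: look the value up in binding metadata.
            if key in binding_metadata:
                kwargs[key] = binding_metadata[key]
        else:
            # Fixed values are passed through as-is.
            kwargs[key] = value
    return kwargs
```

This split lets one channel configuration serve many bindings: static provider settings live in `defaults`, while per-recipient values come from each binding's metadata.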
WebSocket Streaming¶
StreamChunk ¶
Bases: BaseModel
Sent for each text delta during streaming.
StreamEnd ¶
Bases: BaseModel
Sent when a streaming response completes.
StreamMessage module-attribute ¶
StreamMessage = StreamStart | StreamChunk | StreamEnd | StreamError
StreamSendFn module-attribute ¶
StreamSendFn = Callable[[str, StreamMessage], Coroutine[Any, Any, None]]
StreamStart ¶
Bases: BaseModel
Sent when a streaming response begins.