Built-in Channels¶
SMSChannel ¶
Create an SMS transport channel.
RCSChannel ¶
Create an RCS (Rich Communication Services) transport channel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `channel_id` | `str` | Unique identifier for this channel. | *required* |
| `provider` | `Any` | RCS provider instance (e.g., TwilioRCSProvider). | `None` |
| `fallback` | `bool` | If True (default), allow SMS fallback when RCS unavailable. | `True` |

Returns:

| Type | Description |
|---|---|
| `TransportChannel` | A TransportChannel configured for RCS messaging. |
EmailChannel ¶
Create an Email transport channel.
AIChannel ¶
AIChannel(channel_id, provider, system_prompt=None, temperature=0.7, max_tokens=1024, max_context_events=50, tool_handler=None, max_tool_rounds=200, tool_loop_timeout_seconds=300.0, tool_loop_warn_after=50, retry_policy=None, fallback_provider=None, skills=None, script_executor=None, memory=None, tool_policy=None, thinking_budget=None)
Bases: Channel
AI intelligence channel that generates responses using an AI provider.
tool_handler property writable ¶
The current tool handler (may be wrapped by orchestration).
steer ¶
Enqueue a steering directive for the active tool loop.
Safe to call from any coroutine. Cancel directives also set the fast-path cancel event so the loop can exit without waiting for the next drain point.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `directive` | `SteeringDirective` | The steering directive to enqueue. | *required* |
| `loop_id` | `str \| None` | Optional loop ID to target. If … | `None` |
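The fast-path cancel behavior described above can be sketched with a queue plus an event. This is a simplified stand-in, not roomkit's implementation; `SteeringDirective` here is a plain dataclass rather than the real type:

```python
import asyncio
from dataclasses import dataclass

@dataclass
class SteeringDirective:
    kind: str          # e.g. "inject" or "cancel" (illustrative values)
    payload: str = ""

class ToolLoopSteering:
    def __init__(self) -> None:
        self.queue: asyncio.Queue[SteeringDirective] = asyncio.Queue()
        self.cancel_event = asyncio.Event()

    def steer(self, directive: SteeringDirective) -> None:
        # Safe from any coroutine: put_nowait never blocks or yields.
        self.queue.put_nowait(directive)
        if directive.kind == "cancel":
            # Fast path: the tool loop can poll this event between awaits
            # instead of waiting for the next queue drain point.
            self.cancel_event.set()
```

The point of the separate event is latency: a long generation round can check `cancel_event` cheaply without draining the directive queue.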
on_event async ¶
React to an event by generating an AI response.
Skips events from this channel to prevent self-loops. When the provider supports streaming or structured streaming:

- With tools: uses the streaming tool loop that executes tool calls between generation rounds while yielding text deltas progressively.
- Without tools: returns a plain streaming response.

Otherwise falls back to the non-streaming generate path.
deliver async ¶
Intelligence channels are not called via deliver by the router.
WebSocketChannel ¶
Bases: Channel
WebSocket transport channel with connection registry.
supports_streaming_delivery property ¶
Whether any connected client supports streaming text delivery.
register_connection ¶
Register a WebSocket connection.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `connection_id` | `str` | Unique connection identifier. | *required* |
| `send_fn` | `SendFn` | Callback for delivering complete events. | *required* |
| `stream_send_fn` | `StreamSendFn \| None` | Optional callback for delivering streaming messages. When provided, this connection receives progressive text delivery via the … | `None` |
deliver_stream async ¶
Deliver a streaming text response to connected clients.
Streaming-capable connections receive `stream_start`, `stream_chunk`, and `stream_end` messages progressively. Non-streaming connections receive the final complete event via the regular `send_fn`.
VoiceChannel ¶
VoiceChannel(channel_id, *, stt=None, tts=None, backend=None, pipeline=None, streaming=True, enable_barge_in=True, barge_in_threshold_ms=200, interruption=None, batch_mode=False, voice_map=None, max_audio_frames_per_second=None, tts_filter=None)
Bases: VoiceSTTMixin, VoiceTTSMixin, VoiceHooksMixin, VoiceTurnMixin, Channel
Real-time voice communication channel.
Supports three STT modes:

- VAD mode (default): VAD segments speech, streaming STT during speech with batch fallback on SPEECH_END.
- Continuous mode: No VAD + streaming STT provider — all audio streamed, provider handles endpointing.
- Batch mode (`batch_mode=True`): No VAD, audio accumulates post-pipeline. Caller controls when to transcribe via `flush_stt`. Useful for dictation, voicemail, and audio-file transcription with offline models.
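The three modes above reduce to a small decision rule. This helper is purely illustrative, one way to read the mode table, and not a function the channel actually exposes:

```python
def select_stt_mode(has_vad: bool, streaming_stt: bool, batch_mode: bool) -> str:
    """Pick the STT mode implied by the channel configuration."""
    if batch_mode:
        return "batch"        # no VAD; caller flushes via flush_stt
    if not has_vad and streaming_stt:
        return "continuous"   # all audio streamed; provider endpoints
    return "vad"              # default: VAD segments speech
```

The key asymmetry is that `batch_mode=True` wins regardless of the other settings, since it explicitly hands endpointing control to the caller.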
When a VoiceBackend and AudioPipelineConfig are configured, the channel:

- Registers for raw audio frames from the backend via `on_audio_received`
- Routes frames through the AudioPipeline inbound chain: [Resampler] -> [Recorder] -> [AEC] -> [AGC] -> [Denoiser] -> VAD -> [Diarization] + [DTMF]
- Fires hooks based on pipeline events (speech, silence, DTMF, recording, etc.)
- Transcribes speech using the STT provider
- Optionally evaluates turn completion via TurnDetector
- Synthesizes AI responses using TTS and streams to the client
When no pipeline is configured, the channel operates without VAD — the backend must handle speech detection externally.
supports_streaming_delivery property ¶
Whether this channel can accept streaming text delivery.
set_framework ¶
Set the framework reference for inbound routing.
Called automatically when the channel is registered with RoomKit.
on_trace ¶
Register a trace observer and bridge to the backend.
bind_session ¶
Bind a voice session to a room for message routing.
connect_session async ¶
Accept a voice session via process_inbound.
Delegates to `bind_session`, which handles pipeline activation and framework events.
disconnect_session async ¶
Clean up a voice session on remote disconnect.
update_binding ¶
Update cached bindings for all sessions in a room.
Called by the framework after mute/unmute/set_access so the audio gate in `_on_audio_received` sees the new state.
update_voice_map ¶
Merge entries into the per-agent voice map.
Called by `ConversationPipeline.install` to auto-wire voice IDs from `Agent` instances.
interrupt async ¶
Interrupt ongoing TTS playback for a session.
interrupt_all async ¶
Interrupt all active TTS playback in a room.
Returns:

| Type | Description |
|---|---|
| `int` | Number of sessions that were interrupted. |
wait_playback_done async ¶
Wait until active TTS playback finishes for all sessions in `room_id`.
Returns immediately if no playback is in progress. Uses per-session events that are set when `send_audio()` returns (before the echo drain delay), so callers don't wait for the 2-second drain window.
RealtimeVoiceChannel ¶
RealtimeVoiceChannel(channel_id, *, provider, transport, system_prompt=None, voice=None, tools=None, temperature=None, input_sample_rate=16000, output_sample_rate=24000, transport_sample_rate=None, emit_transcription_events=True, tool_handler=None, mute_on_tool_call=False, tool_result_max_length=16384)
Bases: Channel
Real-time voice channel using speech-to-speech AI providers.
Wraps APIs like OpenAI Realtime and Gemini Live as a first-class RoomKit channel. Audio flows directly between the user's browser and the provider; transcriptions are emitted into the Room so other channels (supervisor dashboards, logging) see the conversation.
Category is TRANSPORT so that:
- on_event() receives broadcasts (for text injection from supervisors)
- deliver() is called but returns empty (customer is on voice)
Example

```python
from roomkit.voice.realtime.mock import MockRealtimeProvider, MockRealtimeTransport

provider = MockRealtimeProvider()
transport = MockRealtimeTransport()

channel = RealtimeVoiceChannel(
    "realtime-1",
    provider=provider,
    transport=transport,
    system_prompt="You are a helpful agent.",
)
kit.register_channel(channel)
```
Initialize realtime voice channel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `channel_id` | `str` | Unique channel identifier. | *required* |
| `provider` | `RealtimeVoiceProvider` | The realtime voice provider (OpenAI, Gemini, etc.). | *required* |
| `transport` | `VoiceBackend` | The audio transport (WebSocket, etc.). | *required* |
| `system_prompt` | `str \| None` | Default system prompt for the AI. | `None` |
| `voice` | `str \| None` | Default voice ID for audio output. | `None` |
| `tools` | `list[dict[str, Any]] \| None` | Default tool/function definitions. | `None` |
| `temperature` | `float \| None` | Default sampling temperature. | `None` |
| `input_sample_rate` | `int` | Default input audio sample rate (Hz). | `16000` |
| `output_sample_rate` | `int` | Default output audio sample rate (Hz). | `24000` |
| `transport_sample_rate` | `int \| None` | Sample rate of audio from the transport (Hz). When set and different from provider rates, enables automatic resampling. When … | `None` |
| `emit_transcription_events` | `bool` | If True, emit final transcriptions as RoomEvents so other channels see them. | `True` |
| `tool_handler` | `ToolHandler \| None` | Async callable to execute tool calls. Signature: … | `None` |
| `mute_on_tool_call` | `bool` | If True, mute the transport microphone during tool execution to prevent barge-in that causes providers (e.g. Gemini) to silently drop the tool result. Defaults to False — use … | `False` |
| `tool_result_max_length` | `int` | Maximum character length of tool results before truncation. Large results (e.g. SVG payloads) can overflow the provider's context window. Defaults to 16384. | `16384` |
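The `tool_result_max_length` guard amounts to a simple clamp applied before the tool result is sent back to the provider. A hedged sketch of that idea; the truncation-marker text is invented here and not necessarily what roomkit emits:

```python
def clamp_tool_result(result: str, max_length: int = 16384) -> str:
    """Truncate oversized tool results so they can't overflow the
    provider's context window (e.g. large SVG payloads)."""
    if len(result) <= max_length:
        return result
    marker = "…[truncated]"
    # Reserve room for the marker so the output never exceeds max_length.
    return result[: max_length - len(marker)] + marker
```

Keeping the clamped string exactly at `max_length` (rather than `max_length` plus a marker) is the safer choice when the limit reflects a hard provider constraint.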
set_framework ¶
Set the framework reference for event routing.
Called automatically when the channel is registered with RoomKit.
on_trace ¶
Register a trace observer and bridge to the transport.
inject_text async ¶
Inject a text turn into the provider session.
Useful for nudging the provider when its server-side VAD stalls (e.g. Gemini ignoring valid speech after turn_complete).
start_session async ¶
Start a new realtime voice session.
Connects both the transport (client audio) and the provider (AI service), then fires a framework event.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `room_id` | `str` | The room to join. | *required* |
| `participant_id` | `str` | The participant's ID. | *required* |
| `connection` | `Any` | Protocol-specific connection (e.g. WebSocket). | *required* |
| `metadata` | `dict[str, Any] \| None` | Optional session metadata. May include overrides for system_prompt, voice, tools, temperature. | `None` |
Returns:

| Type | Description |
|---|---|
| `VoiceSession` | The created VoiceSession. |
end_session async ¶
End a realtime voice session.
Disconnects both provider and transport, fires framework event.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The session to end. | *required* |
reconfigure_session async ¶
reconfigure_session(session, *, system_prompt=None, voice=None, tools=None, temperature=None, provider_config=None)
Reconfigure an active session with new agent parameters.
Used during agent handoff to switch the AI personality, voice, and tools. Providers with session resumption (e.g. Gemini Live) preserve conversation history across the reconfiguration.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The active session to reconfigure. | *required* |
| `system_prompt` | `str \| None` | New system instructions for the AI. | `None` |
| `voice` | `str \| None` | New voice ID for audio output. | `None` |
| `tools` | `list[dict[str, Any]] \| None` | New tool/function definitions. | `None` |
| `temperature` | `float \| None` | New sampling temperature. | `None` |
| `provider_config` | `dict[str, Any] \| None` | Provider-specific configuration overrides. | `None` |
connect_session async ¶
Accept a realtime voice session via process_inbound.
Delegates to `start_session`, which handles provider/transport connection, resampling, and framework events.
disconnect_session async ¶
Clean up realtime sessions on remote disconnect.
update_binding ¶
Update cached bindings for all sessions in a room.
Called by the framework after mute/unmute/set_access so the audio gate in `_forward_client_audio` sees the new state.
handle_inbound async ¶
Not used directly — audio flows via start_session.
on_event async ¶
React to events from other channels — TEXT INJECTION.
When a supervisor or other channel sends a message, extract the text and inject it into the provider session so the AI incorporates it. Skips events from this channel (self-loop prevention).
deliver async ¶
No-op delivery — customer is on voice, can't see text.
WhatsAppChannel ¶
Create a WhatsApp transport channel.
MessengerChannel ¶
Create a Facebook Messenger transport channel.
TeamsChannel ¶
Create a Microsoft Teams transport channel.
TelegramChannel ¶
Create a Telegram Bot transport channel.
WhatsAppPersonalChannel ¶
Create a WhatsApp Personal transport channel (neonize).
TransportChannel ¶
TransportChannel(channel_id, channel_type, *, provider=None, capabilities=None, recipient_key='recipient_id', defaults=None)
Bases: Channel
Generic transport channel driven by configuration rather than subclassing.
All transport channels (SMS, Email, WhatsApp, Messenger, HTTP) share the same inbound/deliver logic. The only differences are data: which `ChannelType`, which `ChannelCapabilities`, which metadata key holds the recipient address, and which extra kwargs to pass to the provider's `send()` method.
Use the factory functions (`SMSChannel`, `EmailChannel`, …) in `roomkit.channels` for convenient construction.
Initialise a transport channel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `channel_id` | `str` | Unique identifier for this channel instance. | *required* |
| `channel_type` | `ChannelType` | The channel type (SMS, email, etc.). | *required* |
| `provider` | `Any` | Provider that handles external delivery (e.g. ElasticEmailProvider). | `None` |
| `capabilities` | `ChannelCapabilities \| None` | Media and feature capabilities for this channel. | `None` |
| `recipient_key` | `str` | Binding metadata key that holds the recipient address. | `'recipient_id'` |
| `defaults` | `dict[str, Any] \| None` | Default kwargs passed to the provider's `send()` method. | `None` |
handle_inbound async ¶
Convert an inbound message into a room event.
deliver async ¶
Deliver an event to the external recipient via the provider.
The recipient address is read from `binding.metadata[recipient_key]`. Extra kwargs are built from `defaults`: fixed values are passed as-is, `None` defaults are resolved from binding metadata at delivery time.
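The resolution rule described here (fixed values pass through; `None` values resolve from binding metadata) can be sketched as a pure function. This is a hypothetical helper for illustration, not roomkit's actual code:

```python
from typing import Any

def build_send_kwargs(
    defaults: dict[str, Any],
    binding_metadata: dict[str, Any],
) -> dict[str, Any]:
    """Resolve provider send() kwargs at delivery time."""
    kwargs: dict[str, Any] = {}
    for key, value in defaults.items():
        if value is None:
            # None means: look the value up in binding metadata.
            if key in binding_metadata:
                kwargs[key] = binding_metadata[key]
        else:
            # Fixed values are passed through as-is.
            kwargs[key] = value
    return kwargs
```

This split lets one channel configuration serve many bindings: static provider settings live in `defaults`, while per-recipient values come from each binding's metadata.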
WebSocket Streaming¶
StreamChunk ¶
Bases: BaseModel
Sent for each text delta during streaming.
StreamEnd ¶
Bases: BaseModel
Sent when a streaming response completes.
StreamMessage module-attribute ¶
StreamMessage = StreamStart | StreamChunk | StreamEnd | StreamError
StreamSendFn module-attribute ¶
StreamSendFn = Callable[[str, StreamMessage], Coroutine[Any, Any, None]]
StreamStart ¶
Bases: BaseModel
Sent when a streaming response begins.