Built-in Channels¶
SMSChannel ¶
Create an SMS transport channel.
RCSChannel ¶
Create an RCS (Rich Communication Services) transport channel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `channel_id` | `str` | Unique identifier for this channel. | *required* |
| `provider` | `Any` | RCS provider instance (e.g. `TwilioRCSProvider`). | `None` |
| `fallback` | `bool` | If `True` (default), allow SMS fallback when RCS is unavailable. | `True` |
Returns:

| Type | Description |
|---|---|
| `TransportChannel` | A `TransportChannel` configured for RCS messaging. |
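The fallback behaviour described above can be sketched as a simple routing decision. This is an illustrative stand-in, not RoomKit code; the `rcs_available` flag and channel names are assumptions.

```python
# Illustrative sketch of the SMS-fallback decision: prefer RCS, degrade to
# SMS only when fallback is enabled. Not the actual RoomKit implementation.

def pick_transport(rcs_available: bool, fallback: bool) -> str:
    """Choose the transport for an outbound message."""
    if rcs_available:
        return "rcs"
    if fallback:
        return "sms"  # fallback=True (default): degrade to SMS
    raise RuntimeError("RCS unavailable and SMS fallback disabled")
```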
EmailChannel ¶
Create an Email transport channel.
AIChannel ¶
`AIChannel(channel_id, provider, system_prompt=None, temperature=0.7, max_tokens=1024, max_context_events=50, tool_handler=None, tools=None, max_tool_rounds=200, tool_loop_timeout_seconds=300.0, tool_loop_warn_after=50, retry_policy=None, fallback_provider=None, skills=None, script_executor=None, sandbox=None, memory=None, tool_policy=None, thinking_budget=None, evict_threshold_tokens=5000, enable_planning=False)`
Bases: AIStreamingMixin, AIGenerationMixin, AIToolsMixin, AIContextMixin, AIResilienceMixin, AIToolPolicyMixin, AISteeringMixin, AIEventsMixin, Channel
AI intelligence channel that generates responses using an AI provider.
tool_handler (property, writable) ¶
The current tool handler (may be wrapped by orchestration).
on_event (async) ¶
React to an event by generating an AI response.
Skips events from this channel to prevent self-loops. When the provider supports streaming or structured streaming:

- With tools: uses the streaming tool loop that executes tool calls between generation rounds while yielding text deltas progressively.
- Without tools: returns a plain streaming response.

Otherwise falls back to the non-streaming generate path.
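The dispatch rule above can be distilled into a small selection function. This is a sketch of the documented behaviour, not RoomKit's actual code; the path names are illustrative labels.

```python
# Sketch of AIChannel.on_event's generation-path dispatch, as described in
# the docs: streaming tool loop > plain streaming > non-streaming generate.

def choose_path(supports_streaming: bool, has_tools: bool) -> str:
    if supports_streaming:
        # Tools present: run the streaming tool loop between rounds.
        return "streaming_tool_loop" if has_tools else "streaming"
    return "generate"  # non-streaming fallback path
```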
deliver (async) ¶
Intelligence channels are not called via deliver by the router.
WebSocketChannel ¶
Bases: Channel
WebSocket transport channel with connection registry.
supports_streaming_delivery (property) ¶
Whether any connected client supports streaming text delivery.
register_connection ¶
Register a WebSocket connection.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `connection_id` | `str` | Unique connection identifier. | *required* |
| `send_fn` | `SendFn` | Callback for delivering complete events. | *required* |
| `stream_send_fn` | `StreamSendFn \| None` | Optional callback for delivering streaming messages. When provided, this connection receives progressive text delivery via `deliver_stream`. | `None` |
deliver_stream (async) ¶
Deliver a streaming text response to connected clients.
Streaming-capable connections receive stream_start, stream_chunk,
and stream_end messages progressively. Non-streaming connections
receive the final complete event via the regular send_fn.
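The fan-out above can be sketched as follows. The connection-registry shape (dicts holding `send_fn`/`stream_send_fn`) is an assumption for illustration; only the streaming-vs-complete split mirrors the documented behaviour.

```python
import asyncio

# Sketch of deliver_stream's fan-out: streaming-capable connections get
# start/chunk/end messages progressively, others get one complete event.

async def deliver_stream(connections: list[dict], chunks: list[str]) -> None:
    final_text = "".join(chunks)
    for conn in connections:
        if conn.get("stream_send_fn"):
            await conn["stream_send_fn"]("stream_start", None)
            for chunk in chunks:
                await conn["stream_send_fn"]("stream_chunk", chunk)
            await conn["stream_send_fn"]("stream_end", final_text)
        else:
            # Non-streaming connection: only the final complete event.
            await conn["send_fn"](final_text)
```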
VoiceChannel ¶
`VoiceChannel(channel_id, *, stt=None, tts=None, backend=None, pipeline=None, streaming=True, enable_barge_in=True, barge_in_threshold_ms=200, interruption=None, batch_mode=False, voice_map=None, max_audio_frames_per_second=None, tts_filter=None, bridge=None, recording=None)`
Bases: VoiceSTTMixin, VoiceTTSMixin, VoiceHooksMixin, VoiceTurnMixin, VoicePipelineMixin, Channel
Real-time voice communication channel.
Supports three STT modes:
- VAD mode (default): VAD segments speech, streaming STT during speech
with batch fallback on SPEECH_END.
- Continuous mode: No VAD + streaming STT provider — all audio streamed,
provider handles endpointing.
- Batch mode (batch_mode=True): No VAD, audio accumulates post-pipeline. Caller controls when to transcribe via `flush_stt`. Useful for dictation, voicemail, and audio-file transcription with offline models.
When a VoiceBackend and AudioPipelineConfig are configured, the channel:

- Registers for raw audio frames from the backend via on_audio_received
- Routes frames through the AudioPipeline inbound chain: [Resampler] -> [Recorder] -> [AEC] -> [AGC] -> [Denoiser] -> VAD -> [Diarization] + [DTMF]
- Fires hooks based on pipeline events (speech, silence, DTMF, recording, etc.)
- Transcribes speech using the STT provider
- Optionally evaluates turn completion via TurnDetector
- Synthesizes AI responses using TTS and streams to the client
When no pipeline is configured, the channel operates without VAD — the backend must handle speech detection externally.
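The three STT modes above reduce to a small selection rule. This is an assumption distilled from the documentation, not RoomKit's implementation; the flag names mirror the constructor parameters.

```python
# Sketch of VoiceChannel's STT-mode selection, per the documented modes:
# batch_mode wins, then continuous (no VAD + streaming STT), else VAD.

def stt_mode(has_vad: bool, streaming_stt: bool, batch_mode: bool) -> str:
    if batch_mode:
        return "batch"       # caller flushes via flush_stt
    if not has_vad and streaming_stt:
        return "continuous"  # provider handles endpointing
    return "vad"             # default: VAD segments speech
```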
supports_streaming_delivery (property) ¶
Whether this channel can accept streaming text delivery.
set_bridge_filter ¶
Set a synchronous filter for bridged audio frames.
The filter runs in the audio callback thread before each frame
is forwarded. It receives (source_session, frame) and
returns the frame (possibly modified) or None to drop it.
This is the synchronous equivalent of BEFORE_BRIDGE_AUDIO
— use it for fast operations like per-session muting or gain.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `fn` | `BridgeFrameFilter \| None` | Filter function, or `None` to remove the current filter. | *required* |
set_framework ¶
Set the framework reference for inbound routing.
Called automatically when the channel is registered with RoomKit.
on_trace ¶
Register a trace observer and bridge to the backend.
bind_session ¶
Bind a voice session to a room for message routing.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The voice session to bind. | *required* |
| `room_id` | `str` | Target room ID. | *required* |
| `binding` | `ChannelBinding` | Channel binding descriptor. | *required* |
| `backend` | `VoiceBackend \| None` | Override backend for the bridge. When bridging sessions from different transports (e.g. SIP + WebRTC), pass the session's own backend so the bridge sends audio through the correct transport. | `None` |
connect_session (async) ¶
Accept a voice session via process_inbound.
Delegates to `bind_session`, which handles pipeline activation and framework events.
disconnect_session (async) ¶
Clean up a voice session on remote disconnect.
update_binding ¶
Update cached bindings for all sessions in a room.
Called by the framework after mute/unmute/set_access so the
audio gate in _on_audio_received sees the new state.
add_media_tap ¶
Register a tap on processed inbound audio frames (for room recording).
Delegates to the pipeline's on_processed_frame callback list.
add_outbound_media_tap ¶
Register a tap on outbound TTS audio (for room recording).
The callback receives (session, pcm_data, sample_rate) for
every outbound chunk after pipeline processing.
update_voice_map ¶
Merge entries into the per-agent voice map.
Called by `ConversationPipeline.install` to auto-wire voice IDs from `Agent` instances.
send_dtmf ¶
Send a DTMF digit to the remote party via the voice backend.
The digit is sent as an RFC 4733 telephone-event (out-of-band).
Requires a backend with DTMF_SIGNALING capability (SIP, RTP).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The active voice session. | *required* |
| `digit` | `str` | DTMF digit (`'0'`-`'9'`, `'*'`, `'#'`, `'A'`-`'D'`). | *required* |
| `duration_ms` | `int` | Tone duration in milliseconds (default 160). | `160` |

Raises:

| Type | Description |
|---|---|
| `RuntimeError` | If no backend is configured or the session has ended. |
| `ValueError` | If `digit` or `duration_ms` is invalid. |
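The `ValueError` cases above can be sketched as a validation helper. The valid digit set comes from the parameter table; the function itself is illustrative, not RoomKit's code.

```python
# Sketch of send_dtmf's argument validation, per the Raises table above.

VALID_DTMF = set("0123456789*#ABCD")

def validate_dtmf(digit: str, duration_ms: int = 160) -> None:
    """Raise ValueError for an invalid digit or non-positive duration."""
    if digit not in VALID_DTMF:
        raise ValueError(f"invalid DTMF digit: {digit!r}")
    if duration_ms <= 0:
        raise ValueError(f"invalid duration_ms: {duration_ms}")
```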
interrupt (async) ¶
Interrupt ongoing TTS playback for a session.
interrupt_all (async) ¶
Interrupt all active TTS playback in a room.
Returns:

| Type | Description |
|---|---|
| `int` | Number of sessions that were interrupted. |
wait_playback_done (async) ¶
Wait until active TTS playback finishes for all sessions in room_id.
Returns immediately if no playback is in progress. Uses per-session
events that are set when send_audio() returns (before the echo
drain delay), so callers don't wait for the 2-second drain window.
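The per-session events described above can be sketched with `asyncio.Event`: each session sets its event when `send_audio()` returns, and the waiter gathers only the events that are not yet set. Names and the registry shape are illustrative assumptions.

```python
import asyncio

# Sketch of wait_playback_done: return immediately when no playback is in
# progress, otherwise wait for every session's "done" event.

async def wait_playback_done(done_events: dict[str, asyncio.Event]) -> None:
    pending = [ev.wait() for ev in done_events.values() if not ev.is_set()]
    if not pending:
        return  # nothing playing: return immediately
    await asyncio.gather(*pending)
```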
RealtimeVoiceChannel ¶
`RealtimeVoiceChannel(channel_id, *, provider, transport, system_prompt=None, voice=None, tools=None, temperature=None, input_sample_rate=16000, output_sample_rate=24000, transport_sample_rate=None, emit_transcription_events=True, tool_handler=None, mute_on_tool_call=False, tool_result_max_length=16384, pipeline=None, recording=None, skills=None, script_executor=None)`
Bases: RealtimeToolsMixin, RealtimeTranscriptionMixin, RealtimeSpeechMixin, RealtimeAudioMixin, RealtimeResponseMixin, VoicePipelineMixin, Channel
Real-time voice channel using speech-to-speech AI providers.
Wraps APIs like OpenAI Realtime and Gemini Live as a first-class RoomKit channel. Audio flows directly between the user's browser and the provider; transcriptions are emitted into the Room so other channels (supervisor dashboards, logging) see the conversation.
Category is TRANSPORT so that:
- on_event() receives broadcasts (for text injection from supervisors)
- deliver() is called but returns empty (customer is on voice)
Example:

```python
from roomkit.voice.realtime.mock import MockRealtimeProvider, MockRealtimeTransport

provider = MockRealtimeProvider()
transport = MockRealtimeTransport()

channel = RealtimeVoiceChannel(
    "realtime-1",
    provider=provider,
    transport=transport,
    system_prompt="You are a helpful agent.",
)
kit.register_channel(channel)
```
Initialize realtime voice channel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `channel_id` | `str` | Unique channel identifier. | *required* |
| `provider` | `RealtimeVoiceProvider` | The realtime voice provider (OpenAI, Gemini, etc.). | *required* |
| `transport` | `VoiceBackend` | The audio transport (WebSocket, etc.). | *required* |
| `system_prompt` | `str \| None` | Default system prompt for the AI. | `None` |
| `voice` | `str \| None` | Default voice ID for audio output. | `None` |
| `tools` | `list[dict[str, Any] \| Any] \| None` | Tool definitions as dicts, or `Tool` objects. | `None` |
| `temperature` | `float \| None` | Default sampling temperature. | `None` |
| `input_sample_rate` | `int` | Default input audio sample rate (Hz). | `16000` |
| `output_sample_rate` | `int` | Default output audio sample rate (Hz). | `24000` |
| `transport_sample_rate` | `int \| None` | Sample rate of audio from the transport (Hz). When set and different from the provider rates, enables automatic resampling. | `None` |
| `emit_transcription_events` | `bool` | If `True`, emit final transcriptions as RoomEvents so other channels see them. | `True` |
| `tool_handler` | `ToolHandler \| None` | Async callable that executes tool calls. | `None` |
| `mute_on_tool_call` | `bool` | If `True`, mute the transport microphone during tool execution to prevent barge-in that causes providers (e.g. Gemini) to silently drop the tool result. | `False` |
| `tool_result_max_length` | `int` | Maximum character length of tool results before truncation. Large results (e.g. SVG payloads) can overflow the provider's context window. | `16384` |
| `pipeline` | `AudioPipelineConfig \| None` | Optional audio pipeline configuration. | `None` |
| `recording` | `Any \| None` | Optional recording configuration. | `None` |
| `skills` | `SkillRegistry \| None` | Optional skill registry. | `None` |
| `script_executor` | `ScriptExecutor \| None` | Optional script executor. | `None` |
wait_idle (async) ¶
Wait until all sessions in the room are idle (not speaking).
An idle session has finished its last response and all audio has been forwarded to the transport.
set_framework ¶
Set the framework reference for event routing.
Called automatically when the channel is registered with RoomKit.
on_trace ¶
Register a trace observer and bridge to the transport.
configure ¶
Update channel defaults for future sessions.
Active sessions are not affected — use reconfigure_session
for those.
inject_text (async) ¶
Inject a text turn into the provider session.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The active voice session. | *required* |
| `text` | `str` | Text to inject. | *required* |
| `role` | `str` | Role for the text (`'user'` or `'system'`). | `'user'` |
| `silent` | `bool` | If `True`, add to the conversation context without requesting a response. The agent sees the text on its next turn but does not react immediately. | `False` |
start_session (async) ¶
Start a new realtime voice session.
Connects both the transport (client audio) and the provider (AI service), then fires a framework event.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `room_id` | `str` | The room to join. | *required* |
| `participant_id` | `str` | The participant's ID. | *required* |
| `connection` | `Any` | Protocol-specific connection (e.g. WebSocket). | *required* |
| `metadata` | `dict[str, Any] \| None` | Optional session metadata. May include overrides for `system_prompt`, `voice`, `tools`, `temperature`. | `None` |

Returns:

| Type | Description |
|---|---|
| `VoiceSession` | The created `VoiceSession`. |
end_session (async) ¶
End a realtime voice session.
Disconnects both provider and transport, fires framework event.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The session to end. | *required* |
reconfigure_session (async) ¶
`reconfigure_session(session, *, system_prompt=None, voice=None, tools=None, temperature=None, provider_config=None)`
Reconfigure an active session with new agent parameters.
Used during agent handoff to switch the AI personality, voice, and tools. Providers with session resumption (e.g. Gemini Live) preserve conversation history across the reconfiguration.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `session` | `VoiceSession` | The active session to reconfigure. | *required* |
| `system_prompt` | `str \| None` | New system instructions for the AI. | `None` |
| `voice` | `str \| None` | New voice ID for audio output. | `None` |
| `tools` | `list[dict[str, Any]] \| None` | New tool/function definitions. | `None` |
| `temperature` | `float \| None` | New sampling temperature. | `None` |
| `provider_config` | `dict[str, Any] \| None` | Provider-specific configuration overrides. | `None` |
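Since every override above defaults to `None`, a handoff plausibly merges only the parameters that were actually supplied. The config-dict shape below is an assumption for illustration, not RoomKit's internal state.

```python
# Sketch of override merging for reconfigure_session: only non-None
# parameters replace existing session values, per the table above.

def merge_overrides(config: dict, **overrides) -> dict:
    merged = dict(config)  # leave the original untouched
    for key, value in overrides.items():
        if value is not None:
            merged[key] = value
    return merged
```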
connect_session (async) ¶
Accept a realtime voice session via process_inbound.
Delegates to `start_session`, which handles provider/transport connection, resampling, and framework events.
disconnect_session (async) ¶
Clean up realtime sessions on remote disconnect.
update_binding ¶
Update cached bindings for all sessions in a room.
Called by the framework after mute/unmute/set_access so the
audio gate in _pipeline_on_audio_received (pipeline path)
or _forward_client_audio (direct path) sees the new state.
handle_inbound (async) ¶
Not used directly — audio flows via start_session.
on_event (async) ¶
React to events from other channels — TEXT INJECTION.
When a supervisor or other channel sends a message, extract the text and inject it into the provider session so the AI incorporates it. Skips events from this channel (self-loop prevention).
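The guard described above can be sketched as a predicate. The event shape (a source channel ID plus text) is a simplified stand-in for a RoomEvent, used only for illustration.

```python
# Sketch of the self-loop guard plus text extraction for injection:
# inject only text-bearing events that come from *other* channels.

def should_inject(event: dict, my_channel_id: str) -> bool:
    if event.get("channel_id") == my_channel_id:
        return False  # skip our own events to prevent self-loops
    return bool(event.get("text"))
```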
deliver (async) ¶
No-op delivery — customer is on voice, can't see text.
WhatsAppChannel ¶
Create a WhatsApp transport channel.
MessengerChannel ¶
Create a Facebook Messenger transport channel.
TeamsChannel ¶
Create a Microsoft Teams transport channel.
TelegramChannel ¶
Create a Telegram Bot transport channel.
WhatsAppPersonalChannel ¶
Create a WhatsApp Personal transport channel (neonize).
TransportChannel ¶
`TransportChannel(channel_id, channel_type, *, provider=None, capabilities=None, recipient_key='recipient_id', defaults=None)`
Bases: Channel
Generic transport channel driven by configuration rather than subclassing.
All transport channels (SMS, Email, WhatsApp, Messenger, HTTP) share the
same inbound/deliver logic. The only differences are data: which
ChannelType, which ChannelCapabilities, which metadata key holds the
recipient address, and which extra kwargs to pass to the provider's
send() method.
Use the factory functions (SMSChannel, EmailChannel, …) in
roomkit.channels for convenient construction.
Initialise a transport channel.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `channel_id` | `str` | Unique identifier for this channel instance. | *required* |
| `channel_type` | `ChannelType` | The channel type (SMS, email, etc.). | *required* |
| `provider` | `Any` | Provider that handles external delivery (e.g. `ElasticEmailProvider`). | `None` |
| `capabilities` | `ChannelCapabilities \| None` | Media and feature capabilities for this channel. | `None` |
| `recipient_key` | `str` | Binding metadata key that holds the recipient address. | `'recipient_id'` |
| `defaults` | `dict[str, Any] \| None` | Default kwargs passed to the provider's `send()` method. | `None` |
|
handle_inbound (async) ¶
Convert an inbound message into a room event.
deliver (async) ¶
Deliver an event to the external recipient via the provider.
The recipient address is read from binding.metadata[recipient_key].
Extra kwargs are built from defaults: fixed values are passed as-is,
None defaults are resolved from binding metadata at delivery time.
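The kwargs-building rule above can be sketched directly: fixed defaults pass through as-is, and `None` defaults are looked up in binding metadata at delivery time. The function is illustrative, not RoomKit's implementation.

```python
# Sketch of deliver's extra-kwargs resolution, per the description above.

def build_send_kwargs(defaults: dict, binding_metadata: dict) -> dict:
    kwargs = {}
    for key, value in defaults.items():
        if value is None:
            if key in binding_metadata:
                kwargs[key] = binding_metadata[key]  # resolved at delivery
        else:
            kwargs[key] = value  # fixed value, passed as-is
    return kwargs
```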
WebSocket Streaming¶
StreamStart ¶
Bases: BaseModel
Sent when a streaming response begins.
StreamChunk ¶
Bases: BaseModel
Sent for each text delta during streaming.
StreamEnd ¶
Bases: BaseModel
Sent when a streaming response completes.
StreamMessage (module-attribute) ¶
`StreamMessage = StreamStart | StreamChunk | StreamEnd | StreamError`
StreamSendFn (module-attribute) ¶
`StreamSendFn = Callable[[str, StreamMessage], Coroutine[Any, Any, None]]`
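A callback matching the `StreamSendFn` shape is any coroutine taking `(connection_id, message)`. The message classes below are dataclass stand-ins for the real pydantic models (whose field names are not shown in this reference, so the fields here are assumptions).

```python
import asyncio
from dataclasses import dataclass

# Minimal stand-ins for the stream message types (the real ones are
# pydantic BaseModels); field names are illustrative assumptions.

@dataclass
class StreamStart:
    event_id: str

@dataclass
class StreamChunk:
    delta: str

@dataclass
class StreamEnd:
    text: str

received: list = []

async def send(connection_id: str, message) -> None:
    """A StreamSendFn-shaped coroutine: (connection_id, message) -> None."""
    received.append((connection_id, message))

async def main() -> None:
    # Typical message sequence for one streamed response.
    await send("conn-1", StreamStart(event_id="e1"))
    await send("conn-1", StreamChunk(delta="Hi"))
    await send("conn-1", StreamEnd(text="Hi"))

asyncio.run(main())
```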