
Built-in Channels

SMSChannel

SMSChannel(channel_id, *, provider=None, from_number=None)

Create an SMS transport channel.

RCSChannel

RCSChannel(channel_id, *, provider=None, fallback=True)

Create an RCS (Rich Communication Services) transport channel.

Parameters:

- channel_id (str, required): Unique identifier for this channel.
- provider (Any, default None): RCS provider instance (e.g., TwilioRCSProvider).
- fallback (bool, default True): If True (default), allow SMS fallback when RCS unavailable.

Returns:

- TransportChannel: A TransportChannel configured for RCS messaging.

EmailChannel

EmailChannel(channel_id, *, provider=None, from_address=None)

Create an Email transport channel.

AIChannel

AIChannel(channel_id, provider, system_prompt=None, temperature=0.7, max_tokens=1024, max_context_events=50, tool_handler=None, max_tool_rounds=200, tool_loop_timeout_seconds=300.0, tool_loop_warn_after=50, retry_policy=None, fallback_provider=None, skills=None, script_executor=None, memory=None, tool_policy=None, thinking_budget=None)

Bases: Channel

AI intelligence channel that generates responses using an AI provider.

tool_handler property writable

tool_handler

The current tool handler (may be wrapped by orchestration).

extra_tools property

extra_tools

Extra tools injected by orchestration (e.g. handoff tool).

steer

steer(directive, *, loop_id=None)

Enqueue a steering directive for the active tool loop.

Safe to call from any coroutine. Cancel directives also set the fast-path cancel event so the loop can exit without waiting for the next drain point.

Parameters:

- directive (SteeringDirective, required): The steering directive to enqueue.
- loop_id (str | None, default None): Optional loop ID to target. If None, targets the most recently started active loop.

close async

close()

Close the channel, its provider, and the memory provider.

on_event async

on_event(event, binding, context)

React to an event by generating an AI response.

Skips events from this channel to prevent self-loops. When the provider supports streaming or structured streaming:
- With tools: uses the streaming tool loop that executes tool calls between generation rounds while yielding text deltas progressively.
- Without tools: returns a plain streaming response.

Otherwise falls back to the non-streaming generate path.
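The branching described above can be summarized as a small decision function. The sketch below is a reading of this description, not the library's code; GenerationPath, choose_path, and the parameter names are hypothetical.

```python
from enum import Enum


class GenerationPath(Enum):
    SKIP = "skip"
    STREAMING_TOOL_LOOP = "streaming_tool_loop"
    PLAIN_STREAMING = "plain_streaming"
    NON_STREAMING = "non_streaming"


def choose_path(
    *,
    event_channel_id: str,
    own_channel_id: str,
    provider_streams: bool,
    has_tools: bool,
) -> GenerationPath:
    # Events that originate from this channel are ignored to avoid self-loops.
    if event_channel_id == own_channel_id:
        return GenerationPath.SKIP
    if provider_streams:
        # Tools require the loop that runs tool calls between generation rounds.
        if has_tools:
            return GenerationPath.STREAMING_TOOL_LOOP
        return GenerationPath.PLAIN_STREAMING
    return GenerationPath.NON_STREAMING
```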

deliver async

deliver(event, binding, context)

Intelligence channels are not called via deliver by the router.

WebSocketChannel

WebSocketChannel(channel_id)

Bases: Channel

WebSocket transport channel with connection registry.

supports_streaming_delivery property

supports_streaming_delivery

Whether any connected client supports streaming text delivery.

register_connection

register_connection(connection_id, send_fn, *, stream_send_fn=None)

Register a WebSocket connection.

Parameters:

- connection_id (str, required): Unique connection identifier.
- send_fn (SendFn, required): Callback for delivering complete events.
- stream_send_fn (StreamSendFn | None, default None): Optional callback for delivering streaming messages. When provided, this connection receives progressive text delivery via the stream_start/stream_chunk/stream_end protocol.

unregister_connection

unregister_connection(connection_id)

Unregister a WebSocket connection.

deliver_stream async

deliver_stream(text_stream, event, binding, context)

Deliver a streaming text response to connected clients.

Streaming-capable connections receive stream_start, stream_chunk, and stream_end messages progressively. Non-streaming connections receive the final complete event via the regular send_fn.
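The fan-out described above might look roughly like the sketch below. It is self-contained and does not use RoomKit: messages are plain dicts rather than the StreamStart/StreamChunk/StreamEnd models, and the field names ("type", "stream_id", "delta", "text") as well as the "event" message for non-streaming connections are assumptions made for illustration.

```python
import asyncio
from typing import AsyncIterator, Awaitable, Callable

# Hypothetical callback shape: (connection_id, message) -> awaitable
Send = Callable[[str, dict], Awaitable[None]]


async def fan_out_stream(
    text_stream: AsyncIterator[str],
    stream_id: str,
    streaming: dict[str, Send],
    non_streaming: dict[str, Send],
) -> str:
    """Stream deltas to capable connections, then send the full text to the rest."""
    for conn_id, send in streaming.items():
        await send(conn_id, {"type": "stream_start", "stream_id": stream_id})

    parts: list[str] = []
    async for delta in text_stream:
        parts.append(delta)
        for conn_id, send in streaming.items():
            await send(
                conn_id,
                {"type": "stream_chunk", "stream_id": stream_id, "delta": delta},
            )

    full_text = "".join(parts)
    for conn_id, send in streaming.items():
        await send(conn_id, {"type": "stream_end", "stream_id": stream_id})

    # Non-streaming connections only see the completed response.
    for conn_id, send in non_streaming.items():
        await send(conn_id, {"type": "event", "text": full_text})
    return full_text


async def demo() -> tuple[str, list[tuple[str, str]]]:
    log: list[tuple[str, str]] = []

    async def record(conn_id: str, message: dict) -> None:
        log.append((conn_id, message["type"]))

    async def deltas() -> AsyncIterator[str]:
        for piece in ("Hel", "lo"):
            yield piece

    full = await fan_out_stream(deltas(), "s1", {"ws-a": record}, {"ws-b": record})
    return full, log


if __name__ == "__main__":
    print(asyncio.run(demo()))
```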

VoiceChannel

VoiceChannel(channel_id, *, stt=None, tts=None, backend=None, pipeline=None, streaming=True, enable_barge_in=True, barge_in_threshold_ms=200, interruption=None, batch_mode=False, voice_map=None, max_audio_frames_per_second=None, tts_filter=None)

Bases: VoiceSTTMixin, VoiceTTSMixin, VoiceHooksMixin, VoiceTurnMixin, Channel

Real-time voice communication channel.

Supports three STT modes:
- VAD mode (default): VAD segments speech, streaming STT during speech with batch fallback on SPEECH_END.
- Continuous mode: No VAD + streaming STT provider; all audio streamed, provider handles endpointing.
- Batch mode (batch_mode=True): No VAD, audio accumulates post-pipeline. Caller controls when to transcribe via flush_stt. Useful for dictation, voicemail, and audio-file transcription with offline models.

When a VoiceBackend and AudioPipelineConfig are configured, the channel:
- Registers for raw audio frames from the backend via on_audio_received
- Routes frames through the AudioPipeline inbound chain: [Resampler] -> [Recorder] -> [AEC] -> [AGC] -> [Denoiser] -> VAD -> [Diarization] + [DTMF]
- Fires hooks based on pipeline events (speech, silence, DTMF, recording, etc.)
- Transcribes speech using the STT provider
- Optionally evaluates turn completion via TurnDetector
- Synthesizes AI responses using TTS and streams to the client

When no pipeline is configured, the channel operates without VAD — the backend must handle speech detection externally.
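One way to read the mode descriptions above is as a decision over three inputs. The function below is a hypothetical sketch of that reading, not the channel's actual selection logic; in particular, giving batch_mode priority and the "external" label for the no-VAD, non-streaming case are assumptions.

```python
def select_stt_mode(*, has_vad: bool, stt_streams: bool, batch_mode: bool) -> str:
    """Return a label for the STT mode implied by the configuration."""
    if batch_mode:
        # Audio accumulates; the caller decides when to transcribe (flush_stt).
        return "batch"
    if has_vad:
        # VAD segments speech; streaming STT during speech, batch fallback at end.
        return "vad"
    if stt_streams:
        # No VAD: all audio is streamed and the provider handles endpointing.
        return "continuous"
    # No VAD and no streaming STT: speech detection must happen in the backend.
    return "external"
```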

backend property

backend

The voice backend (if configured).

supports_streaming_delivery property

supports_streaming_delivery

Whether this channel can accept streaming text delivery.

set_framework

set_framework(framework)

Set the framework reference for inbound routing.

Called automatically when the channel is registered with RoomKit.

on_trace

on_trace(callback, *, protocols=None)

Register a trace observer and bridge to the backend.

resolve_trace_room

resolve_trace_room(session_id)

Resolve room_id from voice session bindings.

bind_session

bind_session(session, room_id, binding)

Bind a voice session to a room for message routing.

connect_session async

connect_session(session, room_id, binding)

Accept a voice session via process_inbound.

Delegates to bind_session, which handles pipeline activation and framework events.

disconnect_session async

disconnect_session(session, room_id)

Clean up a voice session on remote disconnect.

update_binding

update_binding(room_id, binding)

Update cached bindings for all sessions in a room.

Called by the framework after mute/unmute/set_access so the audio gate in _on_audio_received sees the new state.

unbind_session

unbind_session(session)

Remove session binding.

update_voice_map

update_voice_map(entries)

Merge entries into the per-agent voice map.

Called by ConversationPipeline.install to auto-wire voice IDs from Agent instances.

interrupt async

interrupt(session, *, reason='explicit')

Interrupt ongoing TTS playback for a session.

interrupt_all async

interrupt_all(room_id, *, reason='task_delivery')

Interrupt all active TTS playback in a room.

Returns:

- int: Number of sessions that were interrupted.

wait_playback_done async

wait_playback_done(room_id, timeout=15.0)

Wait until active TTS playback finishes for all sessions in room_id.

Returns immediately if no playback is in progress. Uses per-session events that are set when send_audio() returns (before the echo drain delay), so callers don't wait for the 2-second drain window.
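The per-session event pattern described above can be sketched with asyncio primitives. PlaybackTracker and its method names are hypothetical, and the boolean return value is an assumption added for the example; the documented method does not state what it returns.

```python
import asyncio


class PlaybackTracker:
    """Tracks one asyncio.Event per session, grouped by room."""

    def __init__(self) -> None:
        # room_id -> session_id -> event set when playback has finished
        self._done: dict[str, dict[str, asyncio.Event]] = {}

    def playback_started(self, room_id: str, session_id: str) -> None:
        self._done.setdefault(room_id, {})[session_id] = asyncio.Event()

    def playback_finished(self, room_id: str, session_id: str) -> None:
        event = self._done.get(room_id, {}).get(session_id)
        if event is not None:
            event.set()

    async def wait_playback_done(self, room_id: str, timeout: float = 15.0) -> bool:
        pending = [e for e in self._done.get(room_id, {}).values() if not e.is_set()]
        if not pending:
            return True  # nothing is playing: return immediately
        try:
            await asyncio.wait_for(
                asyncio.gather(*(e.wait() for e in pending)), timeout
            )
        except asyncio.TimeoutError:
            return False
        return True


async def demo() -> tuple[bool, bool, bool]:
    tracker = PlaybackTracker()
    idle = await tracker.wait_playback_done("room-1")

    tracker.playback_started("room-1", "s1")
    loop = asyncio.get_running_loop()
    loop.call_later(0.01, tracker.playback_finished, "room-1", "s1")
    finished = await tracker.wait_playback_done("room-1", timeout=1.0)

    tracker.playback_started("room-1", "s2")  # never finishes
    timed_out = await tracker.wait_playback_done("room-1", timeout=0.01)
    return idle, finished, timed_out


if __name__ == "__main__":
    print(asyncio.run(demo()))
```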

RealtimeVoiceChannel

RealtimeVoiceChannel(channel_id, *, provider, transport, system_prompt=None, voice=None, tools=None, temperature=None, input_sample_rate=16000, output_sample_rate=24000, transport_sample_rate=None, emit_transcription_events=True, tool_handler=None, mute_on_tool_call=False, tool_result_max_length=16384)

Bases: Channel

Real-time voice channel using speech-to-speech AI providers.

Wraps APIs like OpenAI Realtime and Gemini Live as a first-class RoomKit channel. Audio flows directly between the user's browser and the provider; transcriptions are emitted into the Room so other channels (supervisor dashboards, logging) see the conversation.

Category is TRANSPORT so that:
- on_event() receives broadcasts (for text injection from supervisors)
- deliver() is called but returns empty (customer is on voice)

Example

    from roomkit.voice.realtime.mock import MockRealtimeProvider, MockRealtimeTransport

    provider = MockRealtimeProvider()
    transport = MockRealtimeTransport()

    channel = RealtimeVoiceChannel(
        "realtime-1",
        provider=provider,
        transport=transport,
        system_prompt="You are a helpful agent.",
    )
    # `kit` is assumed to be an existing RoomKit instance.
    kit.register_channel(channel)

Initialize realtime voice channel.

Parameters:

- channel_id (str, required): Unique channel identifier.
- provider (RealtimeVoiceProvider, required): The realtime voice provider (OpenAI, Gemini, etc.).
- transport (VoiceBackend, required): The audio transport (WebSocket, etc.).
- system_prompt (str | None, default None): Default system prompt for the AI.
- voice (str | None, default None): Default voice ID for audio output.
- tools (list[dict[str, Any]] | None, default None): Default tool/function definitions.
- temperature (float | None, default None): Default sampling temperature.
- input_sample_rate (int, default 16000): Default input audio sample rate (Hz).
- output_sample_rate (int, default 24000): Default output audio sample rate (Hz).
- transport_sample_rate (int | None, default None): Sample rate of audio from the transport (Hz). When set and different from provider rates, enables automatic resampling. When None (default), no resampling is performed; this is backwards compatible with WebSocket transports.
- emit_transcription_events (bool, default True): If True, emit final transcriptions as RoomEvents so other channels see them.
- tool_handler (ToolHandler | None, default None): Async callable to execute tool calls. Signature: async (session, name, arguments) -> result. Return a dict or JSON string. If not set, falls back to ON_REALTIME_TOOL_CALL hooks.
- mute_on_tool_call (bool, default False): If True, mute the transport microphone during tool execution to prevent barge-in that causes providers (e.g. Gemini) to silently drop the tool result. Defaults to False; use set_access() for fine-grained control.
- tool_result_max_length (int, default 16384): Maximum character length of tool results before truncation. Large results (e.g. SVG payloads) can overflow the provider's context window. Defaults to 16384.

provider property

provider

The underlying realtime voice provider.

session_rooms property

session_rooms

Mapping of session_id to room_id.

tool_handler property writable

tool_handler

The current tool handler for realtime tool calls.

get_room_sessions

get_room_sessions(room_id)

Get all active sessions for a room.

set_framework

set_framework(framework)

Set the framework reference for event routing.

Called automatically when the channel is registered with RoomKit.

on_trace

on_trace(callback, *, protocols=None)

Register a trace observer and bridge to the transport.

resolve_trace_room

resolve_trace_room(session_id)

Resolve room_id from realtime session mappings.

inject_text async

inject_text(session, text, *, role='user')

Inject a text turn into the provider session.

Useful for nudging the provider when its server-side VAD stalls (e.g. Gemini ignoring valid speech after turn_complete).

start_session async

start_session(room_id, participant_id, connection, *, metadata=None)

Start a new realtime voice session.

Connects both the transport (client audio) and the provider (AI service), then fires a framework event.

Parameters:

- room_id (str, required): The room to join.
- participant_id (str, required): The participant's ID.
- connection (Any, required): Protocol-specific connection (e.g. WebSocket).
- metadata (dict[str, Any] | None, default None): Optional session metadata. May include overrides for system_prompt, voice, tools, temperature.

Returns:

- VoiceSession: The created VoiceSession.

end_session async

end_session(session)

End a realtime voice session.

Disconnects both provider and transport, fires framework event.

Parameters:

- session (VoiceSession, required): The session to end.

reconfigure_session async

reconfigure_session(session, *, system_prompt=None, voice=None, tools=None, temperature=None, provider_config=None)

Reconfigure an active session with new agent parameters.

Used during agent handoff to switch the AI personality, voice, and tools. Providers with session resumption (e.g. Gemini Live) preserve conversation history across the reconfiguration.

Parameters:

- session (VoiceSession, required): The active session to reconfigure.
- system_prompt (str | None, default None): New system instructions for the AI.
- voice (str | None, default None): New voice ID for audio output.
- tools (list[dict[str, Any]] | None, default None): New tool/function definitions.
- temperature (float | None, default None): New sampling temperature.
- provider_config (dict[str, Any] | None, default None): Provider-specific configuration overrides.

connect_session async

connect_session(session, room_id, binding)

Accept a realtime voice session via process_inbound.

Delegates to start_session, which handles provider/transport connection, resampling, and framework events.

disconnect_session async

disconnect_session(session, room_id)

Clean up realtime sessions on remote disconnect.

update_binding

update_binding(room_id, binding)

Update cached bindings for all sessions in a room.

Called by the framework after mute/unmute/set_access so the audio gate in _forward_client_audio sees the new state.

handle_inbound async

handle_inbound(message, context)

Not used directly — audio flows via start_session.

on_event async

on_event(event, binding, context)

React to events from other channels — TEXT INJECTION.

When a supervisor or other channel sends a message, extract the text and inject it into the provider session so the AI incorporates it. Skips events from this channel (self-loop prevention).

deliver async

deliver(event, binding, context)

No-op delivery — customer is on voice, can't see text.

close async

close()

End all sessions and close provider + transport.

WhatsAppChannel

WhatsAppChannel(channel_id, *, provider=None)

Create a WhatsApp transport channel.

MessengerChannel

MessengerChannel(channel_id, *, provider=None)

Create a Facebook Messenger transport channel.

TeamsChannel

TeamsChannel(channel_id, *, provider=None)

Create a Microsoft Teams transport channel.

HTTPChannel

HTTPChannel(channel_id, *, provider=None)

Create an HTTP webhook transport channel.

TelegramChannel

TelegramChannel(channel_id, *, provider=None)

Create a Telegram Bot transport channel.

WhatsAppPersonalChannel

WhatsAppPersonalChannel(channel_id, *, provider=None)

Create a WhatsApp Personal transport channel (neonize).

TransportChannel

TransportChannel(channel_id, channel_type, *, provider=None, capabilities=None, recipient_key='recipient_id', defaults=None)

Bases: Channel

Generic transport channel driven by configuration rather than subclassing.

All transport channels (SMS, Email, WhatsApp, Messenger, HTTP) share the same inbound/deliver logic. The only differences are data: which ChannelType, which ChannelCapabilities, which metadata key holds the recipient address, and which extra kwargs to pass to the provider's send() method.

Use the factory functions (SMSChannel, EmailChannel, …) in roomkit.channels for convenient construction.

Initialise a transport channel.

Parameters:

- channel_id (str, required): Unique identifier for this channel instance.
- channel_type (ChannelType, required): The channel type (SMS, email, etc.).
- provider (Any, default None): Provider that handles external delivery (e.g. ElasticEmailProvider).
- capabilities (ChannelCapabilities | None, default None): Media and feature capabilities for this channel.
- recipient_key (str, default 'recipient_id'): Binding metadata key that holds the recipient address.
- defaults (dict[str, Any] | None, default None): Default kwargs passed to provider.send(). If a default value is None, the actual value is read from the binding metadata at delivery time.

info property

info

Return non-None default values as channel info metadata.

capabilities

capabilities()

Return the channel's media and feature capabilities.

handle_inbound async

handle_inbound(message, context)

Convert an inbound message into a room event.

deliver async

deliver(event, binding, context)

Deliver an event to the external recipient via the provider.

The recipient address is read from binding.metadata[recipient_key]. Extra kwargs are built from defaults: fixed values are passed as-is, None defaults are resolved from binding metadata at delivery time.

WebSocket Streaming

StreamChunk

Bases: BaseModel

Sent for each text delta during streaming.

StreamEnd

Bases: BaseModel

Sent when a streaming response completes.

StreamMessage module-attribute

StreamMessage = StreamStart | StreamChunk | StreamEnd | StreamError

StreamSendFn module-attribute

StreamSendFn = Callable[[str, StreamMessage], Coroutine[Any, Any, None]]

StreamStart

Bases: BaseModel

Sent when a streaming response begins.