Alberto Gonzalez at WebRTC.ventures recently published an excellent comparison of four production voice AI frameworks: Amazon Bedrock Agents, Google Vertex AI, LiveKit Agents, and Pipecat Flows. The article frames the choice as a spectrum running from cloud-native governance (Bedrock, Vertex) to voice-native RTC (LiveKit, Pipecat), with hybrid architectures in between.
It's a well-drawn map. But it's missing a territory.
All four frameworks in that comparison assume the conversation is a voice call. The agent talks, the user talks, the call ends. What happens when that same conversation needs to continue over SMS? When a telehealth follow-up arrives by email? When a financial advisor sends a document via WhatsApp while still on the call? None of the four frameworks address this — because they weren't designed to.
RoomKit was.
The Fifth Architecture: Rooms and Channels
The WebRTC.ventures article identifies four architectural approaches:
- Bedrock/AgentCore — orchestration layers on top of foundation models, with IAM-aligned governance
- Vertex AI/ADK — ML lifecycle management with structured agent workflows and model selection
- LiveKit Agents — an agent joins a WebRTC room as a participant, with a built-in media server
- Pipecat Flows — composable frame-based pipelines with structured flow control
RoomKit takes a fundamentally different approach. A conversation is a room. Every communication medium — SMS, Email, WhatsApp, Voice, Telegram, Teams, AI — is a channel in that room. Messages flow through channels into rooms and are broadcast to all attached channels, with automatic content transcoding between formats.
Voice isn't the product. Voice is one channel in a multi-channel conversation.
This isn't just a philosophical distinction. It changes what's easy, what's possible, and what you get for free:
- A voice call that drops resumes seamlessly over SMS — same room, same context, same participant identity
- An AI agent's response broadcasts to the voice channel and sends a text summary to the patient's WhatsApp — in one event
- A compliance hook that runs on every message applies uniformly to voice transcriptions, SMS, and email — one policy, all channels
- Every event across every channel is sequentially indexed in one store — one audit trail per conversation
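To make the room-and-channel model concrete, here is a minimal, framework-free sketch of the fan-out described above. It is not RoomKit's API: `ToyRoom`, `ToyChannel`, and `transcode` are hypothetical names invented for illustration, and the transcoding rule (truncating long text for a length-limited channel) is an assumption chosen only because it is easy to show.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ToyChannel:
    """Hypothetical channel: records what it would deliver to its medium."""
    channel_id: str
    max_chars: Optional[int] = None  # assumed length cap for SMS-like media
    delivered: list = field(default_factory=list)

    def transcode(self, text: str) -> str:
        # Assumed transcoding rule: truncate for length-limited channels.
        if self.max_chars is not None and len(text) > self.max_chars:
            return text[: self.max_chars - 3] + "..."
        return text

    def deliver(self, text: str) -> None:
        self.delivered.append(self.transcode(text))


@dataclass
class ToyRoom:
    """Hypothetical room: one conversation, many attached channels."""
    room_id: str
    channels: dict = field(default_factory=dict)

    def attach(self, channel: ToyChannel) -> None:
        self.channels[channel.channel_id] = channel

    def broadcast(self, source_id: str, text: str) -> None:
        # Fan out to every attached channel except the one that sent it.
        for channel_id, channel in self.channels.items():
            if channel_id != source_id:
                channel.deliver(text)


room = ToyRoom("intake-001")
voice = ToyChannel("voice")
sms = ToyChannel("sms", max_chars=20)
room.attach(voice)
room.attach(sms)

# A voice transcript enters the room and is re-shaped for the SMS channel.
room.broadcast("voice", "Your appointment is confirmed for Tuesday.")
```

The point of the sketch is the shape, not the details: the sender only talks to the room, and each channel decides how the message is rendered for its own medium.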
Production Comparison: Adding a Fifth Column
The original article evaluates frameworks on production concerns that matter in regulated environments. Here's where RoomKit fits in that matrix:
| Concern | Bedrock / Vertex | LiveKit | Pipecat | RoomKit |
|---|---|---|---|---|
| Architecture | Cloud orchestration layer | WebRTC room + agent participant | Frame-based pipeline | Multi-channel room + channel bindings |
| Governance | IAM, CloudTrail, model guardrails | Self-hosted media server | Custom middleware | 34+ hook triggers (sync/async), per-channel permissions, role-based access |
| Media transport | Requires separate stack | Built-in WebRTC + SIP | Daily.co (WebRTC) | FastRTC (WebRTC), SIP, RTP, WebSocket, local mic |
| Compliance / Audit | Cloud-native logging | Per-session metrics | Custom logging | Unified event store (PostgreSQL), all channels in one audit trail, WAV recording |
| Voice pipeline | N/A (not voice-native) | Session-managed (VAD, STT, TTS) | Linear frame chain (40+ services) | 12-stage audio pipeline: AEC, AGC, denoiser, VAD, DTMF, diarization, turn detection |
| AI providers | AWS models + Bedrock marketplace | OpenAI, Deepgram, Cartesia, etc. | 40+ integrations | Anthropic, OpenAI, Gemini, Mistral, Azure + 6 realtime providers (OpenAI Realtime, Gemini Live, ElevenLabs, Grok, Personaplex, Anam) |
| Non-voice channels | — | — | — | SMS, Email, WhatsApp, Telegram, Teams, Messenger, RCS, WebSocket, HTTP |
| Self-hosting | Cloud-only | Open-source media server | Self-hostable | Fully self-hostable, no external dependencies for core |
| Install | AWS SDK + console setup | pip install livekit-agents | pip install pipecat-ai | pip install roomkit |
Where This Matters: Real Production Scenarios
The WebRTC.ventures article correctly observes that voice AI agents are now deployed in "regulated and mission-critical environments such as telecom platforms, telehealth systems, emergency response workflows, and financial infrastructure." In those environments, conversations rarely stay in one channel.
Telehealth
A patient calls in. The voice agent conducts an intake, transcribes the conversation, and routes to a provider. After the call, a summary goes to the patient's WhatsApp. Appointment reminders arrive via SMS. Lab results are emailed. All of this is one conversation in one room — with one compliance audit trail, one identity, and one set of permission policies.
Financial Services
An advisor speaks with a client over voice while the AI agent simultaneously sends a portfolio document via email. The client asks a follow-up question over WhatsApp the next day. The agent has full context because the room persists. Every interaction — voice transcript, WhatsApp message, email — is indexed in PostgreSQL with sequential event ordering.
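As a rough illustration of what "sequential event ordering" buys you, the sketch below keeps a per-room sequence number so that events from different channels can be replayed in one order. It uses Python's built-in `sqlite3` as a stand-in for PostgreSQL so it runs anywhere; the table layout and the `append_event` helper are hypothetical and are not RoomKit's `PostgresStore` schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE events (
        room_id TEXT NOT NULL,
        seq     INTEGER NOT NULL,   -- per-room sequence number
        channel TEXT NOT NULL,
        body    TEXT NOT NULL,
        PRIMARY KEY (room_id, seq)
    )
    """
)


def append_event(room_id: str, channel: str, body: str) -> int:
    """Append an event with the next per-room sequence number and return it."""
    with conn:  # commit on success, roll back on error
        (last,) = conn.execute(
            "SELECT COALESCE(MAX(seq), 0) FROM events WHERE room_id = ?",
            (room_id,),
        ).fetchone()
        seq = last + 1
        conn.execute(
            "INSERT INTO events (room_id, seq, channel, body) VALUES (?, ?, ?, ?)",
            (room_id, seq, channel, body),
        )
    return seq


append_event("client-42", "voice", "transcript: discussed portfolio rebalance")
append_event("client-42", "email", "sent: portfolio.pdf")
append_event("client-42", "whatsapp", "client: what about the bond allocation?")

# One query returns the whole conversation, regardless of channel.
audit_trail = conn.execute(
    "SELECT seq, channel, body FROM events WHERE room_id = ? ORDER BY seq",
    ("client-42",),
).fetchall()
```

In a real multi-writer PostgreSQL deployment, the read-then-insert step would need row locking or a retry on primary-key conflict; the sketch ignores concurrency to keep the idea visible.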
Contact Centers
A customer starts on the IVR (SIP/RTP backend). The voice agent handles tier-1 support. If the call drops or the customer prefers text, the conversation continues over SMS or web chat (WebSocket) — same room, same AI agent, no context loss. Supervisors observe via hooks without joining the room.
In each of these scenarios, Bedrock or Vertex can power the AI reasoning. LiveKit or Pipecat can handle the voice transport. But none of them orchestrate the conversation across channels. That's the gap RoomKit fills.
Complementary, Not Competing
RoomKit is not a replacement for any of the four frameworks in the WebRTC.ventures comparison. It operates at a different layer:
- Bedrock / Vertex are AI orchestration platforms. RoomKit already integrates with their models via its AI provider system — Anthropic (Claude on Bedrock), OpenAI, Gemini (on Vertex), Mistral. Use them as the intelligence layer inside RoomKit's AIChannel.
- LiveKit is a media server. RoomKit's FastRTC backend handles WebRTC transport, and its SIP backend handles telephony. If you're already invested in LiveKit's infrastructure, RoomKit's voice channel can sit on top of it.
- Pipecat is a voice pipeline framework. RoomKit's AudioPipeline follows a similar staged approach (12 processing stages from AEC to turn detection) but embeds it inside a channel that coexists with SMS, Email, and WhatsApp in the same room.
The hybrid approach the WebRTC.ventures article recommends — "cloud enterprise guardrails and managed models alongside a dedicated media-first real-time stack" — is exactly how RoomKit is designed to be deployed. Use Bedrock for model governance. Use LiveKit or RTP for media transport. Use RoomKit to tie the conversation together across every channel the customer uses.
What It Looks Like in Code
Here's a production-style multi-channel voice agent: voice + SMS fallback + AI intelligence + compliance hooks.
from roomkit import (
    RoomKit, VoiceChannel, AIChannel, SMSChannel,
    ChannelCategory, HookTrigger, HookResult,
)
from roomkit.providers.anthropic import AnthropicProvider, AnthropicConfig
from roomkit.providers.twilio import TwilioSMSProvider, TwilioConfig
from roomkit.voice.backends.sip import SIPVoiceBackend, SIPConfig
from roomkit.voice.pipeline import AudioPipelineConfig
from roomkit.voice.pipeline.vad.silero import SileroVADProvider
from roomkit.voice.stt.deepgram import DeepgramSTTProvider, DeepgramSTTConfig
from roomkit.voice.tts.elevenlabs import ElevenLabsTTSProvider, ElevenLabsTTSConfig
from roomkit.store.postgres import PostgresStore


async def main():
    # Production storage — every event persisted with sequential indexing
    store = PostgresStore(dsn="postgresql://...")
    kit = RoomKit(store=store)

    # Voice channel: SIP backend + full audio pipeline
    voice = VoiceChannel("voice",
        backend=SIPVoiceBackend(SIPConfig(listen_port=5060)),
        stt=DeepgramSTTProvider(DeepgramSTTConfig(model="nova-3")),
        tts=ElevenLabsTTSProvider(ElevenLabsTTSConfig(voice_id="...")),
        pipeline=AudioPipelineConfig(vad=SileroVADProvider()),
    )

    # SMS channel: fallback when voice drops or for follow-ups
    sms = SMSChannel("sms", provider=TwilioSMSProvider(TwilioConfig(
        account_sid="...", auth_token="...", from_number="+1...",
    )))

    # AI channel: Claude as the intelligence layer
    ai = AIChannel("ai",
        provider=AnthropicProvider(AnthropicConfig(model="claude-sonnet-4-5-20250514")),
        system_prompt="You are a patient intake assistant for a telehealth clinic.",
    )

    kit.register_channel(voice)
    kit.register_channel(sms)
    kit.register_channel(ai)

    # Create room — all channels share the same conversation
    room = await kit.create_room(room_id="intake-2026-03-24-001")
    await kit.attach_channel(room.room_id, "voice")
    await kit.attach_channel(room.room_id, "sms")
    await kit.attach_channel(room.room_id, "ai", category=ChannelCategory.INTELLIGENCE)

    # Compliance hook: runs on EVERY message across ALL channels
    @kit.hook(HookTrigger.BEFORE_BROADCAST)
    async def redact_pii(event, ctx):
        # Your PII redaction logic here — applies to voice transcripts,
        # SMS messages, and AI responses equally
        return HookResult.allow()

    # Voice-specific hook: send SMS summary after each voice turn
    @kit.hook(HookTrigger.ON_TURN_COMPLETE)
    async def sms_summary(event, ctx):
        await kit.send_direct(room.room_id, "sms",
            content=f"Call summary: {event.text}",
            participant_id=ctx.participant_id,
        )
        return HookResult.allow()
What stands out: the voice channel, SMS channel, and AI channel all live in the same room. The BEFORE_BROADCAST hook applies uniformly — voice transcripts, SMS messages, and AI responses all pass through the same compliance pipeline. The ON_TURN_COMPLETE hook sends an SMS summary after each voice exchange. If the voice call drops, the patient can continue the conversation over SMS with full context.
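The `redact_pii` hook above leaves its body as a placeholder. One possible shape for that logic is a pure text-in, text-out function that a hook could call, sketched below. The regular expressions are deliberately simple assumptions rather than production-grade PII detection, `redact_text` is a hypothetical helper name, and how redacted text is written back into a RoomKit event is not shown because the example above does not cover it.

```python
import re

# Illustrative patterns only; real PII detection needs far broader coverage.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
PHONE_RE = re.compile(r"\+?\d[\d\s().-]{7,}\d")


def redact_text(text: str) -> str:
    """Replace email addresses and phone-like digit runs with placeholders."""
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = PHONE_RE.sub("[PHONE]", text)
    return text


sample = "Call me at +1 (555) 010-2030 or write to jane.doe@example.com."
redacted = redact_text(sample)
print(redacted)
```

Keeping the redaction logic free of any framework types means the same function can be unit-tested on its own and reused for voice transcripts, SMS bodies, and AI responses alike.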
The Voice Pipeline Isn't an Afterthought
Because the WebRTC.ventures article focuses on production voice deployments, it's worth addressing the voice subsystem directly. RoomKit's audio pipeline has 12 processing stages:
Inbound (microphone to STT):
Resampler → Recorder → AEC (Speex) → AGC → Denoiser (RNNoise/GTCRN) → VAD (Silero/TEN-VAD) → Diarization + DTMF
Outbound (TTS to speaker):
PostProcessors → Recorder → AEC reference feed → Resampler
The pipeline is capability-aware: AEC and AGC stages auto-skip when the backend reports NATIVE_AEC or NATIVE_AGC capabilities. Four interruption strategies (IMMEDIATE, CONFIRMED, SEMANTIC, DISABLED) handle barge-in with configurable thresholds. Semantic turn detection uses backchannel analysis to distinguish "uh-huh" from real interruptions.
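The capability-aware skipping described above can be sketched in a few lines. This is an illustration of the idea, not RoomKit's implementation: the `Capability` enum, the `INBOUND_STAGES` table, and `active_stages` are hypothetical, and only a subset of the inbound stages is modeled.

```python
from enum import Flag, auto


class Capability(Flag):
    """Hypothetical backend capability flags (names mirror the text above)."""
    NONE = 0
    NATIVE_AEC = auto()
    NATIVE_AGC = auto()


# A subset of the inbound stage order. Each stage is paired with the backend
# capability that makes it redundant, or NONE if it always runs.
INBOUND_STAGES = [
    ("resampler", Capability.NONE),
    ("recorder", Capability.NONE),
    ("aec", Capability.NATIVE_AEC),
    ("agc", Capability.NATIVE_AGC),
    ("denoiser", Capability.NONE),
    ("vad", Capability.NONE),
]


def active_stages(backend_caps: Capability) -> list:
    """Return the stages to run, skipping those the backend handles natively."""
    return [
        name
        for name, made_redundant_by in INBOUND_STAGES
        if not (made_redundant_by & backend_caps)
    ]


# A backend that already does echo cancellation and gain control natively.
native_dsp_backend = active_stages(Capability.NATIVE_AEC | Capability.NATIVE_AGC)

# A backend that reports no native processing runs every stage.
plain_backend = active_stages(Capability.NONE)
```

Declaring the skip condition next to each stage keeps the pipeline order fixed while letting each backend opt out of work it already does, which avoids processing the same audio twice.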
Voice backends include FastRTC (WebRTC/WebSocket), SIP (full call signaling with codec negotiation), RTP (direct UDP for PBX integration), and local microphone for development. For speech-to-speech scenarios, RoomKit supports 6 realtime AI providers: OpenAI Realtime, Gemini Live, ElevenLabs Conversational AI, xAI Grok, Personaplex, and Anam.
This is a production-grade voice pipeline. The difference is that it lives inside a channel abstraction that coexists with every other communication medium in the same room.
Updated Decision Tree
Extending the WebRTC.ventures article's recommendations:
"I need cloud-native governance with IAM and model guardrails"
→ Bedrock or Vertex. Use their models inside RoomKit's AIChannel if you also need multi-channel.
"I need a WebRTC-first voice agent with a media server"
→ LiveKit Agents. Strongest choice for pure voice with built-in SIP and observability.
"I need the widest ecosystem of voice AI services"
→ Pipecat Flows. 40+ integrations, client SDKs on every platform.
"I need voice + SMS + Email + WhatsApp in one conversation"
→ RoomKit. Of the frameworks compared here, it is the only one where voice is a channel in a multi-channel room.
"I need a unified audit trail across voice, text, and email"
→ RoomKit. PostgresStore indexes every event across every channel with sequential ordering.
"I need compliance hooks that apply to all communication channels"
→ RoomKit. 34+ hook triggers with sync/async execution, applied uniformly across voice, SMS, Email, WhatsApp, and AI.
"I need a hybrid: cloud AI + voice-native RTC + multi-channel"
→ RoomKit with Bedrock/Vertex models as AI providers and FastRTC or SIP as the voice backend. This is the hybrid the WebRTC.ventures article recommends, extended to cover non-voice channels.
Trade-offs
In the spirit of honesty:
- Ecosystem size — Pipecat has 40+ service integrations and client SDKs for React, Swift, Kotlin, and C++. LiveKit has a mature open-source media server battle-tested at scale. RoomKit's ecosystem is younger and narrower. If you only need voice and want the broadest provider choice, Pipecat or LiveKit will get you there faster.
- Managed infrastructure — Bedrock and Vertex offer fully managed cloud infrastructure with enterprise SLAs. RoomKit is a library you deploy yourself. If your team wants a managed service with a console, RoomKit isn't that.
- Visual tooling — LiveKit has a playground and session dashboard. TEN Framework (not in the WebRTC.ventures comparison) has a visual graph designer. RoomKit doesn't have a visual builder — it's code-first.
- WebRTC at scale — LiveKit's media server handles large-scale WebRTC routing and SFU concerns out of the box. RoomKit's FastRTC backend is lighter — it handles WebRTC transport, not a full SFU.
These are real trade-offs. RoomKit wins when the conversation spans multiple channels, when you need unified compliance and audit across all of them, or when voice is one part of a larger multi-channel system. It's not the right tool for every voice AI deployment.
Conclusion
The WebRTC.ventures article draws a useful map of the production voice AI landscape: cloud-native governance on one side, voice-native RTC on the other, hybrid architectures in the middle. All four frameworks are strong choices for their respective use cases.
RoomKit adds a new axis to that map: multi-channel conversation orchestration. Not "voice AI framework that can also do chat" — a conversation framework where voice, SMS, Email, WhatsApp, Telegram, Teams, and AI are all first-class channels in the same room, with unified hooks, permissions, identity resolution, and event storage.
In regulated, mission-critical environments where conversations span channels — telehealth, financial services, contact centers — that's not a nice-to-have. It's the architecture the problem demands.