Multi-Agent Architecture, Part 1: User Interaction — Every Channel Is an Entry Point

This is Part 1 of a series on multi-agent architecture. Read the series introduction for context on the full architecture, or jump ahead to Part 2: Orchestration.

A user sends a WhatsApp message. Another calls a phone number. A third types into a web chat widget. A fourth replies to an email thread. All four are trying to reach the same system, but each arrives through a completely different protocol, with different payloads, different authentication, and different quirks.

In most codebases, this is where the mess starts. Each channel gets its own webhook handler, its own parsing logic, its own way of identifying the sender. Add a new channel? Copy-paste an existing handler, tweak the field names, cross your fingers. Six months later, you have six slightly different implementations of the same business logic, and a bug fix in one doesn't propagate to the others.

In a multi-agent architecture, this gets worse. If the first layer of your system -- the user interaction layer -- doesn't normalize inputs, every downstream agent has to understand every channel format. That defeats the purpose of having agents at all.

RoomKit solves this at the foundation. User interaction is the first pillar: every channel produces the same data model, every message enters through the same pipeline, and by the time it reaches an agent, the channel of origin is metadata -- not a branching condition.

The Problem: N Channels, N Codebases

I've seen this pattern in every telecom and messaging system I've worked on. You start with SMS via Twilio. Then you add WhatsApp. Then voice. Then a web widget. Each integration looks something like this:

# Twilio SMS webhook (assumes a Starlette-style request, like the Telnyx handler below)
async def handle_twilio_sms(request):
    form = await request.form()  # form parsing is async on Starlette-style requests
    body = form["Body"]
    sender = form["From"]
    signature = request.headers["X-Twilio-Signature"]
    validate_twilio_hmac(signature, str(request.url), form)
    user = lookup_user_by_phone(sender)
    await process_message(user, body, source="sms")

# Telnyx SMS webhook
async def handle_telnyx_sms(request):
    payload = await request.json()
    body = payload["data"]["payload"]["text"]
    sender = payload["data"]["payload"]["from"]["phone_number"]
    verify_telnyx_ed25519(request.headers, await request.body())
    user = lookup_user_by_phone(sender)
    await process_message(user, body, source="sms")

# ...repeat for WhatsApp, Telegram, email, voice, WebSocket

Different field names. Different signature algorithms (Twilio uses HMAC-SHA1, Telnyx uses Ed25519). Different payload formats (form-encoded fields versus nested JSON). But the intent is identical: someone sent a message, verify it's legitimate, figure out who sent it, and process it.

Now multiply this by the number of channels a real system needs to support: SMS, WhatsApp, Telegram, Messenger, Teams, email, WebSocket, HTTP webhooks, voice, realtime voice. That's ten channel types, each with one or more providers. The handler-per-provider approach doesn't scale.
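To make the "identical intent" point concrete, here is a minimal sketch of what normalizing those two payloads into one shape could look like. It uses only the standard library. NormalizedMessage, PARSERS, and normalize are hypothetical names for illustration; this is not RoomKit's InboundMessage or its parsing code, and the field names are simply the ones used in the two handlers above.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class NormalizedMessage:
    """Hypothetical channel-agnostic message (not RoomKit's InboundMessage)."""
    provider: str
    sender: str
    body: str


def parse_twilio_sms(form: dict[str, str]) -> NormalizedMessage:
    # Twilio posts form fields; the handler above reads "From" and "Body".
    return NormalizedMessage("twilio", form["From"], form["Body"])


def parse_telnyx_sms(payload: dict[str, Any]) -> NormalizedMessage:
    # Telnyx nests the message under data.payload, as in the handler above.
    inner = payload["data"]["payload"]
    return NormalizedMessage("telnyx", inner["from"]["phone_number"], inner["text"])


# One registry of parsers instead of one full handler per provider.
PARSERS: dict[str, Callable[[Any], NormalizedMessage]] = {
    "twilio": parse_twilio_sms,
    "telnyx": parse_telnyx_sms,
}


def normalize(provider: str, raw: Any) -> NormalizedMessage:
    """Turn a provider-specific payload into the shared shape."""
    try:
        parser = PARSERS[provider]
    except KeyError:
        raise ValueError(f"unsupported provider: {provider}") from None
    return parser(raw)
```

Everything after normalize() can be written once: user lookup and message processing only ever see sender and body, whichever provider delivered them.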

TransportChannel: One Abstraction, Ten Channels

RoomKit's answer is the Channel abstraction. Text-based channels -- SMSChannel, WhatsAppChannel, TelegramChannel, EmailChannel, HTTPChannel, MessengerChannel, TeamsChannel -- are factory functions that return a TransportChannel with the right configuration. Specialized channels -- VoiceChannel, RealtimeVoiceChannel, WebSocketChannel -- are full classes with their own pipeline logic. But they all share the same Channel base interface. When a message arrives from any of them, it's normalized into a single InboundMessage model before anything else happens.

The channel handles provider-specific parsing and signature verification internally. By the time the message reaches your application code, you're working with a clean, uniform object:

from roomkit import RoomKit, SMSChannel, WhatsAppChannel, RealtimeVoiceChannel
from roomkit.providers.twilio import TwilioSMSProvider, TwilioConfig
from roomkit.providers.whatsapp.personal import WhatsAppPersonalProvider
from roomkit.providers.gemini.realtime import GeminiLiveProvider

kit = RoomKit()

# Three channels, three providers, one room
sms = SMSChannel("support-sms", provider=TwilioSMSProvider(
    config=TwilioConfig(
        account_sid="AC...",
        auth_token="tok...",
        from_number="+15551234567",
    )
))
whatsapp = WhatsAppChannel("support-whatsapp", provider=WhatsAppPersonalProvider(
    source=whatsapp_source,
))
voice = RealtimeVoiceChannel("support-voice", provider=GeminiLiveProvider(
    api_key="...",
    model="gemini-2.5-flash-native-audio-preview",
))

kit.register_channel(sms)
kit.register_channel(whatsapp)
kit.register_channel(voice)

# All three channels attach to the same room
await kit.create_room(room_id="support-case-7742")
await kit.attach_channel("support-case-7742", "support-sms")
await kit.attach_channel("support-case-7742", "support-whatsapp")
await kit.attach_channel("support-case-7742", "support-voice")

A customer texting on SMS and another messaging on WhatsApp are now in the same conversation. The room is the convergence point. The channel is just the transport.

InboundRoomRouter: Resolving Where a Message Belongs

When a webhook fires or a WebSocket frame arrives, the system needs to answer one question: which room does this message belong to?

That's the job of the InboundRoomRouter. It receives the channel ID, channel type, and participant ID, and returns a room ID -- or None to let the framework auto-create one. The routing logic is yours to define -- match on phone number, session ID, customer account, or any combination:

from typing import Any
from roomkit import RoomKit
from roomkit.core.inbound_router import InboundRoomRouter
from roomkit.models.enums import ChannelType

class SupportRouter(InboundRoomRouter):
    def __init__(self, db):
        self.db = db

    async def route(
        self,
        channel_id: str,
        channel_type: ChannelType,
        participant_id: str | None = None,
        channel_data: dict[str, Any] | None = None,
    ) -> str | None:
        # Look up existing open case by sender identity
        if participant_id:
            case = await self.db.find_open_case(participant_id)
            if case:
                return case.room_id

        # Return None to let RoomKit auto-create a new room
        return None

# Pass the router when creating the RoomKit instance
kit = RoomKit(inbound_router=SupportRouter(db=case_database))

The router doesn't care whether the message came from SMS, WhatsApp, or a WebSocket. It receives the same channel-agnostic parameters regardless. This is the payoff of input normalization: routing logic is written once and works for every channel.
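The routing decision itself is easy to exercise without any framework. The sketch below is a standalone approximation, not RoomKit code: Case and InMemoryCaseStore are hypothetical stand-ins for whatever sits behind the db object, and route is a plain function that mirrors the decision in SupportRouter.route.

```python
import asyncio
from dataclasses import dataclass
from typing import Optional


@dataclass
class Case:
    room_id: str
    participant_id: str
    is_open: bool = True


class InMemoryCaseStore:
    """Hypothetical stand-in for the `db` object passed to SupportRouter."""

    def __init__(self, cases: list[Case]) -> None:
        self._cases = cases

    async def find_open_case(self, participant_id: str) -> Optional[Case]:
        for case in self._cases:
            if case.is_open and case.participant_id == participant_id:
                return case
        return None


async def route(
    store: InMemoryCaseStore,
    channel_type: str,
    participant_id: Optional[str],
) -> Optional[str]:
    # Same decision as SupportRouter.route: channel_type is never consulted.
    if participant_id:
        case = await store.find_open_case(participant_id)
        if case:
            return case.room_id
    return None  # the caller would create a new room
```

Calling route with "sms" and then "whatsapp" for the same participant returns the same room ID, which is the behavior the router relies on.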

Request Validation: Trust Nothing

User interaction isn't just about receiving messages. It's about verifying them before they enter your system. RoomKit handles validation at three levels.

Webhook signature verification happens inside the channel provider, before the message is even normalized. Telnyx webhooks are verified with Ed25519. Twilio uses HMAC-SHA1. Each provider knows its own signature scheme. If verification fails, the message is rejected at the transport layer -- it never becomes an InboundMessage.
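For a sense of what a provider does at this step, here is a minimal standard-library sketch of the HMAC-SHA1 check for form-encoded Twilio webhooks, following Twilio's published scheme as I understand it. The function names are hypothetical, this is not RoomKit's provider code, and the current Twilio documentation should be treated as the authority on edge cases such as JSON bodies.

```python
import base64
import hashlib
import hmac


def compute_twilio_signature(auth_token: str, url: str, params: dict[str, str]) -> str:
    # Append each POST parameter name and value, sorted by name, to the full
    # request URL, then HMAC-SHA1 with the auth token and Base64-encode.
    data = url + "".join(key + params[key] for key in sorted(params))
    digest = hmac.new(
        auth_token.encode("utf-8"), data.encode("utf-8"), hashlib.sha1
    ).digest()
    return base64.b64encode(digest).decode("ascii")


def is_valid_twilio_request(
    auth_token: str, url: str, params: dict[str, str], header_signature: str
) -> bool:
    expected = compute_twilio_signature(auth_token, url, params)
    # compare_digest avoids leaking the mismatch position through timing.
    return hmac.compare_digest(
        expected.encode("utf-8"), header_signature.encode("utf-8")
    )
```

A request whose body was altered after signing, or that was signed with a different token, fails the comparison and would be dropped before any parsing happens.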

Identity resolution maps raw sender identifiers (phone numbers, email addresses, WebSocket session IDs) to unified participant identities. The same person reaching out on SMS and WhatsApp resolves to a single participant in the room. This happens through a configurable identity resolution pipeline that you can extend with your own lookup logic.
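As an illustration of the idea, and not of RoomKit's actual resolution pipeline, a lookup step might first reduce each raw identifier to a canonical key and then consult a directory. DIRECTORY, normalize_identifier, and resolve_participant below are hypothetical, and the phone handling is deliberately naive (it assumes numbers already carry a country code).

```python
from typing import Optional

# Hypothetical directory: canonical (kind, value) keys mapped to one participant ID.
DIRECTORY = {
    ("phone", "+15550001111"): "participant-42",
    ("email", "ada@example.com"): "participant-42",
}


def normalize_identifier(channel_type: str, raw: str) -> tuple[str, str]:
    """Reduce a channel-specific sender ID to a (kind, value) lookup key."""
    value = raw.strip()
    if channel_type in {"sms", "whatsapp", "voice"}:
        # Drop a provider prefix such as "whatsapp:" if one is present.
        value = value.split(":", 1)[-1]
        digits = "".join(ch for ch in value if ch.isdigit())
        return ("phone", "+" + digits)
    if channel_type == "email":
        return ("email", value.lower())
    # Anything else (for example a WebSocket session ID) is used as-is.
    return (channel_type, value)


def resolve_participant(channel_type: str, raw: str) -> Optional[str]:
    """Return the unified participant ID, or None if the sender is unknown."""
    return DIRECTORY.get(normalize_identifier(channel_type, raw))
```

With this shape, an SMS from "+1 (555) 000-1111" and a WhatsApp message from "whatsapp:+15550001111" resolve to the same participant.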

Content filtering via BEFORE_BROADCAST hooks gives you a final gate before a message enters the conversation. This is where you apply business rules -- profanity filters, rate limiting, compliance checks, PII redaction:

from roomkit.models.enums import HookTrigger
from roomkit.models.hook import HookResult
from roomkit.models.context import RoomContext
from roomkit.models.event import RoomEvent

@kit.hook(HookTrigger.BEFORE_BROADCAST)
async def validate_content(event: RoomEvent, context: RoomContext) -> HookResult:
    # Block messages with PII
    if contains_pii(event.content.body):
        return HookResult.block("Message contains personal information")

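    # Rate limit: count recent events that arrived on the same channel as this
    # message (a per-channel count, not a per-sender or time-windowed limit)
    source = event.source
    sender_recent = [
        e for e in context.recent_events
        if source and e.source and e.source.channel_id == source.channel_id
    ]
    if len(sender_recent) > 10:
        return HookResult.block("Rate limit exceeded")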

    return HookResult.allow()

The hook runs identically whether the message arrived from a Telegram bot, an email, or a phone call transcription. One validation pipeline, every channel.
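The rate check in the hook is a simple count over recent events. If you want a limit that is bounded in time and keyed by sender, one common alternative is a sliding-window counter. The sketch below is a different technique from the hook above, written against the standard library only; SlidingWindowLimiter is a hypothetical helper, and wiring it into a BEFORE_BROADCAST hook is left out because it depends on which sender field you key on.

```python
from collections import defaultdict, deque


class SlidingWindowLimiter:
    """Allow at most `limit` messages per sender within `window_seconds`."""

    def __init__(self, limit: int, window_seconds: float) -> None:
        self.limit = limit
        self.window_seconds = window_seconds
        self._seen: dict[str, deque] = defaultdict(deque)

    def allow(self, sender_id: str, now: float) -> bool:
        timestamps = self._seen[sender_id]
        # Drop timestamps that have aged out of the window.
        while timestamps and now - timestamps[0] >= self.window_seconds:
            timestamps.popleft()
        if len(timestamps) >= self.limit:
            return False
        timestamps.append(now)
        return True
```

Passing the clock value in as an argument (for example time.monotonic()) keeps the limiter deterministic and easy to test. State lives in process memory here, so a multi-process deployment would need a shared store instead.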

The Room as Convergence Point

This is the architectural insight that ties everything together. In RoomKit, the room is not a chat room in the UI sense. It's a conversation context. Multiple channels attach to the same room, which means:

A customer on SMS and a support agent on a web dashboard share the same timeline
An AI agent attached as an intelligence channel sees messages from all transport channels
Hooks apply uniformly -- a content policy violation is caught whether the message came from WhatsApp or email
The event store captures a single, ordered history of the entire conversation across every channel

[SMS]       ──┐
[WhatsApp]  ──┤
[Voice]     ──┼──→ Room ──→ Hooks ──→ Broadcast ──→ [All Channels]
[Telegram]  ──┤
[WebSocket] ──┘

When a downstream agent needs to respond, it doesn't need to know which channel the user is on. It writes to the room, and RoomKit broadcasts to every attached channel. The agent's job is to think. The room's job is to deliver.
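The flow of room, hooks, and broadcast can be modeled in a few lines. This is a toy sketch for intuition only, not RoomKit's Room: MiniRoom, Hook, and Deliver are hypothetical names, hooks here are plain synchronous callables, and skipping delivery back to the originating channel is my own assumption for the toy.

```python
import asyncio
from typing import Awaitable, Callable, Optional

Hook = Callable[[str], Optional[str]]       # returns a block reason, or None to allow
Deliver = Callable[[str], Awaitable[None]]  # one delivery coroutine per attached channel


class MiniRoom:
    """Toy model of the room -> hooks -> broadcast flow (not RoomKit's Room)."""

    def __init__(self) -> None:
        self.channels: dict[str, Deliver] = {}
        self.hooks: list[Hook] = []
        self.timeline: list[str] = []

    def attach(self, channel_id: str, deliver: Deliver) -> None:
        self.channels[channel_id] = deliver

    async def publish(self, source_channel: str, text: str) -> Optional[str]:
        # Every message passes the same hooks, whatever channel it came from.
        for hook in self.hooks:
            reason = hook(text)
            if reason is not None:
                return reason  # blocked: nothing stored, nothing delivered
        self.timeline.append(f"{source_channel}: {text}")
        # Fan out to every attached channel except the originating one
        # (an assumption of this toy model).
        targets = [d for cid, d in self.channels.items() if cid != source_channel]
        await asyncio.gather(*(deliver(text) for deliver in targets))
        return None
```

The publisher never names a destination channel. It hands the text to the room, and the room decides where it goes, which is the separation between thinking and delivering described above.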

Why This Matters for Multi-Agent Systems

In a multi-agent architecture, the user interaction layer is the foundation everything else depends on. If agents have to parse channel-specific payloads, they become coupled to transport details. If routing is scattered across webhook handlers, adding a new channel means touching every agent. If validation is inconsistent, one channel becomes a backdoor.

RoomKit's user interaction layer guarantees three things to every downstream component:

Normalized input: every message is an InboundMessage with a consistent schema, regardless of origin
Verified identity: webhook signatures are checked and sender identities are resolved before the message reaches any agent
Validated content: hooks have already applied your business rules by the time an agent processes the message

This is what makes the orchestration layer (Part 2) possible. Agents don't negotiate with channels. They receive clean, validated, routed messages and focus on their actual job.

This article is part of a 9-part series on production-ready multi-agent architecture. Next up: Part 2: Orchestration.
