Multi-Agent Architecture, Part 5: Agents — Supervisors, Specialists, and Delegation

March 21, 2026 · 10 min read

This is Part 5 of a series on multi-agent architecture. Read the series introduction for context on the full architecture, or go back to Part 4: Storage.


This is the pillar everyone wants to start with. Multiple AI agents with different roles, coordinating in real time, each bringing specialized expertise. The pitch is compelling, and the demos look impressive. Then you try to build it for real.

The questions pile up fast. How do you scope an agent's responsibility? How does one agent hand off to another without losing context? What happens when two agents want to respond at the same time? How do you run a background task without blocking the main conversation? And how do you mix providers — Claude for reasoning, Gemini for vision, Mistral for fast classification — without your codebase turning into a provider-specific mess?

These aren't hypothetical problems. I hit every one of them while building production multi-agent workflows with RoomKit. The solutions came from a single architectural decision: an agent is just a channel. Specifically, an AIChannel attached to a room. This sounds reductive, but it turns out to be exactly the right abstraction. It gives you isolation, composability, provider independence, and a natural coordination boundary — all from the same room model that handles user interaction.

AIChannel: The Execution Unit

In RoomKit, every agent is an AIChannel. Each channel carries its own configuration: the LLM provider, the system prompt, generation parameters, and tool definitions. When a message arrives in the room, the orchestration layer (covered in Part 2) decides which channels receive it. The active channel generates a response. Non-active channels are muted.

This is a critical distinction. Muting does not mean the agent is shut down. A muted channel suppresses response events — it won't produce visible output — but it still preserves side effects. Tasks continue running, observations are recorded, and the agent stays aware of the conversation. It's the difference between an agent that's silent and an agent that doesn't exist.

Here's what a multi-agent room looks like in practice — a customer support system with a supervisor, a billing specialist, and a technical specialist:

from roomkit import RoomKit, AIChannel
from roomkit.providers.anthropic import AnthropicAIProvider, AnthropicConfig
from roomkit.providers.openai import OpenAIAIProvider, OpenAIConfig
from roomkit.providers.gemini import GeminiAIProvider, GeminiConfig
from roomkit.orchestration import ConversationRouter

kit = RoomKit()

# Supervisor: Claude for complex reasoning and oversight
supervisor = AIChannel("supervisor",
    provider=AnthropicAIProvider(config=AnthropicConfig(
        model="claude-sonnet-4-20250514",
    )),
    system_prompt=(
        "You are a support supervisor. You observe all exchanges "
        "between customers and specialist agents. Intervene only when "
        "an agent gives incorrect information, when the customer is "
        "frustrated, or when escalation is needed. Otherwise, stay silent."
    ),
    temperature=0.3,
    max_tokens=1024,
)

# Billing specialist: GPT-4o for structured data extraction
billing = AIChannel("billing-agent",
    provider=OpenAIAIProvider(config=OpenAIConfig(model="gpt-4o")),
    system_prompt=(
        "You are a billing specialist. Handle invoice inquiries, "
        "payment issues, refund requests, and subscription changes. "
        "Always verify the customer's account before making changes."
    ),
    temperature=0.1,
    max_tokens=2048,
)

# Technical specialist: Gemini for multimodal support (screenshots)
tech = AIChannel("tech-agent",
    provider=GeminiAIProvider(config=GeminiConfig(model="gemini-2.5-pro")),
    system_prompt=(
        "You are a technical support specialist. Diagnose bugs, "
        "walk users through configuration steps, and analyze "
        "screenshots or error logs they share."
    ),
    temperature=0.2,
    max_tokens=4096,
    thinking_budget=2048,
)

kit.register_channel(supervisor)
kit.register_channel(billing)
kit.register_channel(tech)

# Configure orchestration with supervisor
router = ConversationRouter(supervisor_id="supervisor")

# Create the room with orchestration
await kit.create_room(
    room_id="support-room-001",
    orchestration=router,
)
await kit.attach_channel("support-room-001", "supervisor")
await kit.attach_channel("support-room-001", "billing-agent")
await kit.attach_channel("support-room-001", "tech-agent")

Three agents, three providers, one room. Each agent has a scoped responsibility, its own system prompt, and independent generation parameters. The supervisor uses lower temperature for precise oversight. The tech agent gets a thinking budget for complex diagnostic reasoning. The billing agent uses minimal temperature for deterministic data operations.

The Supervisor Pattern

The supervisor_id parameter on the ConversationRouter is deceptively simple, but it changes everything about how multi-agent coordination works. A supervisor channel always receives events, regardless of routing. When the orchestrator routes a message to the billing agent, the supervisor still sees it. When the billing agent responds, the supervisor sees that too.

This is not the same as making the supervisor the "active" agent. The supervisor is muted during normal operation — it observes but doesn't produce responses. But because muting preserves side effects, the supervisor can still take action when it detects a problem: trigger an escalation, log an observation, or unmute itself to intervene directly.
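What "detects a problem" means is up to you. As a minimal sketch (a keyword heuristic, purely illustrative; a real supervisor would let the LLM make this judgment), the unmute decision point might look like:

```python
# Hypothetical frustration markers for the example
FRUSTRATION_MARKERS = ("ridiculous", "unacceptable", "cancel my account", "third time")

def should_intervene(latest_customer_turn: str) -> bool:
    """Crude trigger: flag turns containing a frustration marker.
    This only illustrates where the unmute decision sits; in practice
    the supervisor's own model makes this call."""
    text = latest_customer_turn.lower()
    return any(marker in text for marker in FRUSTRATION_MARKERS)

calm = should_intervene("Can you resend invoice INV-204?")          # False
heated = should_intervene("This is RIDICULOUS, third time asking")  # True
```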

The architecture looks like this:

Customer Message
       │
       ▼
  Orchestrator
       │
       ├── routing ──▶ [Billing Agent]   ✓ active, responds
       │
       ├── always ───▶ [Supervisor]      👁 observes, can intervene
       │
       └── filtered ─▶ [Tech Agent]      ✗ not dispatched, idle

The supervisor pattern solves a real problem: in any multi-agent system, you need a single point of accountability. Without it, agents operate in silos and nobody catches cross-agent mistakes. The supervisor is that accountability layer, and because it's just another AIChannel, it has its own prompt, provider, and tools — it can be as sophisticated or as lightweight as your use case demands.
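The routing rule in the diagram reduces to a few lines. This is a sketch of the decision logic, not the actual ConversationRouter implementation:

```python
def dispatch(channel_ids: list[str], active_id: str, supervisor_id: str) -> dict[str, str]:
    """Sketch of the routing rule: the active channel responds,
    the supervisor always observes, everyone else is skipped."""
    decisions = {}
    for cid in channel_ids:
        if cid == active_id:
            decisions[cid] = "respond"
        elif cid == supervisor_id:
            decisions[cid] = "observe"
        else:
            decisions[cid] = "skip"
    return decisions

routes = dispatch(
    ["supervisor", "billing-agent", "tech-agent"],
    active_id="billing-agent",
    supervisor_id="supervisor",
)
```

The point is that "always receives" is a property of the router, not of the supervisor's prompt: the supervisor observes even when it was never selected.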

Tools: Giving Agents Capabilities

An agent without tools is a chatbot. Tools are what turn an LLM into an execution unit — the ability to look up an account, process a refund, query a database, or call an external API. In RoomKit, tools follow a simple protocol: a class with a .definition property (the JSON Schema the LLM sees) and an async .handler() method (the code that runs).

The framework handles the function-calling protocol for each provider — Anthropic's tool_use, OpenAI's function calling, Gemini's function declarations, Mistral's tool calls — so your tool implementation is provider-agnostic:

import json

class LookupInvoice:
    """Tool protocol: .definition property + async .handler() method."""

    @property
    def definition(self) -> dict:
        return {
            "name": "lookup_invoice",
            "description": "Look up an invoice by ID",
            "parameters": {
                "type": "object",
                "properties": {
                    "invoice_id": {"type": "string"}
                },
                "required": ["invoice_id"],
            },
        }

    async def handler(self, name: str, arguments: dict) -> str:
        # `db` is your application's data access layer, not part of RoomKit
        invoice = await db.get_invoice(arguments["invoice_id"])
        return json.dumps({
            "id": invoice.id,
            "amount": str(invoice.amount),
            "status": invoice.status,
        })

class ProcessRefund:
    @property
    def definition(self) -> dict:
        return {
            "name": "process_refund",
            "description": "Process a refund for an invoice",
            "parameters": {
                "type": "object",
                "properties": {
                    "invoice_id": {"type": "string"},
                    "reason": {"type": "string"},
                },
                "required": ["invoice_id", "reason"],
            },
        }

    async def handler(self, name: str, arguments: dict) -> str:
        # `payments` is your payment service client, not part of RoomKit
        result = await payments.refund(
            invoice_id=arguments["invoice_id"],
            reason=arguments["reason"],
        )
        return json.dumps({"refund_id": result.id, "status": result.status})

# Pass tool instances directly — definition and handler are extracted automatically
billing = AIChannel("billing-agent",
    provider=OpenAIAIProvider(config=OpenAIConfig(model="gpt-4o")),
    system_prompt="You are a billing specialist...",
    tools=[LookupInvoice(), ProcessRefund()],
)

The billing agent gets lookup_invoice and process_refund. The tech agent would get query_logs and run_diagnostic. The supervisor might get escalate_to_human and flag_for_review. Each agent's tool set defines its capability boundary — and that boundary is enforced by the framework, not by prompt instructions the LLM might ignore.
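Framework-level enforcement of that boundary is simple to picture: whatever tool name the model emits, only registered tools can run. A sketch of the idea (not RoomKit's actual dispatcher; `ToolDispatcher` and `Echo` are invented for the example):

```python
import asyncio
import json

class ToolBoundaryError(Exception):
    pass

class ToolDispatcher:
    """Only tools registered for this agent can run on its behalf,
    no matter what tool name the model emits."""

    def __init__(self, tools):
        self._tools = {t.definition["name"]: t for t in tools}

    async def call(self, name: str, arguments: dict) -> str:
        tool = self._tools.get(name)
        if tool is None:
            raise ToolBoundaryError(f"{name!r} is outside this agent's boundary")
        return await tool.handler(name, arguments)

class Echo:
    """Minimal stand-in tool following the same protocol."""
    @property
    def definition(self) -> dict:
        return {"name": "echo", "description": "Echo arguments back",
                "parameters": {"type": "object", "properties": {}}}

    async def handler(self, name: str, arguments: dict) -> str:
        return json.dumps(arguments)

dispatcher = ToolDispatcher([Echo()])
result = asyncio.run(dispatcher.call("echo", {"x": 1}))
# A call to "process_refund" here would raise ToolBoundaryError
```

A prompt can be talked around; a dispatcher that never had the tool registered cannot.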

Delegation: Background Task Execution

Not every task belongs in the main conversation. A customer asks "Can you generate a detailed usage report for the last 90 days?" That could take thirty seconds to compile. You don't want the conversation blocked while the billing agent crunches numbers.

RoomKit solves this with the DelegateHandler. Delegation creates a child room linked to the parent, shares the relevant channels (same provider instance, different binding), and executes the task independently. The parent room continues handling the conversation while the child works in the background.

from roomkit import AIChannel
from roomkit.models.enums import HookTrigger

# Register a report-generation agent channel
report_agent = AIChannel("report-generator",
    provider=AnthropicAIProvider(config=AnthropicConfig(
        model="claude-sonnet-4-20250514",
    )),
    system_prompt=(
        "You generate usage reports. Query the database for the "
        "requested time period, format the data, and return a summary."
    ),
    tools=report_tools,  # list of Tool objects
)
kit.register_channel(report_agent)

# Fire-and-forget: delegate and continue the conversation
task = await kit.delegate(
    room_id="support-room-001",
    agent_id="report-generator",
    task="Generate a usage report for account ACC-7742 covering the last 90 days",
)
# Parent room is NOT blocked — conversation continues immediately

# Or: delegate, then block on the result with a timeout
task = await kit.delegate(
    room_id="support-room-001",
    agent_id="report-generator",
    task="Look up the current balance for account ACC-7742",
)
result = await task.wait(timeout=30.0)  # blocks until complete or timeout

Because the child room shares the parent's channels, the delegated agent has access to the same tools and context as the parent, but operates in its own execution scope. When the task completes, the result flows back through the delegation lifecycle hooks.

Two hooks give you visibility into the delegation lifecycle:

from roomkit.models.event import RoomEvent
from roomkit.models.context import RoomContext

@kit.hook(HookTrigger.ON_TASK_COMPLETED)
async def handle_report_ready(event: RoomEvent, context: RoomContext) -> None:
    # Log the completed task for downstream processing
    print(f"Task completed in room {context.room.id}: {event.content.body}")

@kit.hook(HookTrigger.ON_TASK_DELEGATED)
async def handle_task_started(event: RoomEvent, context: RoomContext) -> None:
    print(f"Background task started in room {context.room.id}")

This model is fundamentally different from "agent calls agent" patterns where one LLM makes an API call to another LLM. In RoomKit, delegation is a first-class operation with lifecycle management, timeout handling, and hook-based observability. The parent room doesn't lose track of what it spawned.
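The fire-and-forget vs. wait distinction maps cleanly onto familiar async primitives. Here is the same shape in plain asyncio, as an analogy rather than RoomKit code:

```python
import asyncio

async def generate_report(account: str) -> str:
    await asyncio.sleep(0.01)  # stands in for the slow report job
    return f"usage report for {account}"

async def main() -> tuple[str, str]:
    # Fire-and-forget: the task starts, the conversation loop keeps going
    task = asyncio.create_task(generate_report("ACC-7742"))
    reply = "Working on it. Is there anything else in the meantime?"
    # Later: block on the result with a timeout, as task.wait() does
    report = await asyncio.wait_for(task, timeout=5.0)
    return reply, report

reply, report = asyncio.run(main())
```

What delegation adds on top of a bare task is the lifecycle scaffolding: the child room, the shared channels, and the hooks that fire when the task starts and finishes.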

Multi-Provider Support

One of the practical benefits of the AIChannel model is that provider selection is per-agent, not per-system. You're not locked into a single LLM provider for your entire application. Each agent uses whichever provider best fits its role: Claude for complex reasoning, GPT-4o for structured data extraction, Gemini for multimodal input, and Mistral for fast classification.

All four providers support function calling and vision through RoomKit's unified interface. You write your tool handler once, and it works regardless of which provider powers the agent. The framework translates between each provider's native tool-calling format — Anthropic's tool_use blocks, OpenAI's function definitions, Gemini's functionDeclarations, Mistral's tool calls — transparently.

This means you can swap providers per agent without touching your tool code, your routing logic, or your hooks. If Claude Sonnet handles your supervisor today but you want to try Gemini 2.5 Pro next week, you change one line — the provider constructor — and everything else stays the same.
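To make the translation concrete, here is roughly what one tool definition becomes in two of those native formats. The conversion helpers are illustrative, not RoomKit code, but the target shapes match the providers' documented tool formats:

```python
def to_anthropic(defn: dict) -> dict:
    """Anthropic's Messages API names the JSON Schema field input_schema."""
    return {
        "name": defn["name"],
        "description": defn["description"],
        "input_schema": defn["parameters"],
    }

def to_openai(defn: dict) -> dict:
    """OpenAI's Chat Completions API nests the same fields under a
    function key with a type tag."""
    return {"type": "function", "function": defn}

lookup = {
    "name": "lookup_invoice",
    "description": "Look up an invoice by ID",
    "parameters": {
        "type": "object",
        "properties": {"invoice_id": {"type": "string"}},
        "required": ["invoice_id"],
    },
}
```

Same name, same description, same schema; only the envelope differs, which is why a single provider-neutral definition is enough.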

Why Agents Come Fifth

I put agents at pillar five deliberately. Notice what we needed before we could get here: user interaction to normalize inputs, orchestration to route messages between agents, knowledge to give agents context, and storage to persist state across turns. Without those four pillars, agents are just isolated LLM calls with no coordination, no memory, and no way to reach users.

With them in place, the agent layer becomes straightforward. An AIChannel is a configuration object — a system prompt, a provider, some tools, and generation parameters. The hard work of routing, state management, and user delivery is handled by the infrastructure underneath. The agent's only job is to think.

And that's exactly right. An agent should be the simplest part of your multi-agent system. If your agent code is complex, you're probably solving infrastructure problems inside the agent instead of in the framework where they belong.


This article is part of a 9-part series on production-ready multi-agent architecture. Next up: Part 6: Integration.

Series: Introduction · Part 1: User Interaction · Part 2: Orchestration · Part 3: Knowledge · Part 4: Storage · Part 5: Agents · Part 6: Integration · Part 7: External Tools · Part 8: Observability · Part 9: Evaluation