
Multi-Agent Architecture, Part 7: External Tools — Connecting Agents to the Real World

March 21, 2026 · 8 min read

This is Part 7 of a series on multi-agent architecture. Read the series introduction for the full roadmap, Part 6: Integration for tool access control, or skip ahead to Part 8: Observability.


An agent that can only talk is a chatbot. An agent that can do things — look up a customer record, send an SMS confirmation, trigger a webhook to your billing system, initiate a voice call — that's an agent worth deploying.

But "connecting to external services" sounds deceptively simple. In practice, every external system comes with its own authentication, payload format, rate limits, error modes, and retry semantics. If your agents interact with the outside world through ad-hoc HTTP calls scattered across your codebase, you end up with the same N-integrations mess we saw with channels in Part 1, except now it's on the output side.

External tools are the seventh pillar of multi-agent architecture. In Part 6, we covered tool access control — deciding which agent can use which tool, and under what constraints. This article covers the next layer: the actual execution environment. How do you connect agents to SMS providers, voice APIs, CRM systems, search engines, and arbitrary business databases in a way that's pluggable, testable, and doesn't become a maintenance nightmare?

The Provider Ecosystem

RoomKit doesn't ship a single SMS integration or a single voice integration. It ships a provider interface for each capability, with multiple implementations behind it. The same channel type — say, SMSChannel — can be backed by Twilio, Telnyx, Sinch, or VoiceMeUp. Swap the provider, keep the channel behavior.

Here's what the provider landscape looks like across the major categories:

Messaging: SMS via Twilio, Telnyx, Sinch, and VoiceMeUp (all with delivery tracking, webhook handling, and MMS support). RCS via Twilio RCS and Telnyx RCS. WhatsApp via the WhatsApp Business API (with a mock provider available for testing). Facebook Messenger. Email via ElasticEmail, with SendGrid scaffolded for drop-in support.

Voice: Streaming speech-to-text via Deepgram. Text-to-speech via ElevenLabs. Local STT/TTS via Sherpa-ONNX for on-premise or air-gapped deployments. Voice backends via FastRTC (WebSocket + VAD), RTP, and SIP. Realtime speech-to-speech via Gemini Live and OpenAI Realtime.

LLM providers: Anthropic Claude, OpenAI, Google Gemini, and Mistral AI — each exposed as an intelligence channel that plugs into a room.

Webhooks: HTTPChannel for delivering structured payloads to any external endpoint.

The key architectural decision: every provider implements the same interface for its category. A TwilioSMSProvider and a TelnyxSMSProvider both conform to the SMS provider contract. This means your room configuration, routing logic, and agent code never reference a specific vendor. You can switch from Twilio to Telnyx by changing one line of configuration, and your agents don't know the difference.
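To make the swap-one-line claim concrete, here's a minimal sketch of what such a category contract can look like in Python, using a structural `Protocol`. The method name and signatures are assumptions for illustration, not RoomKit's actual interface:

```python
from typing import Protocol, runtime_checkable

@runtime_checkable
class SMSProvider(Protocol):
    """Hypothetical SMS provider contract; RoomKit's real
    interface may differ in names and signatures."""
    async def send_sms(self, to: str, body: str) -> str: ...

class TwilioSMSProvider:
    async def send_sms(self, to: str, body: str) -> str:
        return "twilio:SM123"  # illustrative message SID

class TelnyxSMSProvider:
    async def send_sms(self, to: str, body: str) -> str:
        return "telnyx:msg_456"  # illustrative message ID

# Channel and agent code depend only on the contract, never the vendor:
assert isinstance(TwilioSMSProvider(), SMSProvider)
assert isinstance(TelnyxSMSProvider(), SMSProvider)
```

Because the check is structural, any class with a conforming `send_sms` satisfies the contract without inheriting from anything.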

Wiring It Together: Multiple Providers in One Room

Real systems rarely use just one external service. A support room might need SMS for customer notifications, a voice channel for live calls, and an AI channel for the agent brain. Here's how that looks in practice:

import os

from roomkit import RoomKit, SMSChannel, AIChannel
from roomkit.channels import VoiceChannel
from roomkit.providers.twilio import TwilioSMSProvider, TwilioConfig
from roomkit.voice.stt.deepgram import DeepgramSTTProvider, DeepgramConfig
from roomkit.voice.tts.elevenlabs import ElevenLabsTTSProvider, ElevenLabsConfig
from roomkit.providers.anthropic import AnthropicAIProvider, AnthropicConfig
from roomkit.voice.backends.fastrtc import FastRTCVoiceBackend

kit = RoomKit()

# SMS — outbound confirmations and inbound customer messages
sms = SMSChannel("notify-sms", provider=TwilioSMSProvider(
    config=TwilioConfig(
        account_sid=os.environ["TWILIO_SID"],
        auth_token=os.environ["TWILIO_TOKEN"],
        from_number="+15559876543",
    )
))

# Voice — live calls with streaming STT/TTS
voice = VoiceChannel("live-voice",
    stt=DeepgramSTTProvider(config=DeepgramConfig(
        api_key=os.environ["DG_KEY"],
    )),
    tts=ElevenLabsTTSProvider(config=ElevenLabsConfig(
        api_key=os.environ["EL_KEY"],
        voice_id="rachel",
    )),
    backend=FastRTCVoiceBackend(),
)

# AI — the agent brain
ai = AIChannel("support-agent",
    provider=AnthropicAIProvider(config=AnthropicConfig(
        model="claude-sonnet-4-20250514",
    )),
    system_prompt="You are a support agent for Acme Corp...",
)

# Register and attach everything to one room
for ch in (sms, voice, ai):
    kit.register_channel(ch)

await kit.create_room(room_id="case-0421")
for ch in (sms, voice, ai):
    await kit.attach_channel("case-0421", ch.channel_id)

Three providers, three protocols, one room. The AI agent hears everything — the SMS messages, the voice transcriptions — and can respond through any attached channel. When a customer calls in and the agent needs to send a confirmation code, it writes to the SMS channel. No separate integration code. No provider-specific logic in the agent.

Custom Tools: Reaching Any External System

Providers cover the communication channels, but agents also need to interact with systems that aren't channels at all: CRM lookups, database queries, search engines, SaaS platform APIs, internal microservices. This is where RoomKit's Tool protocol comes in.

A tool is a class with a .definition property (the JSON Schema the LLM sees) and an async .handler() method (the code that runs). The framework extracts both automatically — no separate tool_handler function, no split between schema and implementation.

import json
import os

import httpx

class LookupCustomer:
    """Tool protocol: .definition + .handler()"""

    @property
    def definition(self) -> dict:
        return {
            "name": "lookup_customer",
            "description": "Search the CRM for a customer by email or phone number.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Email address or phone number"}
                },
                "required": ["query"],
            },
        }

    async def handler(self, name: str, arguments: dict) -> str:
        async with httpx.AsyncClient() as client:
            resp = await client.get(
                "https://crm.internal/api/v2/customers",
                params={"q": arguments["query"]},
                headers={"Authorization": f"Bearer {os.environ['CRM_TOKEN']}"},
                timeout=5.0,
            )
            resp.raise_for_status()
            customers = resp.json()["results"]

        if not customers:
            return json.dumps({"found": False})

        customer = customers[0]
        return json.dumps({
            "found": True,
            "name": customer["name"],
            "plan": customer["subscription_plan"],
            "open_tickets": customer["open_ticket_count"],
        })

# Pass tool instances — definition and handler are extracted automatically
ai = AIChannel("support-agent",
    provider=AnthropicAIProvider(config=AnthropicConfig(
        model="claude-sonnet-4-20250514",
    )),
    system_prompt="You are a support agent. Use lookup_customer to find customer records before answering account questions.",
    tools=[LookupCustomer()],
)

The LLM decides when to call the tool based on the conversation context. When a customer asks "what plan am I on?", the agent calls lookup_customer with their email, gets the CRM data, and responds with specific account details. No hallucination, no guesswork — the agent retrieved real data from a real system.

This pattern extends to anything you can call from Python: database queries with SQLAlchemy, Elasticsearch searches, Stripe billing lookups, Jira ticket creation, Slack notifications. The tool handler is your escape hatch from the LLM sandbox into the real world. It's an async function returning a JSON string — what happens inside is entirely up to you.
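As an illustration of that flexibility, here's a second tool sketch backed by a plain SQLite query instead of an HTTP API. The table name and schema are invented for the example; only the `.definition`/`.handler()` shape matches the protocol described above:

```python
import json
import sqlite3

class OpenTicketCount:
    """Hypothetical tool: counts open tickets in a local SQLite database.
    Table `tickets(customer_id TEXT, status TEXT)` is assumed for illustration."""

    def __init__(self, db_path: str):
        self.db_path = db_path

    @property
    def definition(self) -> dict:
        return {
            "name": "open_ticket_count",
            "description": "Count open support tickets for a customer ID.",
            "parameters": {
                "type": "object",
                "properties": {
                    "customer_id": {"type": "string", "description": "Internal customer ID"}
                },
                "required": ["customer_id"],
            },
        }

    async def handler(self, name: str, arguments: dict) -> str:
        conn = sqlite3.connect(self.db_path)
        try:
            row = conn.execute(
                "SELECT COUNT(*) FROM tickets "
                "WHERE customer_id = ? AND status = 'open'",
                (arguments["customer_id"],),
            ).fetchone()
        finally:
            conn.close()
        # Return only what the agent needs, as a JSON string
        return json.dumps({"open_tickets": row[0]})
```

The handler is synchronous SQLite wrapped in an async method for simplicity; a production version would use an async driver or a thread executor.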

HTTPChannel: Webhooks to Anywhere

Sometimes you need the reverse direction: instead of an agent calling out to an external system, you want the room's events to flow into an external system automatically. That's what HTTPChannel does. It attaches to a room like any other channel, but instead of connecting to a user, it delivers structured payloads to an HTTP endpoint.

from roomkit import HTTPChannel

# HTTPChannel is a factory that creates a webhook transport channel.
# `webhook_provider` is a webhook delivery provider constructed elsewhere;
# it handles the actual HTTP delivery.
analytics = HTTPChannel("analytics-hook", provider=webhook_provider)

kit.register_channel(analytics)
await kit.attach_channel("case-0421", "analytics-hook")

# Now every event broadcast in room "case-0421"
# is delivered to the webhook provider as structured JSON.

This is powerful for integration patterns where the external system is the consumer, not the provider. Feed conversation transcripts into a data warehouse. Push agent decisions to a compliance audit log. Send real-time events to a Slack channel for human oversight. The room broadcasts to HTTPChannel just like it broadcasts to SMS or WhatsApp — it's the same mechanism, aimed at a different target.

Pluggable by Design

The thread running through all of this is pluggability. Every provider conforms to an interface. Every tool handler follows the same signature. Every channel — whether it's delivering SMS, voice, or webhook payloads — implements the same lifecycle.

This matters for three reasons that hit hard in production:

Testing. WhatsApp has a mock provider. You can run your entire multi-agent workflow against mock SMS, mock voice, and mock CRM endpoints without touching a real API. Your CI pipeline tests the full conversation flow, not just the LLM prompts. When something breaks, you know whether it's your logic or the provider's API.
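The same discipline extends to custom tools. A test double only has to match the Tool protocol's shape; here's a sketch of a fake CRM lookup with canned data (the fields mirror the earlier example, the values are invented):

```python
import json

class FakeLookupCustomer:
    """Drop-in test double for a CRM lookup tool: same .definition
    and .handler() shape, canned data instead of a live API call."""

    @property
    def definition(self) -> dict:
        # Identical schema to the real tool, so the LLM sees no difference
        return {
            "name": "lookup_customer",
            "description": "Search the CRM for a customer by email or phone number.",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Email address or phone number"}
                },
                "required": ["query"],
            },
        }

    async def handler(self, name: str, arguments: dict) -> str:
        # Canned response; assertions in tests can pin down exactly
        # how the agent behaves given this data
        return json.dumps({
            "found": True,
            "name": "Test User",
            "plan": "pro",
            "open_tickets": 0,
        })
```

Swap `LookupCustomer()` for `FakeLookupCustomer()` in the `tools=` list and the whole conversation flow runs offline.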

Migration. When you need to switch from Twilio to Telnyx for cost reasons, you change the provider instance. The room configuration, the agent logic, the hooks, the tool handlers — none of them change. I've done this migration in production. It took one line of code and a config update.

Composition. Because providers and tool handlers share the same registration pattern, you can compose them freely. One room can have Twilio SMS for US numbers, Sinch for European numbers, Deepgram for STT, Sherpa-ONNX for on-premise voice, Claude for complex reasoning, and Gemini for fast classification — all coordinated by the same orchestration logic from Part 2.

Common Mistakes

From working with teams building on RoomKit, I see the same mistakes with external tools:

Putting provider logic in tool handlers. If your tool handler is constructing Twilio API calls directly, you've bypassed the provider abstraction. Use the channel to send messages; use tool handlers for non-channel systems like CRMs and databases.

No timeouts on external calls. A tool handler that hangs for 30 seconds waiting on a slow API will stall the entire agent turn. Set explicit timeouts. Return a structured error when a service is unavailable. The agent can tell the user "I couldn't reach the billing system right now" — that's better than silence.
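One plain-Python way to enforce this, independent of any specific provider, is a small wrapper around `asyncio.wait_for` that converts a timeout into a structured error the agent can relay to the user (function name and error fields are my own choices, not a RoomKit API):

```python
import asyncio
import json
from typing import Awaitable

async def call_with_timeout(call: Awaitable[str], timeout: float = 5.0) -> str:
    """Run an external call with a hard deadline. On timeout, return a
    structured JSON error instead of stalling the agent turn."""
    try:
        return await asyncio.wait_for(call, timeout=timeout)
    except asyncio.TimeoutError:
        return json.dumps({
            "error": "service_unavailable",
            "detail": "The upstream service did not respond in time.",
        })
```

A tool handler can wrap its HTTP or database call in this helper, so the LLM always receives valid JSON, success or failure, and can respond accordingly.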

Returning raw API responses. The LLM doesn't need a 200-field JSON blob from your CRM. Filter the response in the tool handler to only the fields the agent needs. Less data means fewer tokens, faster inference, and less chance the agent latches onto an irrelevant field.

Skipping the mock providers in tests. If your test suite hits real Twilio or real Deepgram, your tests are slow, flaky, and expensive. Use the mock providers. Test the conversation flow, not the HTTP client.

Why This Matters for Multi-Agent Systems

In a single-agent system, external tool integration is already tricky. In a multi-agent system, it's critical infrastructure. Multiple agents might need the same CRM data but at different permission levels (covered in Part 6). Tool calls need to be traced across agent boundaries (covered in Part 8). And the tool execution environment needs to be reliable enough that agents can depend on it without defensive retry logic in every prompt.

RoomKit's external tools layer gives you three guarantees:

  1. Provider-agnostic channels: swap SMS, voice, email, or LLM providers without touching agent code
  2. Structured tool execution: custom handlers with a consistent async interface, JSON in and JSON out, for any external system your agents need to reach
  3. Bidirectional integration: agents call out via tool handlers, and room events flow out via HTTPChannel — both directions covered by the same pluggable architecture

Your agents need to do things in the real world. The external tools layer is how they do it without your codebase turning into a pile of provider-specific glue code.


This article is part of a 9-part series on production-ready multi-agent architecture. Next up: Part 8: Observability.

Series: Introduction · Part 1: User Interaction · Part 2: Orchestration · Part 3: Knowledge · Part 4: Storage · Part 5: Agents · Part 6: Integration · Part 7: External Tools · Part 8: Observability · Part 9: Evaluation