
Multi-Agent Architecture, Part 3: Knowledge — Giving Agents the Context They Need

March 21, 2026 · 8 min read

An agent without context is just an expensive random text generator. It can reason, it can follow instructions, it can call tools — but if it doesn't know what the user said three turns ago, or what the previous agent already tried, or what your product documentation says about the edge case it's facing, none of that capability matters.

The naive approach is to dump everything into the prompt. Full conversation history, all retrieved documents, every piece of metadata you have. It works for demos. It does not work in production, where conversations run hundreds of turns, context windows have hard limits, and LLM costs scale linearly with token count.

Knowledge — the third pillar of multi-agent architecture — is about solving this systematically: getting the right information to the right agent at the right time, without blowing up your context window or your invoice.


The Four Sources of Agent Knowledge

When I mapped out where agents get their context from in real systems, four categories kept appearing:

  1. Conversation history — what was said in this session, by whom, and when. The most obvious source, and the one most people get wrong by including too much or too little.
  2. Source databases — structured data your agents need to answer questions. Customer records, order status, account settings. This is the "look it up" knowledge.
  3. Document stores — unstructured content like product docs, policy manuals, troubleshooting guides. The domain knowledge that makes an agent actually useful instead of generically helpful.
  4. Vector databases — the retrieval layer that connects natural language queries to relevant chunks of documents or past interactions. This is where RAG lives.

Each source has different latency characteristics, different freshness requirements, and different relevance signals. A good knowledge layer doesn't treat them uniformly — it orchestrates retrieval across all four and assembles the result into a coherent context for the LLM.

MemoryProvider: The Retrieval Interface

RoomKit's answer to context retrieval is the MemoryProvider ABC. Every AI channel has a memory provider that controls exactly what conversation history and context the LLM sees. The interface is deliberately minimal — one method to implement:

from roomkit.memory import MemoryProvider, MemoryResult

class MyMemoryProvider(MemoryProvider):
    async def retrieve(
        self,
        room_id: str,
        current_event: RoomEvent,
        context: RoomContext,
        *,
        channel_id: str | None = None,
    ) -> MemoryResult:
        # Return the events that the LLM should see
        ...

That's the contract. Whatever you return from retrieve() becomes the conversation history the LLM reasons over. The method receives the current event, the room context, and optionally the channel ID — giving you full information to make retrieval decisions. This is where you make the engineering choices that matter: how far back to look, what to summarize, what to retrieve from external sources, what to skip entirely.

SlidingWindowMemory: The Sensible Default

Out of the box, RoomKit ships SlidingWindowMemory as the default provider. It does exactly what the name suggests: returns the last N events from the conversation timeline, where N defaults to 50.

from roomkit import AIChannel
from roomkit.memory import SlidingWindowMemory
from roomkit.providers.anthropic import AnthropicAIProvider, AnthropicConfig

# Default: last 50 events
agent = AIChannel("support-agent",
    provider=AnthropicAIProvider(config=AnthropicConfig(
        model="claude-sonnet-4-20250514",
    )),
    system_prompt="You are a customer support agent.",
)

# Custom window size for a long-running research agent
researcher = AIChannel("research-agent",
    provider=AnthropicAIProvider(config=AnthropicConfig(
        model="claude-sonnet-4-20250514",
    )),
    system_prompt="You are a research assistant.",
    memory=SlidingWindowMemory(max_events=200),
)

This works well for most conversational use cases. The sliding window is predictable, cheap, and fast — it's just a slice of the event store. But it has obvious limitations: it forgets anything older than the window, it doesn't prioritize important events over filler, and it knows nothing about external documents.

That's intentional. SlidingWindowMemory is the starting point, not the ceiling. When you need more, you implement your own MemoryProvider.
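To make that concrete, here is the kind of selection policy a custom provider's retrieve() might apply, sketched as a plain function over event dicts. The `important` flag and the field names are illustrative assumptions, not RoomKit's event schema:

```python
# Sketch: keep a recent window verbatim, plus any older events flagged as
# important. Inside a real MemoryProvider this would run over the events
# retrieve() reads from the store.

def select_events(events: list[dict], window: int = 50) -> list[dict]:
    recent = events[-window:]
    older = events[:-window] if len(events) > window else []
    # Older events survive truncation only if explicitly flagged
    kept_older = [e for e in older if e.get("important")]
    return kept_older + recent
```

The point is that the policy is yours: "important" could mean tool results, decisions, or anything your domain cares about preserving past the window.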

RetrievalMemory: Built-In Knowledge Integration

For most use cases, you don't need to implement MemoryProvider from scratch. RoomKit ships RetrievalMemory — a provider that wraps an inner memory (like SlidingWindowMemory) and automatically searches one or more KnowledgeSource backends on every AI turn. Conversation history comes from the inner provider. Relevant knowledge comes from the sources. Both are assembled into a single context.

The built-in PostgresKnowledgeSource uses PostgreSQL's full-text search (tsvector with ts_rank_cd) — keyword-based retrieval that runs on the same database as your PostgresStore. No separate vector service, no extra infrastructure.

from roomkit.memory import RetrievalMemory, SlidingWindowMemory
from roomkit.knowledge import KnowledgeSource, KnowledgeResult
from roomkit.knowledge.postgres import PostgresKnowledgeSource
from roomkit import AIChannel

# PostgresKnowledgeSource — full-text search on the same DB as PostgresStore
pg_knowledge = PostgresKnowledgeSource(
    dsn="postgresql://user:pass@localhost:5432/roomkit",
)
await pg_knowledge.init()  # creates knowledge_documents table + GIN index

# Index some documents (e.g., during startup or ingestion pipeline)
await pg_knowledge.index(
    "Refunds are available within 30 days of purchase. Contact support.",
    metadata={"source": "refund_policy"},
)
await pg_knowledge.index(
    "Enterprise plans include priority support and a dedicated account manager.",
    metadata={"source": "pricing_guide"},
)

# RetrievalMemory wraps SlidingWindowMemory + knowledge sources
memory = RetrievalMemory(
    sources=[pg_knowledge],
    inner=SlidingWindowMemory(max_events=50),
    max_results=5,
)

# Wire it to an AI channel
agent = AIChannel("support-agent",
    provider=anthropic_provider,
    system_prompt="You are a customer support agent.",
    memory=memory,
)

On every AI turn, RetrievalMemory extracts the user's message, queries all sources concurrently, deduplicates results by content, and prepends the top matches as context before the conversation history. The agent sees [Relevant context from knowledge sources] followed by the conversation — no explicit RAG plumbing in your agent code.

The KnowledgeSource interface is pluggable. PostgresKnowledgeSource handles keyword search. For semantic search, implement the same interface backed by pgvector, Pinecone, or any vector store. You can pass multiple sources to RetrievalMemory — results from all of them are merged and ranked.

# Custom knowledge source — same interface, any backend
class PgVectorSource(KnowledgeSource):
    def __init__(self, pool):
        self.pool = pool  # shared asyncpg connection pool

    async def search(
        self, query: str, *, room_id: str | None = None, limit: int = 5,
    ) -> list[KnowledgeResult]:
        embedding = await get_embedding(query)  # your embedding model call
        rows = await self.pool.fetch(
            "SELECT content, 1 - (embedding <=> $1) AS score"
            " FROM docs ORDER BY embedding <=> $1 LIMIT $2",
            str(embedding), limit,
        )
        return [
            KnowledgeResult(content=r["content"], score=r["score"], source="pgvector")
            for r in rows
        ]

# Combine keyword + semantic search in one memory provider
memory = RetrievalMemory(
    sources=[pg_knowledge, PgVectorSource(pool=shared_pool)],
    inner=SlidingWindowMemory(max_events=50),
)

Both sources run concurrently. Results are merged, deduplicated, and ranked. The agent gets the best of keyword and semantic search without knowing either exists.
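The merge step is worth seeing in isolation. Here is a minimal stand-in for what RetrievalMemory does with the pooled results (a sketch of the described behavior, not its actual internals): deduplicate by content, keep the highest-scoring copy, rank, truncate.

```python
from dataclasses import dataclass

@dataclass
class Result:  # stand-in for KnowledgeResult
    content: str
    score: float
    source: str

def merge_results(batches: list[list[Result]], max_results: int = 5) -> list[Result]:
    best: dict[str, Result] = {}
    for batch in batches:
        for r in batch:
            # Duplicate content across sources: keep the highest-scoring copy
            if r.content not in best or r.score > best[r.content].score:
                best[r.content] = r
    return sorted(best.values(), key=lambda r: r.score, reverse=True)[:max_results]
```

If keyword search and vector search both return the refund policy, the agent sees it once, ranked by the better of the two scores.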

HandoffMemoryProvider: Context Across Agent Boundaries

Multi-agent systems have a context problem that single-agent systems don't: when Agent A hands off to Agent B, what does Agent B know? If the answer is "nothing," the user has to repeat themselves. If the answer is "everything Agent A saw," you're back to the "dump it all in" problem.

RoomKit ships HandoffMemoryProvider to handle this explicitly. It wraps an inner MemoryProvider and, when a handoff has occurred, injects context from the previous agent — what was discussed, what was decided, why the handoff happened — into the receiving agent's context window.

from roomkit import AIChannel
from roomkit.memory import SlidingWindowMemory
from roomkit.orchestration import HandoffMemoryProvider
from roomkit.providers.openai import OpenAIAIProvider, OpenAIConfig

# Triage agent that classifies and routes
triage = AIChannel("triage",
    provider=OpenAIAIProvider(config=OpenAIConfig(model="gpt-4o")),
    system_prompt="Classify the customer issue and hand off to the right specialist.",
)

# Billing specialist receives handoff context automatically
# HandoffMemoryProvider wraps an inner provider (here: SlidingWindowMemory)
billing = AIChannel("billing",
    provider=OpenAIAIProvider(config=OpenAIConfig(model="gpt-4o")),
    system_prompt="You are a billing specialist. Resolve payment issues.",
    memory=HandoffMemoryProvider(inner=SlidingWindowMemory()),
)

# Technical support specialist with the same pattern
tech = AIChannel("tech-support",
    provider=OpenAIAIProvider(config=OpenAIConfig(model="gpt-4o")),
    system_prompt="You are a technical support engineer.",
    memory=HandoffMemoryProvider(inner=SlidingWindowMemory()),
)

When the triage agent hands off to billing, the HandoffMemoryProvider automatically includes the triage conversation as context, layered on top of the inner provider's normal retrieval. The billing agent sees what the customer already explained and what the triage agent concluded — no repetition, no context loss. The handoff metadata (reason, source agent, timestamp) is available as structured data, not buried in free-text conversation history.

SkillRegistry: Structured Knowledge Packaging

Conversation history and retrieved documents cover dynamic knowledge — things that change per session or per query. But agents also need static knowledge: how to perform specific tasks, reference documentation for complex procedures, decision trees for troubleshooting flows.

RoomKit's SkillRegistry packages this kind of knowledge into discrete skills, each defined as a directory with its own instructions and reference documents. Instead of stuffing everything into the system prompt upfront, skills are loaded when the agent needs them.

# Skills are directories on disk with a standard structure:
#   skills/refund_processing/
#     instructions.md     <-- loaded as the skill's instructions
#     refund_policy.md    <-- reference file, loaded on demand
#     decision_tree.md    <-- reference file, loaded on demand
#
#   skills/account_recovery/
#     instructions.md
#     identity_verification.md

from roomkit.skills import SkillRegistry
from roomkit import AIChannel
from roomkit.providers.anthropic import AnthropicAIProvider, AnthropicConfig

registry = SkillRegistry()

# Register skills by pointing to their directories
registry.register("skills/refund_processing")
registry.register("skills/account_recovery")

# Attach skills to an AI channel
support = AIChannel("support",
    provider=AnthropicAIProvider(config=AnthropicConfig(
        model="claude-sonnet-4-20250514",
    )),
    system_prompt="You are a customer support agent.",
    skills=registry,
)

The read_skill_reference() tool is automatically available to the agent, letting it load reference documents on demand. The agent doesn't start with all documentation in its context — it calls the tool when it needs a specific reference. This keeps the initial prompt lean and the context window focused on the current task.

The skill model also enforces structure. Instead of a blob of instructions, each skill has a name, instructions, and reference files organized in a directory. When you audit what an agent knows how to do, the registry gives you a concrete inventory — not a 2,000-line system prompt to reverse-engineer.
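The directory convention can be sketched in a few lines. This is an illustrative loader, not SkillRegistry's implementation: instructions load eagerly, everything else is listed as an on-demand reference.

```python
from pathlib import Path

def load_skill(skill_dir: str) -> dict:
    root = Path(skill_dir)
    return {
        "name": root.name,
        # instructions.md is the skill's always-loaded instructions
        "instructions": (root / "instructions.md").read_text(),
        # every other .md file is a reference, read only when requested
        "references": sorted(
            p.name for p in root.glob("*.md") if p.name != "instructions.md"
        ),
    }
```

A registry built on this gives you exactly the auditable inventory described above: names, instructions, and references, enumerable per skill.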

From MemoryResult to AIContext

All of these pieces — memory providers, handoff context, skill references — converge into the final AIContext that gets sent to the LLM. Understanding this assembly is key to building agents that perform well without burning through tokens.

The flow looks like this:

MemoryProvider.retrieve() --> MemoryResult (messages + events)
        |
        v
AIChannel converts events to AIMessages
        |
        v
SkillRegistry adds tool definitions
        |
        v
AIContext --> LLM Provider --> Response

The MemoryProvider runs first, returning a MemoryResult with pre-built messages and raw events. The AI channel converts the raw events into AIMessage objects (preserving vision support and content types), prepends the pre-built messages, and adds skill tools to the available tool set. The final AIContext — messages, system prompt, tool definitions — is then passed to the LLM provider for inference.

This pipeline is explicit. There is no hidden context injection, no magical prompt augmentation happening behind the scenes. If you want to change what the LLM sees, implement a MemoryProvider. The architecture makes the knowledge flow auditable.
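A sketch of that assembly order, with illustrative dict shapes standing in for MemoryResult and AIContext:

```python
# Sketch: pre-built messages from the memory result come first, then raw
# events converted to message form, then system prompt and tools complete
# the context. Field names are assumptions, not RoomKit's types.

def build_context(system_prompt: str, memory_result: dict, tools: list[str]) -> dict:
    converted = [
        {"role": e["role"], "content": e["content"]}
        for e in memory_result["events"]
    ]
    return {
        "system": system_prompt,
        "messages": memory_result["messages"] + converted,
        "tools": tools,
    }
```

Because the ordering is fixed and visible, you can log the assembled context and know exactly which provider contributed each message.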

Designing Your Knowledge Architecture

The mistake I see most often is treating knowledge as an afterthought — bolting on RAG after the agent is already built, or cramming everything into the system prompt because "it works for now." Here's the framework I use when designing the knowledge layer for a new multi-agent system:

  1. Start with SlidingWindowMemory. It's simple, predictable, and good enough for initial development. Don't optimize retrieval before you know what your agents actually need.
  2. Measure what gets truncated. When conversations start exceeding the window, look at what's being lost. Is it important? If so, that's your signal to implement a custom MemoryProvider with summarization or selective retrieval.
  3. Separate static from dynamic knowledge. Policy documents and procedures belong in skills with reference files. Conversation-specific information belongs in the memory provider. Mixing them leads to either stale context or bloated prompts.
  4. Make handoff context explicit. When agents transfer control, define exactly what context transfers with them. The HandoffMemoryProvider handles the mechanics, but you need to decide the policy.
  5. Add vector search last. RAG is powerful but adds latency and complexity. Use it when keyword matching and structured queries aren't sufficient — not as a default for every retrieval.
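Step 2 often lands on a summarize-then-window provider. A sketch of that policy, with a caller-supplied summarize function standing in for what would be an LLM call in practice:

```python
from typing import Callable

def window_with_summary(
    events: list[dict],
    window: int,
    summarize: Callable[[list[dict]], str],
) -> list[dict]:
    if len(events) <= window:
        return events  # nothing truncated, nothing to summarize
    older, recent = events[:-window], events[-window:]
    # Collapse everything beyond the window into one summary message
    summary = {
        "role": "system",
        "content": f"[Summary of earlier turns] {summarize(older)}",
    }
    return [summary] + recent
```

The measurement in step 2 tells you whether the summary needs to exist at all, and what it must preserve.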

Why This Matters for Multi-Agent Systems

In a single-agent system, knowledge management is a prompt engineering problem. In a multi-agent system, it's an architecture problem. Each agent has different knowledge needs, different context budgets, and different relationships to the conversation history. A triage agent needs to see everything. A specialist agent needs depth in one domain. A summarization agent needs to read the full history and produce a compressed version for the next agent in the chain.

RoomKit's pluggable MemoryProvider architecture lets you make these decisions per agent, without any of them knowing about each other's retrieval strategies. The triage agent can use SlidingWindowMemory with a large window. The specialist can use a custom provider that queries a domain-specific vector database. The summarizer can read the full event store and produce a condensed context. Each agent gets exactly the knowledge it needs, nothing more.

That's the difference between "agents with access to information" and "agents with the right information at the right time." The first is easy. The second is what makes multi-agent systems actually work.


This article is part of a 9-part series on production-ready multi-agent architecture. Next up: Part 4: Storage.

Series: Introduction · Part 1: User Interaction · Part 2: Orchestration · Part 3: Knowledge · Part 4: Storage · Part 5: Agents · Part 6: Integration · Part 7: External Tools · Part 8: Observability · Part 9: Evaluation