Introducing roomkit-sandbox: Secure Sandboxed Execution for AI Agents

roomkit-sandbox is a container-based sandbox executor for RoomKit. It gives AI agents secure, isolated environments to read and write files, run git commands, execute shell scripts, and review code, without exposing your host system. It's now available on PyPI as v0.2.1.

The Problem

AI agents that can only talk are limited. The moment you want an agent to clone a repo, run tests, review a diff, or edit a file, you need to give it shell access. But running arbitrary commands on your host is a non-starter in production. You need isolation, resource limits, and a clean interface between the agent and the execution environment.

That's what roomkit-sandbox provides: a bridge between RoomKit's AI channels and isolated execution environments, with 10 built-in tools that agents can call naturally.

10 Built-In Tools

When you attach a sandbox to an agent, it automatically gets these tools in its catalog:

sandbox_read     Read file contents (with optional line ranges)
sandbox_write    Write or create files
sandbox_edit     Replace text in files
sandbox_ls       List directory contents with metadata
sandbox_grep     Search files with regex patterns
sandbox_find     Find files by name or type
sandbox_git      Execute any git command
sandbox_diff     Compare two files
sandbox_delete   Delete files or directories
sandbox_bash     Execute shell commands

No tool registration, no schema boilerplate. Attach the executor, and the agent sees them.

Three Isolation Backends

Different deployments need different isolation levels. roomkit-sandbox supports three backends through a protocol-based interface:

Docker (Development & CI)

Container-level isolation with ~500ms first boot. Per-user persistent containers, configurable memory and CPU limits, and instant reuse after startup. The default choice for local development and CI pipelines.

from roomkit_sandbox import ContainerSandboxExecutor
from roomkit_sandbox.docker_backend import DockerSandboxBackend

sandbox = ContainerSandboxExecutor(
    backend=DockerSandboxBackend(
        image="ghcr.io/roomkit-live/sandbox:latest",
        memory_limit="512m",
        cpu_count=1,
    ),
    session_id="code-reviewer",
    setup_commands=[
        "git clone https://github.com/org/repo.git /workspace/repo",
    ],
)

Kubernetes (Production)

Pod-level isolation with namespace separation, service accounts, and image pull secrets. Lightweight pods (512Mi, 1 CPU) with label-based discovery for session reuse. The right choice when you're scaling agents in production.

from roomkit_sandbox import ContainerSandboxExecutor
from roomkit_sandbox.k8s_backend import KubernetesSandboxBackend

sandbox = ContainerSandboxExecutor(
    backend=KubernetesSandboxBackend(
        image="ghcr.io/roomkit-live/sandbox:latest",
        namespace="production",
        service_account="sandbox-runner",
    ),
)

SmolBSD (VM Isolation, Experimental)

True VM-level isolation using QEMU/KVM with a lightweight NetBSD-based microVM. Kernel-level separation means a hypervisor escape is required for breakout. The option for running untrusted code where container isolation isn't enough.
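Because all three backends sit behind one protocol-based interface, the executor does not need to know which one it is talking to. The following is a minimal sketch of that idea using Python's typing.Protocol. The names SandboxBackend, ExecResult, start, run, and EchoBackend are hypothetical and chosen for illustration only; they are not the actual roomkit-sandbox interface.

```python
from dataclasses import dataclass
from typing import Protocol, runtime_checkable


@dataclass
class ExecResult:
    """Outcome of one command (hypothetical shape)."""
    exit_code: int
    output: str


@runtime_checkable
class SandboxBackend(Protocol):
    """Hypothetical backend contract: any class with these methods qualifies."""

    def start(self, session_id: str) -> str: ...

    def run(self, handle: str, command: list[str]) -> ExecResult: ...


class EchoBackend:
    """Toy backend that never leaves the process; it just echoes the command."""

    def start(self, session_id: str) -> str:
        return f"echo-{session_id}"

    def run(self, handle: str, command: list[str]) -> ExecResult:
        return ExecResult(exit_code=0, output=" ".join(command))


def run_in_sandbox(backend: SandboxBackend, session_id: str, command: list[str]) -> ExecResult:
    """Caller-side code depends only on the protocol, not on a concrete backend."""
    handle = backend.start(session_id)
    return backend.run(handle, command)


result = run_in_sandbox(EchoBackend(), "demo", ["git", "status"])
print(result.output)  # prints: git status
```

EchoBackend never inherits from SandboxBackend; it satisfies the protocol structurally. In a design like this, a Docker, Kubernetes, or VM backend could be swapped in without changing the calling code.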

Token-Optimized Output with RTK

LLM context is expensive. Every byte of command output that goes back to the agent costs tokens. roomkit-sandbox uses RTK, a token-optimized CLI for LLMs, as its default command builder for Docker and Kubernetes backends.

Instead of dumping raw cat and grep output, RTK produces structured, compact results that reduce token usage by 60-90%. For long-running tasks or large codebases, this translates directly to lower API costs.

# With a native (POSIX) command builder: verbose, token-heavy
sandbox_bash "make test"  # → raw test output (thousands of tokens)

# With the default RTK command builder: summarized, token-efficient
sandbox_bash "make test"  # → rtk summary (structured result, 60-90% fewer tokens)

The command builder is pluggable. You can swap in a NativeCommandBuilder for standard POSIX output, or subclass CommandBuilder to inject project-specific tools.
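To make the pluggable-builder idea concrete, here is a small self-contained sketch of the pattern. BaseBuilder, PosixBuilder, and PytestBuilder are local stand-ins defined in the snippet; they mirror the roles of CommandBuilder and NativeCommandBuilder but are not the library's classes, and the method names grep and run_tests are assumptions.

```python
import shlex
from abc import ABC, abstractmethod


class BaseBuilder(ABC):
    """Hypothetical base class: turns a tool request into an argv list."""

    @abstractmethod
    def grep(self, pattern: str, path: str) -> list[str]:
        """Return the command used to search files under path."""

    def run_tests(self) -> list[str]:
        """Default test command; subclasses may override it."""
        return ["make", "test"]


class PosixBuilder(BaseBuilder):
    """Plain POSIX tools, analogous in role to a native command builder."""

    def grep(self, pattern: str, path: str) -> list[str]:
        # "--" stops option parsing so a pattern starting with "-" is safe.
        return ["grep", "-rn", "--", pattern, path]


class PytestBuilder(PosixBuilder):
    """Project-specific subclass: run pytest quietly instead of make."""

    def run_tests(self) -> list[str]:
        return ["python", "-m", "pytest", "-q"]


builder = PytestBuilder()
grep_cmd = shlex.join(builder.grep("TODO: fix", "/workspace/repo"))
test_cmd = shlex.join(builder.run_tests())
print(grep_cmd)  # prints: grep -rn -- 'TODO: fix' /workspace/repo
print(test_cmd)  # prints: python -m pytest -q
```

Building commands as argv lists and quoting them only at the last step (here with shlex.join) avoids shell-injection surprises when a pattern or path contains spaces or quotes.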

Container Lifecycle

Containers are created on first use and reused across calls within a session. The lifecycle is automatic:

First call: creates the container, runs setup commands (git clone, pip install, etc.), caches the container ID
Subsequent calls: reuses the cached container instantly
Container crash: automatically creates a new container on the next call

Label-based discovery means the executor can find existing containers even after process restarts, avoiding startup overhead.
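The lifecycle described above can be sketched with an in-memory stand-in for a container runtime. FakeBackend, SessionExecutor, the "session" label key, and every method name here are hypothetical. Re-running the setup commands after a crash is also an assumption of this sketch, not a documented behavior.

```python
import itertools

CLONE = "git clone https://github.com/org/repo.git /workspace/repo"


class FakeBackend:
    """In-memory stand-in for a container runtime (hypothetical)."""

    def __init__(self):
        self._ids = itertools.count(1)
        self.containers = {}  # container_id -> {"labels": dict, "running": bool}
        self.setup_log = []   # (container_id, command) pairs that were executed

    def create(self, labels):
        cid = f"c{next(self._ids)}"
        self.containers[cid] = {"labels": dict(labels), "running": True}
        return cid

    def find_by_label(self, key, value):
        # Only running containers are candidates for reuse.
        for cid, info in self.containers.items():
            if info["running"] and info["labels"].get(key) == value:
                return cid
        return None

    def is_running(self, cid):
        return self.containers.get(cid, {}).get("running", False)

    def run(self, cid, command):
        self.setup_log.append((cid, command))


class SessionExecutor:
    """Creates a container lazily and reuses it within a session."""

    def __init__(self, backend, session_id, setup_commands=()):
        self.backend = backend
        self.session_id = session_id
        self.setup_commands = list(setup_commands)
        self._container_id = None  # cache; empty after a process restart

    def ensure_container(self):
        # Fast path: the cached container is still alive.
        if self._container_id and self.backend.is_running(self._container_id):
            return self._container_id
        # Cache miss (first call, restart, or crash): look up by label first.
        cid = self.backend.find_by_label("session", self.session_id)
        if cid is None:
            cid = self.backend.create({"session": self.session_id})
            for command in self.setup_commands:
                self.backend.run(cid, command)
        self._container_id = cid
        return cid


backend = FakeBackend()
executor = SessionExecutor(backend, "review", [CLONE])
a = executor.ensure_container()        # first call: create + setup
b = executor.ensure_container()        # subsequent call: cached
restarted = SessionExecutor(backend, "review", [CLONE])
c = restarted.ensure_container()       # new process: found by label
backend.containers[c]["running"] = False  # simulate a crash
d = restarted.ensure_container()       # next call: fresh container
print(a, b, c, d)  # prints: c1 c1 c1 c2
```

The label lookup is what lets the second executor find the existing container without repeating setup. Only the simulated crash triggers a new container.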

Integration with RoomKit

roomkit-sandbox implements RoomKit's SandboxExecutor interface. Pass it to an Agent and the sandbox tools appear in the agent's tool catalog automatically:

import anthropic

from roomkit import Agent
from roomkit_sandbox import ContainerSandboxExecutor
from roomkit_sandbox.docker_backend import DockerSandboxBackend

agent = Agent(
    name="code-reviewer",
    provider=anthropic.Anthropic(),
    sandbox=ContainerSandboxExecutor(
        backend=DockerSandboxBackend(),
        session_id="review-session",
        setup_commands=[
            "git clone https://github.com/org/repo.git /workspace/repo",
        ],
    ),
)

# The agent now has sandbox_read, sandbox_write, sandbox_git,
# sandbox_bash, and all other sandbox tools available.

The Container Image

The sandbox image (ghcr.io/roomkit-live/sandbox:latest) is 37MB, built on Alpine 3.21. It includes RTK, bash, git, curl, jq, and openssh-client. Containers run as a non-root sandbox user (UID 1000) with /workspace as the working directory.

Getting Started

# Quote the extras so shells that glob square brackets (e.g. zsh) accept them
pip install "roomkit-sandbox[docker]"     # Docker backend
pip install "roomkit-sandbox[kubernetes]" # Kubernetes backend

The project is open source under the MIT license. Check out the GitHub repository for documentation, examples, and the full API reference.

This is v0.2.1, still in alpha. The core executor, Docker and Kubernetes backends, and RTK integration are solid and tested. SmolBSD support is experimental. We'd love your feedback.