Why Sandboxing Matters for AI Agents

AI agents aren’t just chatbots — they execute code, call APIs, read and write files, and interact with databases. When an agent is compromised (via prompt injection, tool poisoning, or memory manipulation), the blast radius is determined by what the agent can access. Sandboxing limits that blast radius.

Without it, a single prompt injection could give an attacker full access to your production environment. A malicious prompt could instruct your coding agent to read ~/.ssh/id_rsa, exfiltrate database credentials, or install a reverse shell — all while appearing to “help” with a legitimate task.

The solution is execution isolation: run every agent action inside a constrained environment where the worst-case outcome is a wasted sandbox, not a breached network.

The 5 Tools at a Glance

Tool            | Isolation                  | Startup    | Security    | Cost           | Best For
Docker + gVisor | Container + syscall filter | 1–3s       | Strong      | Free (OSS)     | Self-hosted, CI/CD
Firecracker     | MicroVM (hardware)         | 125ms      | Very Strong | Free (OSS)     | Multi-tenant, serverless
E2B             | Cloud VM sandboxes         | ~300ms     | Strong      | ~$0.36/hr      | Coding agents, notebooks
Modal           | Container + runtime        | ~50ms warm | Good        | Pay-per-second | ML inference, GPU
Fly Machines    | Firecracker microVMs       | ~300ms     | Strong      | $0.003/hr      | Edge, geo-distributed
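
The prices in this article are quoted in mixed units (per second for E2B and Modal, per hour for Fly Machines), which makes them hard to compare at a glance. The sketch below normalizes them to dollars per hour. The rates are simply the figures quoted in this article, not live pricing, and the function names are illustrative.

```python
SECONDS_PER_HOUR = 3600

# Rates as quoted in this article; check current vendor pricing before relying on them.
PER_SECOND_RATES = {
    "E2B": 0.0001,            # $/sec
    "Modal (CPU)": 0.000017,  # $/sec
}
PER_HOUR_RATES = {
    "Fly Machines (shared CPU)": 0.003,  # $/hr
}


def hourly_cost(rate_per_second: float) -> float:
    """Convert a per-second price into a per-hour price."""
    return rate_per_second * SECONDS_PER_HOUR


def hourly_table() -> dict[str, float]:
    """Return every quoted rate expressed in dollars per hour."""
    table = {name: round(hourly_cost(rate), 4) for name, rate in PER_SECOND_RATES.items()}
    table.update(PER_HOUR_RATES)
    return table


if __name__ == "__main__":
    # Print the cheapest option first.
    for name, cost in sorted(hourly_table().items(), key=lambda item: item[1]):
        print(f"{name:<28} ${cost:.4f}/hr")
```

Raw hourly price is only part of the picture: the self-hosted options are free to license but still cost whatever the underlying machines cost.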

Tool-by-Tool Breakdown

1. Docker + gVisor
Isolation: Container + Syscall Filter
Startup: 1–3 seconds
Cost: Free (open source)

gVisor acts as a user-space kernel that intercepts all syscalls from the container. Instead of your agent talking directly to the host kernel, every system call goes through gVisor’s runsc runtime, which implements a subset of Linux syscalls in a sandbox. This dramatically reduces the kernel attack surface compared to standard Docker.

Self-hosted agents · CI/CD pipelines · Open source · No vendor lock-in
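Limitation: GPU support under gVisor is limited to specific NVIDIA drivers and workloads. The attack surface is also larger than a microVM’s, because the gVisor sandbox still runs on the shared host kernel (though through a heavily filtered interface).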
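
As a rough illustration of what running an agent step under gVisor can look like, the sketch below builds a `docker run` command line that selects the `runsc` runtime and adds a few conservative defaults. It assumes Docker is installed and that `runsc` has already been registered as a Docker runtime on the host. The image name, the resource values, and the function name are placeholders, not recommendations from this article.

```python
import shlex


def gvisor_run_command(image: str, command: list[str], *, memory: str = "512m",
                       cpus: str = "1.0", pids_limit: int = 128) -> list[str]:
    """Build a `docker run` argv that uses gVisor's runsc runtime with tight defaults."""
    return [
        "docker", "run", "--rm",
        "--runtime=runsc",                   # handle syscalls in gVisor, not the host kernel
        "--network=none",                    # no network unless explicitly granted
        "--read-only",                       # immutable root filesystem
        "--cap-drop=ALL",                    # drop all Linux capabilities
        "--security-opt=no-new-privileges",  # block setuid-style privilege gains
        f"--memory={memory}",
        f"--cpus={cpus}",
        f"--pids-limit={pids_limit}",
        image, *command,
    ]


if __name__ == "__main__":
    # Placeholder image and command; this only prints the command, it does not run it.
    argv = gvisor_run_command("python:3.12-slim", ["python", "-c", "print('hello')"])
    print(shlex.join(argv))
```

Passing the command as an argument list rather than a shell string matters here: anything the agent generates is handed to Docker verbatim and never interpreted by a host shell.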
2. Firecracker (AWS)
Isolation: MicroVM (Hardware)
Startup: 125ms
Cost: Free (OSS) / AWS Lambda

Firecracker creates lightweight virtual machines that provide the same hardware-level isolation as traditional VMs but boot in as little as 125 milliseconds with a minimal memory footprint. Each microVM runs its own guest kernel, so a compromised agent never makes syscalls against the host kernel directly; escaping would require breaking the hypervisor boundary (KVM plus the Firecracker VMM). This is the technology behind AWS Lambda and Fargate.

Multi-tenant execution · Serverless backends · Hardware isolation · 125ms boot
Limitation: Linux-only, limited device support, and requires KVM. You need bare-metal or nested-virt-enabled cloud instances.
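
To make the KVM requirement concrete, here is a minimal sketch that checks whether `/dev/kvm` is usable and writes a Firecracker JSON configuration for a single microVM. The kernel and rootfs paths, the output filename, and the function names are hypothetical. The field names follow the configuration-file layout in Firecracker's getting-started documentation as best understood here, so verify them against the version you deploy.

```python
import json
import os


def kvm_available(device: str = "/dev/kvm") -> bool:
    """Firecracker needs read/write access to the KVM device."""
    return os.path.exists(device) and os.access(device, os.R_OK | os.W_OK)


def microvm_config(kernel_path: str, rootfs_path: str, *, vcpus: int = 1,
                   mem_mib: int = 256, read_only_rootfs: bool = False) -> dict:
    """Describe one microVM: its guest kernel, root drive, and machine size."""
    return {
        "boot-source": {
            "kernel_image_path": kernel_path,
            "boot_args": "console=ttyS0 reboot=k panic=1 pci=off",
        },
        "drives": [{
            "drive_id": "rootfs",
            "path_on_host": rootfs_path,
            "is_root_device": True,
            "is_read_only": read_only_rootfs,
        }],
        "machine-config": {"vcpu_count": vcpus, "mem_size_mib": mem_mib},
    }


if __name__ == "__main__":
    if not kvm_available():
        print("KVM not available: use bare metal or a nested-virtualization instance")
    # Hypothetical paths; supply your own guest kernel and root filesystem image.
    config = microvm_config("/opt/fc/vmlinux", "/opt/fc/rootfs.ext4")
    with open("vm_config.json", "w") as fh:
        json.dump(config, fh, indent=2)
    print("then: firecracker --api-sock /tmp/firecracker.socket --config-file vm_config.json")
```

No network interface is defined, so the guest starts with no network at all, which is a reasonable default for untrusted agent code.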
3. E2B (e2b.dev)
Isolation: Cloud VM Sandboxes
Startup: ~300ms
Cost: $0.0001/sec (~$0.36/hr)

E2B provides on-demand cloud sandboxes purpose-built for AI agents. Each sandbox is a full Linux environment with a persistent filesystem, so your agent can install packages, write files, and run long-lived processes. The SDK integrates directly with LangChain, CrewAI, and other agent frameworks.

AI coding agents · Notebook execution · Persistent FS · Agent SDK
Limitation: Cloud-only (no self-hosted option), vendor lock-in, and limited region availability.
4. Modal
Isolation: Container + Custom Runtime
Startup: ~50ms warm / ~1s cold
Cost: $0.000017/sec (CPU)

Modal is a cloud compute platform designed for ML workloads. Its container-based isolation with namespace separation provides good security, and its killer feature is GPU access with near-instant warm starts. If your agents need to run inference or fine-tuning inside the sandbox, Modal is the only option here that makes that practical.

ML inference · GPU workloads · Batch processing · 50ms warm start
Limitation: Python-centric ecosystem. Requires restructuring your agent code to use Modal’s decorator-based orchestration pattern.
5. Fly Machines
Isolation: Firecracker MicroVMs
Startup: ~300ms (from stopped)
Cost: $0.003/hr (shared CPU)

Fly Machines run on the same Firecracker technology as AWS Lambda but with a key difference: you get 30+ global regions out of the box. Each machine is a microVM that can be started, stopped, and destroyed via API. For agents that need to operate close to users or data sources across geographies, Fly is the most practical choice.

Edge deployment · Geo-distributed agents · 30+ regions · API-driven
Limitation: Limited GPU availability and a smaller ecosystem compared to AWS. GPU instances are only available in select regions.
Quick Pick: Docker + gVisor for Most Teams

If you’re not sure where to start, go with Docker + gVisor. It’s free, runs on your existing infrastructure, requires no vendor account, and provides strong isolation for the vast majority of agent workloads. You can always graduate to Firecracker or a managed service when you need multi-tenant isolation or sub-second startup times.

Decision Matrix

Need maximum security? → Firecracker
Need GPU access? → Modal
Need a persistent environment? → E2B
Need edge deployment? → Fly Machines
Need zero vendor lock-in? → Docker + gVisor
Budget-constrained? → Docker + gVisor (free) or Fly Machines
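
When several of these needs apply at once, the matrix can be treated as a simple lookup. The sketch below encodes the six rows above and ranks tools by how many stated needs they satisfy. The tallying rule is an assumption made for illustration, not a scoring method from this article.

```python
# The six rows of the decision matrix, keyed by need.
DECISION_MATRIX = {
    "maximum security": ["Firecracker"],
    "gpu access": ["Modal"],
    "persistent environment": ["E2B"],
    "edge deployment": ["Fly Machines"],
    "zero vendor lock-in": ["Docker + gVisor"],
    "budget-constrained": ["Docker + gVisor", "Fly Machines"],
}


def recommend(needs: list[str]) -> list[str]:
    """Return tools ordered by how many of the given needs they cover (ties: alphabetical)."""
    counts: dict[str, int] = {}
    for need in needs:
        for tool in DECISION_MATRIX[need.lower()]:
            counts[tool] = counts.get(tool, 0) + 1
    return sorted(counts, key=lambda tool: (-counts[tool], tool))


if __name__ == "__main__":
    print(recommend(["zero vendor lock-in", "budget-constrained"]))
```

An unknown need raises a `KeyError` rather than being silently ignored, so a typo in a requirement does not quietly change the recommendation.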

Implementation Best Practices

Regardless of which tool you choose, these six principles should govern every sandboxed agent deployment:

  1. Principle of least privilege: Each agent gets only the permissions it needs. If an agent only reads from an API, it should not have write credentials. If it only needs network access to one endpoint, firewall everything else.
  2. Network isolation: Agents cannot reach internal services unless explicitly allowed. Default-deny network policies mean a compromised sandbox cannot scan your internal network or reach metadata endpoints.
  3. Time limits: Kill sandboxes after a maximum execution time. This prevents crypto mining, persistent backdoors, and runaway processes. A 5-minute hard limit covers most agent tasks.
  4. Resource caps: Limit CPU, memory, and disk to prevent resource exhaustion attacks. A sandbox that can allocate unlimited memory is a denial-of-service vector against your host.
  5. Audit logging: Log all syscalls and network requests from sandboxed agents. When (not if) something goes wrong, you need a forensic trail. gVisor and Firecracker both support detailed audit logs.
  6. Secret injection: Pass secrets at runtime, never bake them into sandbox images. Use short-lived tokens with automatic rotation. If a sandbox image is cached or leaked, no credentials are exposed.
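
Time limits, resource caps, and runtime secret injection can be illustrated at the process level with nothing but the Python standard library. This is a sketch under stated assumptions: it targets Linux, it relies on POSIX rlimits rather than a real sandbox, and the `API_TOKEN` name and all limit values other than the 5-minute wall clock are placeholders. It complements the isolation tools above and does not replace them.

```python
import os
import resource
import subprocess
import sys


def _limit_setter(cpu_seconds: int, memory_bytes: int, file_bytes: int):
    """Return a function that applies rlimits inside the child before exec."""
    def set_limits() -> None:
        resource.setrlimit(resource.RLIMIT_CPU, (cpu_seconds, cpu_seconds))
        resource.setrlimit(resource.RLIMIT_AS, (memory_bytes, memory_bytes))
        resource.setrlimit(resource.RLIMIT_FSIZE, (file_bytes, file_bytes))
    return set_limits


def run_limited(argv, *, wall_seconds=300, cpu_seconds=60,
                memory_bytes=512 * 1024**2, file_bytes=64 * 1024**2, secrets=None):
    """Run argv with a hard time limit, resource caps, and secrets passed via env.

    Returns the CompletedProcess, or None if the wall-clock limit was hit.
    """
    # Start from a near-empty environment so host credentials are not inherited.
    env = {"PATH": os.environ.get("PATH", "")}
    env.update(secrets or {})  # injected at runtime, never stored in an image
    try:
        return subprocess.run(
            argv, env=env, capture_output=True, text=True,
            timeout=wall_seconds,  # the child is killed when this expires
            preexec_fn=_limit_setter(cpu_seconds, memory_bytes, file_bytes),
        )
    except subprocess.TimeoutExpired:
        return None


if __name__ == "__main__":
    result = run_limited(
        [sys.executable, "-c", "import os; print('token set:', 'API_TOKEN' in os.environ)"],
        secrets={"API_TOKEN": "short-lived-placeholder"},
    )
    print(result.stdout if result else "killed: time limit exceeded")
```

Building the child environment from scratch, instead of copying the parent's, is the design choice that matters most here: the agent process sees only the secrets it was explicitly handed.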

What We Build in the Workshop

In Module 4 of our AI Security Workshop, you will set up and compare Docker + gVisor and Firecracker sandboxes, then try to break out of them in a hands-on red team exercise. You will:

  • Configure gVisor’s runsc runtime with custom seccomp profiles
  • Deploy a Firecracker microVM with a minimal guest kernel
  • Attempt container escapes and privilege escalation from inside each sandbox
  • Measure the performance overhead of each isolation layer under realistic agent workloads

By the end of the module, you will have a production-ready sandbox configuration that you can drop into any agent framework — and the confidence that comes from having tried to break it yourself.