Sandbox
For a comparison of the three filesystem modes, see Filesystem. This page focuses on how to use sandbox mode.
What the sandbox solves
The sandbox confines the agent’s file operations and command execution to an isolated environment, so the host stays untouched. It also brings three further benefits:
Execution boundary: untrusted input, suspicious scripts, and rm -rf-style commands all stay inside the sandbox.
Cross-call recovery: beyond conversation state, the executable environment itself (packages from pip install or npm install, generated temp files) is snapshotted, so the next call() resumes in the same sandbox without reinstalling anything.
Multi-replica friendly: when multiple replicas serve the same logical user, they can share a single sandbox-state slot, so any node can resume the same workspace.
A minimal example
Local Docker, isolated per conversation:
```java
HarnessAgent agent = HarnessAgent.builder()
        .name("code-agent")
        .model(model)
        .workspace(workspace)
        .filesystem(new DockerFilesystemSpec()
                .image("ubuntu:24.04"))
        .build();

// The sessionId selects the sandbox: here, conversation 1 of user 1
agent.call(msg, RuntimeContext.builder()
        .sessionId("user-1-conv-1")
        .build()).block();
```
A different sessionId gets a different sandbox; repeated call()s with the same sessionId automatically reuse the same sandbox (or restore it from a snapshot).
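To make the session-to-sandbox mapping concrete, here is a minimal sketch in plain Java. It assumes an in-memory map keyed by sessionId; SessionSandboxRegistry and SessionSandbox are hypothetical names, not Harness classes, and the real framework additionally restores from a snapshot when the container is gone.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: these names are illustrative, not part of the Harness API.
// It models only the "one sandbox per sessionId" rule.
final class SessionSandboxRegistry {
    // Stand-in for a real sandbox handle: an id plus the owning session.
    record SessionSandbox(int id, String sessionId) {}

    private final Map<String, SessionSandbox> bySession = new ConcurrentHashMap<>();
    private final AtomicInteger nextId = new AtomicInteger();

    // Known sessionId -> the existing sandbox; unseen sessionId -> a new sandbox.
    SessionSandbox acquire(String sessionId) {
        return bySession.computeIfAbsent(sessionId,
                sid -> new SessionSandbox(nextId.incrementAndGet(), sid));
    }

    int size() {
        return bySession.size();
    }
}
```

computeIfAbsent keeps the lookup atomic, so two concurrent calls for the same sessionId in one JVM still end up with a single sandbox.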
Cross-call recovery = snapshots
The sandbox snapshots its workspace at the end of each call() and restores it at the start of the next one:
Container still alive and workspace still there → just continue (fastest)
Container gone → reboot from the snapshot and restore the workspace
No snapshot → full initialization from WorkspaceSpec (cold start)
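The three start-up paths above can be read as a small decision function. This is a hypothetical sketch (StartPath and StartPlanner are illustrative names, not Harness types); it also assumes that a live container whose workspace was lost is handled like a missing container, a case the list above does not spell out.

```java
// Hypothetical sketch of the start-up decision; not the Harness implementation.
enum StartPath { CONTINUE, RESTORE_FROM_SNAPSHOT, COLD_START }

final class StartPlanner {
    static StartPath plan(boolean containerAlive, boolean workspacePresent, boolean snapshotExists) {
        if (containerAlive && workspacePresent) {
            return StartPath.CONTINUE;              // fastest: nothing to restore
        }
        if (snapshotExists) {
            // Assumption: also taken when the container is alive but the workspace is gone.
            return StartPath.RESTORE_FROM_SNAPSHOT; // reboot and unpack the saved workspace
        }
        return StartPath.COLD_START;                // full initialization from WorkspaceSpec
    }
}
```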
Where snapshots land is decided by snapshotSpec:
| Snapshot store | When to use |
|---|---|
| Noop | No persistence; cold start when the container is gone |
| Host local file | Single-machine, long-running deployments |
| OSS / S3-compatible | Multi-replica deployments |
| Redis | Low latency, small workspaces |
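For example, to store snapshots in an OSS bucket:
```java
// Fragment of a HarnessAgent.builder() chain
.filesystem(new DockerFilesystemSpec()
        .image("ubuntu:24.04")
        .snapshotSpec(new OssSnapshotSpec(ossClient, "my-bucket", "agentscope/")))
```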
Host-side workspace files (AGENTS.md / skills/ / subagents/ / knowledge/) are synced into the sandbox at every start, gated by content hash, so if you edit a script under skills/, the next call() sees the new version inside the sandbox.
Distributed deployment
When multiple replicas run the same agent and any replica must be able to pick up a given user’s conversation, you need:
A distributed Session (e.g. a Redis-backed implementation)
A non-Noop snapshot store (OSS / Redis / remote store)
An appropriate IsolationScope (USER / AGENT / GLOBAL)
To declare these together, use sandboxDistributed(...):
```java
HarnessAgent.builder()
        .name("assistant")
        .model(model)
        .filesystem(new DockerFilesystemSpec()
                .image("ubuntu:24.04")
                .isolationScope(IsolationScope.USER))
        .sandboxDistributed(SandboxDistributedOptions.oss(redisSession, ossSnapshotSpec))
        .build();
```
With requireDistributed=true, build() fails fast if the configured Session or snapshot store doesn’t qualify, catching misconfiguration before deployment.
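A fail-fast check of this kind can be sketched as follows. SessionKind, SnapshotKind and DistributedCheck are hypothetical stand-ins rather than Harness types, and rejecting a host-local file store is an assumption (the text only requires a non-Noop snapshot), based on the fact that a file on one machine is invisible to the other replicas.

```java
// Hypothetical sketch of a build-time "requireDistributed" check; not the Harness implementation.
enum SessionKind { IN_MEMORY, DISTRIBUTED }

enum SnapshotKind { NOOP, LOCAL_FILE, OSS, REDIS }

final class DistributedCheck {
    static void validate(boolean requireDistributed, SessionKind session, SnapshotKind snapshot) {
        if (!requireDistributed) {
            return; // nothing to enforce
        }
        if (session != SessionKind.DISTRIBUTED) {
            throw new IllegalStateException("requireDistributed=true but the Session is not distributed");
        }
        // NOOP keeps nothing; LOCAL_FILE (assumption) is not shared across replicas.
        if (snapshot == SnapshotKind.NOOP || snapshot == SnapshotKind.LOCAL_FILE) {
            throw new IllegalStateException("requireDistributed=true but the snapshot store is not shared: " + snapshot);
        }
    }
}
```

Failing in build() rather than on the first call() means a misconfigured replica never starts serving traffic.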
Concurrency control (multi-replica)
In USER / AGENT / GLOBAL scope across replicas, two replicas serving the same user concurrently both write to the same slot, and the last writer wins. If that is not acceptable, add a distributed lock; a Redis-backed implementation is built in:
```java
SandboxExecutionGuard guard = RedisSandboxExecutionGuard.builder(jedis)
        .leaseTtl(Duration.ofMinutes(30))      // slightly longer than the worst-case call duration
        .retryInterval(Duration.ofMillis(500)) // how often a waiting replica retries the lock
        .build();

// Then, inside HarnessAgent.builder():
.filesystem(new DockerFilesystemSpec()
        .image("ubuntu:24.04")
        .isolationScope(IsolationScope.USER)
        .snapshotSpec(redisSnapshotSpec)
        .executionGuard(guard))
```
The lock key is bucketed by scope automatically (USER → by userId, AGENT → by agent name).
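A sketch of scope-based key derivation. The Scope enum, the key layout and the "sandbox-lock:" prefix are all hypothetical; the text only states that USER keys by userId and AGENT by agent name, and treating GLOBAL as a single shared key is an assumption.

```java
// Hypothetical sketch; the real lock-key format used by the framework is not documented here.
enum Scope { USER, AGENT, GLOBAL }

final class LockKeys {
    static String lockKey(Scope scope, String userId, String agentName) {
        return switch (scope) {
            case USER -> "sandbox-lock:user:" + userId;       // one lock per user
            case AGENT -> "sandbox-lock:agent:" + agentName;  // one lock per agent name
            case GLOBAL -> "sandbox-lock:global";             // assumption: one lock for everyone
        };
    }
}
```

The bucketing decides how much concurrency survives: in USER scope two different users never wait on each other, while in AGENT scope every user of the same agent serializes on one key.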
You can also implement the SandboxExecutionGuard interface to plug in other lock backends (DB / ZooKeeper / etcd).
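As an illustration of what another lock backend has to provide, here is a hypothetical in-process lease lock. LeaseGuard is an assumed contract, not the real SandboxExecutionGuard signature; the lease TTL and retry interval mirror the leaseTtl and retryInterval settings shown above. A DB, ZooKeeper, or etcd backend would replace the ConcurrentHashMap with an atomic compare-and-set on the shared store.

```java
import java.time.Clock;
import java.time.Duration;
import java.time.Instant;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical guard contract; the real SandboxExecutionGuard methods may differ.
interface LeaseGuard {
    boolean tryAcquire(String key, String owner);
    void release(String key, String owner);
}

// In-process stand-in for a shared lease store: the holder keeps the key until it
// releases it or the TTL elapses, so a crashed holder cannot block others forever.
final class InMemoryLeaseGuard implements LeaseGuard {
    private record Lease(String owner, Instant expiresAt) {}

    private final Map<String, Lease> leases = new ConcurrentHashMap<>();
    private final Duration leaseTtl;
    private final Clock clock;

    InMemoryLeaseGuard(Duration leaseTtl, Clock clock) {
        this.leaseTtl = leaseTtl;
        this.clock = clock;
    }

    @Override
    public boolean tryAcquire(String key, String owner) {
        Instant now = clock.instant();
        Lease mine = new Lease(owner, now.plus(leaseTtl));
        // compute() is atomic per key: take the lease if it is free or expired.
        Lease winner = leases.compute(key, (k, cur) ->
                (cur == null || !cur.expiresAt().isAfter(now)) ? mine : cur);
        return winner == mine;
    }

    @Override
    public void release(String key, String owner) {
        // Only the current holder may release; anyone else's release is a no-op.
        leases.computeIfPresent(key, (k, cur) -> cur.owner().equals(owner) ? null : cur);
    }

    // Blocking acquire: retry every retryInterval until the timeout passes.
    boolean acquire(String key, String owner, Duration retryInterval, Duration timeout)
            throws InterruptedException {
        Instant deadline = clock.instant().plus(timeout);
        while (!tryAcquire(key, owner)) {
            if (!clock.instant().isBefore(deadline)) {
                return false;
            }
            Thread.sleep(retryInterval.toMillis());
        }
        return true;
    }
}
```

Pass Clock.systemUTC() in production. With a fixed clock (handy in tests) the deadline never advances, so a blocking acquire against a held lock with a positive timeout would wait forever.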
Self-managed sandbox instances (advanced)
By default, the framework owns the entire sandbox lifecycle. There are three scenarios where you may want to manage it yourself:
1. I already have a running container; I want the agent to use it
```java
Sandbox mySandbox = dockerClient.create(workspaceSpec, snapshotSpec, options);
mySandbox.start();

SandboxContext callCtx = SandboxContext.builder()
        .client(dockerClient)
        .externalSandbox(mySandbox) // the framework only calls stop() at the end of the call, never shutdown()
        .build();

agent.call(msgs, RuntimeContext.builder()
        .sessionId("my-session")
        .sandboxContext(callCtx)
        .build()).block();

// Shut it down yourself when done
mySandbox.shutdown();
```
2. I have a saved sandbox state string and want to restore to that point
```java
SandboxState savedState = dockerClient.deserializeState(savedStateJson);

SandboxContext callCtx = SandboxContext.builder()
        .client(dockerClient)
        .externalSandboxState(savedState) // the framework restores from this state but owns the lifecycle
        .build();
```
3. Multiple agents share one sandbox
Pass the same sandbox via externalSandbox(...) to each agent’s call(), then call shutdown() on it yourself when all agents are done.
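The ownership rule behind scenarios 1 and 3 (the framework stops an external sandbox after each call, you shut it down once) maps naturally onto try/finally. SharedSandbox and SharedSandboxRun are hypothetical stand-ins; in real code each simulatedCall would be an agent.call(...) carrying the shared SandboxContext.

```java
import java.util.List;

// Hypothetical stand-in for an external sandbox; only tracks stop/shutdown calls.
final class SharedSandbox {
    int stopCount;
    boolean shutDown;

    void stop() {
        if (shutDown) {
            throw new IllegalStateException("sandbox already shut down");
        }
        stopCount++;
    }

    void shutdown() {
        shutDown = true;
    }
}

final class SharedSandboxRun {
    // Simulates what the framework does with an external sandbox at the end of one call().
    static void simulatedCall(String agentName, SharedSandbox external) {
        // ... the agent named agentName would do its work in the sandbox here ...
        external.stop();
    }

    // You own the shutdown: run every agent, then shut down once, even if a call throws.
    static void runAgents(List<String> agentNames, SharedSandbox shared) {
        try {
            for (String name : agentNames) {
                simulatedCall(name, shared);
            }
        } finally {
            shared.shutdown();
        }
    }
}
```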
Choosing a sandbox backend
| Backend | Best for |
|---|---|
| Docker | Local dev / single machine / trusted shell |
| Kubernetes | Self-hosted K8s, node-level bind mounts |
| Daytona | Generic managed sandbox HTTP API |
| E2B | Generic managed sandbox with native platform snapshots |
| AgentRun | Aliyun-managed sandbox (Function Compute FC 3.0); per-instance NAS / OSS auto-mount; low latency in mainland China. Treated as a regular |
All backends implement the same interface; agent code, toolkit, and AGENTS.md don’t change.
How the workspace maps into the sandbox
Host-side key files under workspace/ (AGENTS.md, skills/, subagents/, knowledge/) are synced into the sandbox at every start, gated by content hash: unchanged files are skipped.
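Content-hash gating can be sketched like this: hash each host file, compare with the hash recorded at the last sync, and copy only new or changed files. HashGatedSync is a hypothetical name, and SHA-256 is an assumed choice of hash; the text does not say which hash the framework uses. HexFormat requires Java 17.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HexFormat;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch of hash-gated sync; not the Harness implementation.
final class HashGatedSync {
    // path -> hash of the content last copied into the sandbox
    private final Map<String, String> lastSynced = new HashMap<>();

    static String sha256(byte[] content) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            return HexFormat.of().formatHex(md.digest(content));
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-256 must be available on every JVM", e);
        }
    }

    // Returns the paths that would be copied into the sandbox on this start, in sorted order.
    List<String> sync(Map<String, byte[]> hostFiles) {
        List<String> toCopy = new ArrayList<>();
        for (Map.Entry<String, byte[]> e : new TreeMap<>(hostFiles).entrySet()) {
            String hash = sha256(e.getValue());
            if (!hash.equals(lastSynced.get(e.getKey()))) {
                toCopy.add(e.getKey());            // new or changed: copy it
                lastSynced.put(e.getKey(), hash);
            }                                      // unchanged: skipped
        }
        return toCopy;
    }
}
```

Comparing hashes instead of timestamps means a file that was touched but not actually modified is still skipped.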
To bind-mount a host directory (e.g. a code repository) into the sandbox, use BindMountEntry. This works only with Docker and Kubernetes; managed sandboxes such as Daytona and E2B run in the cloud and cannot mount your host paths.
File changes made inside the sandbox are not synced back to the host; to retrieve artifacts the sandbox produced, have the agent read them with read_file.
Implementing your own sandbox backend
To integrate a non-Docker isolation environment (a self-hosted remote executor, a commercial sandbox API, a local mock, etc.), you don’t need to change any Harness source: implement a few contract interfaces and pass them to filesystem(...). The InMemorySandbox family in the agentscope-harness tests is a minimal skeleton to copy from.
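For orientation only, here is a toy in-memory backend written against a hypothetical contract. SandboxBackendSketch is an assumption, not the real Harness interfaces (which this page does not list); check the InMemorySandbox tests for the actual contract. It shows the minimum a backend has to cover: lifecycle, file access, and snapshot/restore.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical contract; the real Harness interfaces and method names differ.
interface SandboxBackendSketch {
    void start();
    void writeFile(String path, String content);
    Optional<String> readFile(String path);
    Map<String, String> snapshot();          // capture the workspace state
    void restore(Map<String, String> state); // rebuild the workspace from a snapshot
    void shutdown();
}

// Toy backend: the "workspace" is just a map from path to file content.
final class InMemorySandboxSketch implements SandboxBackendSketch {
    private final Map<String, String> files = new HashMap<>();
    private boolean running;

    @Override public void start() { running = true; }

    @Override public void writeFile(String path, String content) {
        requireRunning();
        files.put(path, content);
    }

    @Override public Optional<String> readFile(String path) {
        requireRunning();
        return Optional.ofNullable(files.get(path));
    }

    @Override public Map<String, String> snapshot() { return Map.copyOf(files); }

    @Override public void restore(Map<String, String> state) {
        files.clear();
        files.putAll(state);
    }

    @Override public void shutdown() {
        running = false;
        files.clear();
    }

    private void requireRunning() {
        if (!running) {
            throw new IllegalStateException("sandbox not started");
        }
    }
}
```

Snapshots here are immutable copies, so a snapshot taken before shutdown() still restores a fresh instance to the same workspace.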