Metadata-Version: 2.4
Name: deepagents-sandbox
Version: 0.0.2
Summary: Linux-native sandboxed code execution for agent workflows using bubblewrap and cgroups
Project-URL: Repository, https://github.com/john221wick/sandy
Author: Sandy Contributors
License-Expression: MIT
Keywords: bubblewrap,code-execution,isolation,langchain,sandbox
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Security
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Provides-Extra: dev
Requires-Dist: deepagents>=0.4; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: langchain
Requires-Dist: deepagents>=0.4; extra == 'langchain'
Description-Content-Type: text/markdown

# deepagents_sandbox

A small, Linux-native sandbox for agent workflows, in the same spirit as the OpenAI Agents SDK. It uses `bubblewrap` for isolation and cgroup v2 for resource limits, similar to Codex's open-source Linux sandbox approach. No containers, no VMs, just Linux primitives.

**Status:** Alpha. Works on Linux with cgroup v2. Tested on Ubuntu 22.04 and 24.04.

## Goal of this project

The goal is simple:

- let an LLM or agent run code without giving it your whole machine
- keep its work inside `/workspace`
- block obvious bad behavior like reading host files, using the network, or spawning too many processes
- stay small enough that you can use it directly as a Python package

## How this works

When you create a sandbox, Sandy:

1. makes a fresh temporary workspace
2. starts `bubblewrap` with isolated namespaces
3. mounts only a small filesystem view inside the sandbox
4. makes `/workspace` and `/tmp` writable
5. keeps the network off by default
6. applies memory and PID limits with cgroups when the environment allows it
7. deletes the temporary workspace when the sandbox closes
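
Steps 2 through 5 translate into a bubblewrap command line roughly like the following. This is an illustrative sketch of the flags involved, not Sandy's actual `executor.py`:

```python
def build_bwrap_argv(workspace: str, network: bool = False) -> list[str]:
    """Sketch of a bubblewrap command line: unshared namespaces,
    read-only system mounts, and a single writable workspace."""
    argv = [
        "bwrap",
        "--unshare-pid", "--unshare-ipc", "--unshare-uts", "--unshare-user",
        "--ro-bind", "/usr", "/usr",        # system paths are read-only
        "--ro-bind", "/bin", "/bin",
        "--ro-bind", "/lib", "/lib",
        "--bind", workspace, "/workspace",  # only the workspace is writable
        "--tmpfs", "/tmp",                  # fresh writable /tmp per sandbox
        "--chdir", "/workspace",
        "--die-with-parent",                # kill the sandbox if the parent dies
    ]
    if not network:
        argv.append("--unshare-net")        # network off by default
    return argv
```

The actual command to run is appended after these flags; cgroup limits are applied separately, outside the `bwrap` invocation.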

This is broadly similar to the sandboxing approach OpenAI has described for Codex. OpenAI’s public Codex materials say tasks run in isolated cloud sandboxes/containers, and the open-source Codex Linux sandbox docs say `bubblewrap` is the default filesystem sandbox on Linux. Sandy is not the same implementation, but it follows the same general idea: isolate execution, keep the filesystem tight, and only expose the paths the tool actually needs. Sources: [OpenAI, Introducing Codex](https://openai.com/index/introducing-codex/) and [openai/codex Linux sandbox README](https://github.com/openai/codex/blob/main/codex-rs/linux-sandbox/README.md).

One important detail: the same sandbox instance keeps the same `/workspace` across commands, while a new sandbox instance gets a fresh workspace.

## Simplest example

This is the smallest useful example:

```python
import asyncio
from deepagents_sandbox import NativeSandbox

async def main():
    async with NativeSandbox() as sandbox:
        await sandbox.execute("printf 'print(1 / 0)\\n' > /workspace/main.py")
        print((await sandbox.execute("python3 /workspace/main.py")).stderr.strip())

        await sandbox.execute("printf 'print(1 + 1)\\n' > /workspace/main.py")
        print((await sandbox.execute("python3 /workspace/main.py")).stdout.strip())

asyncio.run(main())
```

The same sandbox instance keeps `/workspace/main.py` between those commands.

## More complete example

```python
import asyncio
from deepagents_sandbox import NativeSandbox, SandboxConfig

async def main():
    config = SandboxConfig(
        memory_limit_mb=512,
        max_pids=256,
        timeout_seconds=30,
    )

    async with NativeSandbox(config) as sandbox:
        # run a command
        result = await sandbox.execute("echo hello from the sandbox")
        print(result.stdout)   # hello from the sandbox
        print(result.exit_code)  # 0

        # upload files into the sandbox workspace
        await sandbox.upload_files([
            ("script.sh", b"#!/bin/sh\nwhoami > /workspace/output.txt\nls -la"),
        ])

        # run the uploaded script (it writes output.txt into the workspace)
        result = await sandbox.execute("sh /workspace/script.sh")
        print(result.stdout)

        # download files the script wrote to the workspace
        downloads = await sandbox.download_files(["output.txt"])
        print(downloads[0].content.decode())

asyncio.run(main())
```

## What you get

- **Process isolation** via bubblewrap's PID, user, mount, network, and IPC namespaces
- **Resource limits** via cgroups v2: memory cap, PID ceiling, CPU weight
- **Network isolation** by default (opt-in with `network_access=True`)
- **Filesystem sandbox** — only the workspace directory is writable; `/usr`, `/bin`, `/lib`, `/lib64` are read-only bind mounts
- **Timeout enforcement** — commands that run too long are killed
- **Output size limits** — stdout/stderr truncated at 256KB to prevent log exhaustion
- **Graceful degradation** — if cgroups aren't available, deepagents_sandbox warns and runs without resource limits
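
The output cap, for example, can be sketched as a simple truncation helper. This is illustrative only; the package's internal implementation may differ:

```python
MAX_OUTPUT_BYTES = 256 * 1024  # matches the documented 256KB default cap

def truncate_output(data: bytes, limit: int = MAX_OUTPUT_BYTES) -> bytes:
    """Cap captured stdout/stderr so a chatty or malicious command
    cannot exhaust memory or flood agent logs."""
    if len(data) <= limit:
        return data
    marker = b"\n[output truncated]"
    return data[: limit - len(marker)] + marker
```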

## Requirements

- Linux (x86-64 or ARM64)
- cgroup v2 (standard on modern Linux distros)
- bubblewrap (`apt install bubblewrap` on Debian/Ubuntu)
- Python 3.10+

If you're developing on macOS, run deepagents_sandbox inside Docker or a Linux VM. `bubblewrap` and cgroup v2 are Linux-only. Some Docker environments expose cgroup v2 but do not delegate writable controllers; in that case deepagents_sandbox warns and runs without memory/PID/CPU limits.
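
You can roughly check both prerequisites from Python before creating a sandbox. This is a sketch of the idea, not the package's own `detect.py`:

```python
import shutil
from pathlib import Path

def check_prerequisites() -> dict[str, bool]:
    """Rough availability check for the two Linux prerequisites:
    the bwrap binary on PATH and a mounted cgroup v2 hierarchy."""
    return {
        "bwrap": shutil.which("bwrap") is not None,
        "cgroup_v2": Path("/sys/fs/cgroup/cgroup.controllers").exists(),
    }

print(check_prerequisites())
```

Note that `cgroup.controllers` existing does not guarantee writable, delegated controllers, which is why the package still degrades gracefully at runtime.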

## Installation

```bash
pip install deepagents_sandbox
```

For Deep Agents / LangChain usage:

```bash
pip install "deepagents_sandbox[langchain]"
```

For development:

```bash
git clone https://github.com/john221wick/sandy.git
cd sandy
pip install -e ".[dev]"
```

## Configuration

`SandboxConfig` is a frozen dataclass — pass it to `NativeSandbox` at construction:

```python
config = SandboxConfig(
    memory_limit_mb=512,      # max RAM (default: 512MB)
    max_pids=256,             # max processes (default: 256)
    cpu_shares=100,           # CPU weight (default: 100)
    timeout_seconds=60.0,     # hard timeout (default: 60s)
    max_output_bytes=262144,  # stdout/stderr cap (default: 256KB)
    network_access=False,     # allow outbound network (default: False)
    gpu=False,                # expose GPU (default: False, reserved for v2)
    extra_bind_mounts=[],     # list of (host_path, sandbox_path) tuples
    extra_env={},             # extra environment variables
)
```

## Deep Agents adapter

Sandy exposes a Deep Agents backend directly from the package root:

```python
from deepagents import create_deep_agent
from deepagents_sandbox import Sandbox

backend = Sandbox()
agent = create_deep_agent(
    model="openai:gpt-4.1-mini",
    backend=backend,
)

result = agent.invoke(
    {
        "messages": [
            {
                "role": "user",
                "content": "Write /workspace/hello.py, then run it.",
            }
        ]
    }
)

backend.close()
```

Notes:

- This targets Deep Agents specifically, not bare `ChatModel.invoke(...)`.
- Use absolute paths under `/workspace`.
- `/tmp/...` is supported for backend temp-file flows used by Deep Agents.
- The adapter assumes it is running on Linux or inside Docker where `bubblewrap` works.

## Tests included

The test suite is split into three parts:

- `unit`
  - workspace creation, read, write, list, snapshot, restore, and cleanup
  - path traversal checks like `../../etc/passwd`
  - symlink escape checks
  - executor command validation, mount flags, env flags, and network flags
  - timeout handling
  - cgroup slice creation, config writing, and PID attachment
  - Linux and `bwrap` prerequisite detection
  - `NativeSandbox` lifecycle and timeout forwarding
  - Deep Agents adapter behavior for `/workspace`, `/tmp`, invalid paths, and error mapping
- `integration`
  - real command execution through `bubblewrap`
  - current working directory is `/workspace`
  - network is blocked by default
  - system paths are read-only
  - `/workspace` is writable
  - file upload, execute, and download flows work
  - timeout handling on real commands
  - Deep Agents adapter can run commands and move files through `/workspace` and `/tmp`
- `adversarial`
  - fork bomb containment
  - memory bomb containment when cgroup memory limits are available
  - blocked network access with `curl` and DNS lookups
  - host file access checks like `/etc/shadow`
  - path traversal attempts from inside the workspace
  - symlink escape attempts
  - blocked privilege escalation with `sudo` and `su`

Run them like this. On macOS or Windows, you can run the unit tests without `bwrap`:

```bash
make setup
make unit
```

To run the full test suite including integration and adversarial tests on macOS, use Docker:

```bash
docker build -t deepagents_sandbox-test .
docker run --rm --privileged --cgroupns=private deepagents_sandbox-test
```

To run just the Deep Agents adapter tests in Docker:

```bash
docker build -t deepagents_sandbox-test .
docker run --rm --privileged --cgroupns=private deepagents_sandbox-test \
  pytest -v --tb=short tests/unit/test_langchain_adapter.py tests/integration/test_langchain_adapter.py
```

The container must run with `--privileged --cgroupns=private` so bubblewrap and cgroups work inside the container.

Or use the Makefile targets directly:

```bash
make lint          # ruff
make typecheck     # mypy
make test          # pytest (all tests)
make unit          # pytest -m "not integration and not adversarial"
make integration   # pytest -m integration
make adversarial   # pytest -m adversarial
```

## Security properties

The sandbox limits what a compromised or malicious command can do:

- Fork bomb (`:(){ :|:& };:`): PID limit via cgroups `pids.max`
- Memory exhaustion: memory limit via cgroups `memory.max`
- Network exfiltration: `--unshare-net` by default
- Read host files like `/etc/shadow`: read-only filesystem, only `/workspace` writable
- Path traversal like `../../etc/passwd`: workspace-relative path enforcement
- Privilege escalation with `sudo` or `su`: dropped capabilities, user namespace isolation, and synthetic passwd/group files
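
The path-traversal and symlink checks boil down to resolving every user-supplied path and rejecting anything that lands outside the workspace root. A minimal sketch of that enforcement idea (not Sandy's exact code):

```python
from pathlib import Path

WORKSPACE = Path("/workspace")

def resolve_inside_workspace(user_path: str) -> Path:
    """Resolve a user-supplied path (following symlinks and ../ segments)
    and reject it if it escapes the workspace root."""
    root = WORKSPACE.resolve()
    candidate = (WORKSPACE / user_path).resolve()
    if not candidate.is_relative_to(root):
        raise ValueError(f"path escapes workspace: {user_path}")
    return candidate
```

Because `Path.resolve()` follows symlinks before the containment check, a symlink inside the workspace pointing at `/etc` fails the same check as a literal `../../etc/passwd`.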

**Caveats:** This is not a hard security boundary like a VM or a rootless container. It's designed to catch accidental mistakes and naive adversarial prompts; a motivated attacker with a kernel exploit or elevated privileges can escape it. Use it accordingly.

## Project layout

```
deepagents_sandbox/
  detect.py    # prerequisite checks (bwrap, cgroup v2, user namespaces)
  workspace.py # temp directory with snapshot/restore
  cgroup.py    # cgroup v2 slice creation and cleanup
  executor.py  # bubblewrap subprocess management
  config.py    # SandboxConfig dataclass
  sandbox.py   # NativeSandbox (async context manager)
  __init__.py  # public API exports

tests/
  unit/        # mocked tests, run on any OS
  integration/ # real bwrap execution tests
  adversarial/ # escape attempt tests
```

## License

MIT
