Metadata-Version: 2.4
Name: llm-in-sandbox
Version: 0.1.0
Summary: A lightweight framework that connects LLMs to a virtual computer (Docker-based sandbox) to build general-purpose agents
Project-URL: Homepage, https://github.com/llm-in-sandbox/llm-in-sandbox
Project-URL: Repository, https://github.com/llm-in-sandbox/llm-in-sandbox
Project-URL: Issues, https://github.com/llm-in-sandbox/llm-in-sandbox/issues
Author: Daixuan Cheng, Shaohan Huang
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: agent,agentic-ai,code-sandbox,instruction-following,llm,long-context,mathematics,reinforcement-learning,sandbox,science
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: docker>=6.0.0
Requires-Dist: fire>=0.5.0
Requires-Dist: litellm>=1.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: rich>=13.0.0
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Description-Content-Type: text/markdown

# LLM-in-Sandbox

A lightweight framework that connects LLMs to a virtual computer (Docker-based sandbox) to build general-purpose agents.

**Features:**
- 🌍 General-purpose: works beyond coding: scientific reasoning, long-context understanding, video production, travel planning, and more
- 🐳 Isolated execution environment via Docker containers
- 🔌 Compatible with OpenAI, Anthropic, and self-hosted servers (vLLM, SGLang, etc.)
- 📁 Flexible I/O: mount any input files, export any output files

## Installation

**Requirements:** Python 3.10+, [Docker](https://docs.docker.com/engine/install/)

```bash
git clone https://github.com/llm-in-sandbox/llm-in-sandbox.git
cd llm-in-sandbox
pip install -e .
```

**Docker Image**

The default Docker image (`cdx123/llm-in-sandbox:v0.1`) is pulled automatically the first time you run the agent. Downloading it (~400 MB) may take a minute; subsequent runs start instantly.

<details>
<summary>Advanced: Build your own image</summary>

Modify [Dockerfile](./docker/Dockerfile) and build your own image:

```bash
llm-in-sandbox build
# Then use: --docker_image llm-in-sandbox:v0.1
```

</details>

## Quick Start

LLM-in-Sandbox works with various LLM providers including OpenAI, Anthropic, and self-hosted servers (vLLM, SGLang, etc.).

### Option 1: Cloud / API Services

```bash
llm-in-sandbox run \
    --query "write a hello world in python" \
    --llm_name "openai/gpt-5" \
    --llm_base_url "http://your-api-server/v1" \
    --api_key "your-api-key"
```

### Option 2: Self-Hosted Models

<details>
<summary>Using a local vLLM server with Qwen3-Coder-30B-A3B-Instruct</summary>

**1. Start vLLM server:**
```bash
vllm serve Qwen/Qwen3-Coder-30B-A3B-Instruct \
    --served-model-name qwen3_coder \
    --enable-auto-tool-choice \
    --tool-call-parser qwen3_coder \
    --tensor-parallel-size 8
```

**2. Run the agent (in a new terminal once the server is ready):**
```bash
llm-in-sandbox run \
    --query "write a hello world in python" \
    --llm_name qwen3_coder \
    --llm_base_url "http://localhost:8000/v1" \
    --temperature 0.7
```

</details>
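For either self-hosted option, the agent can only connect once the server is up. A minimal readiness check, polling the OpenAI-compatible `/v1/models` endpoint (exposed by both vLLM and SGLang); the helper below is a sketch, not part of llm-in-sandbox:

```python
import json
import urllib.error
import urllib.request


def server_ready(base_url: str, timeout: float = 2.0) -> bool:
    """Return True once an OpenAI-compatible server answers /v1/models."""
    try:
        with urllib.request.urlopen(f"{base_url}/models", timeout=timeout) as resp:
            # A ready server returns 200 with a JSON body containing "data".
            return resp.status == 200 and "data" in json.load(resp)
    except (urllib.error.URLError, OSError, ValueError):
        return False


# Example (hypothetical local vLLM endpoint):
# server_ready("http://localhost:8000/v1")
```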

<details>
<summary>Using a local SGLang server with DeepSeek-V3.2 (thinking mode)</summary>

**1. Start SGLang server:**
```bash
python3 -m sglang.launch_server \
    --model-path "deepseek-ai/DeepSeek-V3.2" \
    --served-model-name "DeepSeek-V3.2" \
    --trust-remote-code \
    --tp-size 8 \
    --tool-call-parser deepseekv32 \
    --reasoning-parser deepseek-v3 \
    --host 0.0.0.0 \
    --port 5678
```

**2. Run the agent (in a new terminal once the server is ready):**
```bash
llm-in-sandbox run \
    --query "write a hello world in python" \
    --llm_name DeepSeek-V3.2 \
    --llm_base_url "http://0.0.0.0:5678/v1" \
    --extra_body '{"chat_template_kwargs": {"thinking": true}}'
```

</details>
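Note that `--extra_body` must be valid JSON, so boolean values are lowercase (`true`/`false`) and keys are double-quoted. You can sanity-check a string before passing it to the CLI:

```python
import json

# The string passed to --extra_body must parse as JSON:
# lowercase true/false, double-quoted keys.
extra_body = '{"chat_template_kwargs": {"thinking": true}}'
payload = json.loads(extra_body)
```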

### Parameters (Common)

| Parameter | Description | Default |
|-----------|-------------|---------|
| `--query` | Task for the agent | *required* |
| `--llm_name` | Model name | *required* |
| `--llm_base_url` | API endpoint URL | *from `LLM_BASE_URL` env var* |
| `--api_key` | API key (not needed for local servers) | *from `OPENAI_API_KEY` env var* |
| `--input_dir` | Input files folder to mount (optional) | *None* |
| `--output_dir` | Output folder for results | `./output` |
| `--docker_image` | Docker image to use | `cdx123/llm-in-sandbox:v0.1` |
| `--prompt_config` | Path to prompt template | `./config/general.yaml` |
| `--temperature` | Sampling temperature | `1.0` |
| `--max_steps` | Max conversation turns | `100` |
| `--extra_body` | Extra JSON body for LLM API calls | *None* |

Run `llm-in-sandbox run --help` for all available parameters.
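As the table notes, `--llm_base_url` and `--api_key` fall back to environment variables when omitted. A sketch of that resolution order (the variable names come from the table; the helper itself is illustrative, not the CLI's actual implementation):

```python
import os


def resolve_llm_config(llm_base_url=None, api_key=None):
    """Explicit flags win; otherwise fall back to environment variables."""
    return {
        "base_url": llm_base_url or os.environ.get("LLM_BASE_URL"),
        "api_key": api_key or os.environ.get("OPENAI_API_KEY"),
    }
```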

### Output

Each run creates a timestamped folder:

```
output/2026-01-16_14-30-00/
├── files/
│   ├── answer.txt      # Final answer
│   └── hello_world.py  # Output file
└── trajectory.json     # Execution history
```

## More Examples

We provide examples across diverse non-coding domains: travel planning, video production, music composition, poster design, and more.

👉 See [examples/README.md](./examples/README.md) for the full list.

## Contact Us

Daixuan Cheng: daixuancheng6@gmail.com  
Shaohan Huang: shaohanh@microsoft.com  

## Acknowledgment

Our design draws on, and reuses code from, [R2E-Gym](https://github.com/R2E-Gym/R2E-Gym). Thanks for the great work!