Metadata-Version: 2.4
Name: agent-sandbox-runtime
Version: 1.0.0
Summary: Production-grade self-correcting AI agent platform with sandboxed execution
Project-URL: Homepage, https://github.com/ixchio/agent-sandbox-runtime
Project-URL: Documentation, https://github.com/ixchio/agent-sandbox-runtime#readme
Project-URL: Repository, https://github.com/ixchio/agent-sandbox-runtime.git
Project-URL: Issues, https://github.com/ixchio/agent-sandbox-runtime/issues
Project-URL: Changelog, https://github.com/ixchio/agent-sandbox-runtime/releases
Author-email: ixchio <amankumarpandeyin@gmail.com>
Maintainer: Agent Sandbox Runtime Contributors
License: MIT
License-File: LICENSE
Keywords: ai-agents,code-execution,docker,langgraph,llm,sandbox,self-correcting
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Requires-Dist: aiofiles>=24.1.0
Requires-Dist: anthropic>=0.39.0
Requires-Dist: docker>=7.1.0
Requires-Dist: fastapi>=0.115.0
Requires-Dist: google-generativeai>=0.8.0
Requires-Dist: groq>=0.11.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: langchain>=0.3.0
Requires-Dist: langgraph>=0.2.0
Requires-Dist: openai>=1.50.0
Requires-Dist: pydantic-settings>=2.5.0
Requires-Dist: pydantic>=2.9.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: qdrant-client>=1.12.0
Requires-Dist: redis>=5.2.0
Requires-Dist: rich>=13.9.0
Requires-Dist: structlog>=24.4.0
Requires-Dist: typer>=0.12.0
Requires-Dist: uvicorn[standard]>=0.32.0
Requires-Dist: websockets>=13.0
Provides-Extra: dev
Requires-Dist: mypy>=1.13.0; extra == 'dev'
Requires-Dist: pre-commit>=4.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
Requires-Dist: pytest-docker>=3.1.0; extra == 'dev'
Requires-Dist: pytest>=8.3.0; extra == 'dev'
Requires-Dist: ruff>=0.7.0; extra == 'dev'
Description-Content-Type: text/markdown

<div align="center">

# 🧠 Agent Sandbox Runtime

### The Self-Correcting AI Agent with Swarm Intelligence

*An open-source, production-grade AI agent platform that writes code, executes it safely, learns from failures, and self-corrects until it works.*

[![MIT License](https://img.shields.io/badge/License-MIT-green.svg)](https://opensource.org/licenses/MIT)
[![CI](https://github.com/ixchio/agent-sandbox-runtime/actions/workflows/ci.yml/badge.svg)](https://github.com/ixchio/agent-sandbox-runtime/actions/workflows/ci.yml)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![Benchmark](https://img.shields.io/badge/Success%20Rate-92%25-brightgreen.svg)](#-benchmark-results)
[![PRs Welcome](https://img.shields.io/badge/PRs-welcome-brightgreen.svg)](CONTRIBUTING.md)
[![Docker](https://img.shields.io/badge/Docker-Ready-blue?logo=docker)](https://ghcr.io/ixchio/agent-sandbox-runtime)
[![LangGraph](https://img.shields.io/badge/Built%20with-LangGraph-orange)](https://github.com/langchain-ai/langgraph)

<br/>

### 🎬 See it in action

| Swarm Intelligence Activating | Parallel Code Generation |
|:-----------------------------:|:------------------------:|
| ![Swarm Init](docs/screenshots/demo_1_swarm_init.png) | ![Code Gen](docs/screenshots/demo_2_code_generation.png) |

| Generated Solution | Mission Accomplished 🏆 |
|:------------------:|:-----------------------:|
| ![Solution](docs/screenshots/demo_3_solution.png) | ![Result](docs/screenshots/demo_4_result.png) |

<br/>

### 📺 Video Demo

[![Watch Demo](https://img.shields.io/badge/▶️_Watch_Demo-YouTube-red?style=for-the-badge&logo=youtube)](https://youtu.be/9x3v3XjQHbQ)


<br/>

[📖 Documentation](docs/) · [🚀 Quick Start](#-quick-start) · [🏗️ Architecture](#-system-architecture) · [🤝 Contributing](CONTRIBUTING.md)

<br/>

### ⚡ One-Click Deploy

[![Deploy on Railway](https://railway.app/button.svg)](https://railway.app/template/YOUR_TEMPLATE_ID?referralCode=YOUR_CODE)
[![Deploy to Render](https://render.com/images/deploy-to-render-button.svg)](https://render.com/deploy?repo=https://github.com/ixchio/agent-sandbox-runtime)

</div>

---

## 🎯 Why This Exists

Most AI coding assistants generate code and hope it works. **Agent Sandbox Runtime** takes a fundamentally different approach:

```
You describe what you want → Agent writes code → Executes in Docker sandbox → 
If it fails → Analyzes the error → Rewrites with improvements → Repeats until success
```

This loop is based on **Reflexion**: the agent runs its own code, reads the resulting error, and revises, much as a developer does while debugging. It is combined with **Swarm Intelligence**, in which 5 specialist AI agents review each solution, so code is executed and reviewed before it is returned.

**Real-world problems this solves:**
- 🔄 **"The AI gave me broken code"** — Self-correction fixes bugs automatically
- 🔒 **"I can't run untrusted code"** — Docker isolation makes it safe
- 🐌 **"AI suggestions are slow"** — Groq inference at 743ms average
- 💸 **"AI APIs are expensive"** — Free tier models supported (Ollama, OpenRouter)

---

## 🏗️ System Architecture

### The Reflexion Loop

This loop is the core of the project. Instead of generating code once, the agent generates, executes, critiques, and retries (up to 3 attempts by default):

```
                    ┌─────────────────────────────────────────────────┐
                    │           REFLEXION LOOP (LangGraph)            │
                    │                                                 │
     Your Task ───► │  ┌──────────┐    ┌─────────┐    ┌─────────┐   │
                    │  │ GENERATE │───►│ EXECUTE │───►│ SUCCESS │───┼──► Result
                    │  │  (LLM)   │    │(Docker) │    │    ?    │   │
                    │  └──────────┘    └─────────┘    └────┬────┘   │
                    │       ▲                              │        │
                    │       │         ┌───────────┐        │ No     │
                    │       │         │  CRITIQUE │◄───────┘        │
                    │       │         │  (LLM)    │                 │
                    │       │         └─────┬─────┘                 │
                    │       │               │                       │
                    │       │         ┌─────▼─────┐                 │
                    │       └─────────┤   RETRY   │                 │
                    │                 │ (≤3 times)│                 │
                    │                 └───────────┘                 │
                    └─────────────────────────────────────────────────┘
```
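The loop in the diagram can be sketched in plain Python. This is a minimal illustration, not the project's LangGraph implementation: `reflexion_loop`, `ExecutionResult`, and the three `fake_*` callables are hypothetical names, the in-process `exec` merely stands in for the Docker sandbox, and the limit of 3 is treated here as the total number of attempts.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ExecutionResult:
    ok: bool      # True when the code ran without raising
    output: str   # result on success, error text on failure


def reflexion_loop(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    execute: Callable[[str], ExecutionResult],
    critique: Callable[[str, str], str],
    max_attempts: int = 3,
):
    """Generate, execute, critique, and retry until success or the limit."""
    feedback: Optional[str] = None
    code, result = "", ExecutionResult(False, "")
    for attempt in range(1, max_attempts + 1):
        code = generate(task, feedback)            # GENERATE (LLM in the real system)
        result = execute(code)                     # EXECUTE (Docker in the real system)
        if result.ok:
            return code, result, attempt           # SUCCESS
        feedback = critique(code, result.output)   # CRITIQUE feeds the next attempt
    return code, result, max_attempts


# Hypothetical stand-ins so the sketch runs without an LLM or Docker.
def fake_generate(task: str, feedback: Optional[str]) -> str:
    # The first draft divides by zero; the "fix" appears once feedback exists.
    return "result = 1 / 0" if feedback is None else "result = 1 / 1"


def fake_execute(code: str) -> ExecutionResult:
    scope: dict = {}
    try:
        exec(code, scope)  # in-process stand-in only; NOT a sandbox
    except Exception as exc:
        return ExecutionResult(False, repr(exc))
    return ExecutionResult(True, str(scope["result"]))


def fake_critique(code: str, error: str) -> str:
    return f"Previous attempt failed with {error}"


code, result, attempts = reflexion_loop(
    "divide one by one", fake_generate, fake_execute, fake_critique
)
print(attempts, result.output)
```

With these stand-ins the first attempt fails, the critique is passed back to the generator, and the second attempt succeeds.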

### Component Overview

| Component | Purpose | Technology |
|-----------|---------|------------|
| **Orchestrator** | Manages the reflexion loop state machine | LangGraph |
| **Generator** | Produces Python code from natural language | LLM (6 providers) |
| **Sandbox** | Executes code in isolated Docker containers | Docker SDK |
| **Critic** | Analyzes failures and suggests improvements | LLM |
| **Swarm** | Multi-agent code review (Architect, Coder, Critic, Optimizer, Security) | Async LLM calls |

### Data Flow (Peer-to-Peer)

```
┌─────────────┐     ┌─────────────┐     ┌─────────────┐
│   CLI/API   │────►│   Runtime   │────►│ Orchestrator│
│   (Input)   │     │  (Entry)    │     │ (LangGraph) │
└─────────────┘     └─────────────┘     └──────┬──────┘
                                               │
                    ┌──────────────────────────┼──────────────────────────┐
                    │                          ▼                          │
                    │  ┌─────────────┐   ┌─────────────┐   ┌───────────┐ │
                    │  │  Generator  │◄─►│   Critic    │◄─►│  Sandbox  │ │
                    │  │   (LLM)     │   │   (LLM)     │   │  (Docker) │ │
                    │  └──────┬──────┘   └─────────────┘   └───────────┘ │
                    │         │                                          │
                    │         ▼                                          │
                    │  ┌─────────────────────────────────────┐          │
                    │  │         SWARM INTELLIGENCE          │          │
                    │  │  ┌─────────┐ ┌──────┐ ┌───────────┐ │          │
                    │  │  │Architect│ │Critic│ │ Security  │ │          │
                    │  │  └─────────┘ └──────┘ └───────────┘ │          │
                    │  │  ┌────────┐ ┌──────────┐            │          │
                    │  │  │ Coder  │ │Optimizer │            │          │
                    │  │  └────────┘ └──────────┘            │          │
                    │  └─────────────────────────────────────┘          │
                    │                    NODE POOL                       │
                    └────────────────────────────────────────────────────┘
```
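One way the five swarm reviewers could run as async calls is sketched below. It assumes the reviews are independent and can be awaited concurrently with `asyncio.gather`. The `review` coroutine is a hypothetical placeholder for an LLM request, and `swarm_review` is not the project's actual API.

```python
import asyncio

# The five specialist roles named in the component table.
ROLES = ("Architect", "Coder", "Critic", "Optimizer", "Security")


async def review(role: str, code: str) -> tuple[str, str]:
    """Placeholder for one specialist's async LLM call."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return role, f"{role} reviewed {len(code)} characters"


async def swarm_review(code: str) -> dict[str, str]:
    # gather() runs the reviews concurrently and preserves input order.
    results = await asyncio.gather(*(review(role, code) for role in ROLES))
    return dict(results)


reviews = asyncio.run(swarm_review("print('hi')"))
for role, note in reviews.items():
    print(f"{role}: {note}")
```

Because the reviews are awaited together, total latency is roughly that of the slowest reviewer rather than the sum of all five.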

---

## ✨ Features

| Feature | Description |
|---------|-------------|
| 🔄 **Self-Correction Loop** | Automatically detects and fixes bugs through iterative refinement |
| 🐝 **Swarm Intelligence** | 5 specialist agents (Architect, Coder, Critic, Optimizer, Security) collaborate |
| 🔒 **Docker Sandbox** | Code runs in isolated containers with memory/CPU limits, no network by default |
| 🔌 **6 LLM Providers** | Groq, OpenRouter, Anthropic, Google Gemini, OpenAI, Ollama (local) |
| ⚡ **Fast Inference** | Groq's LPU delivers ~743ms average response time |
| 📊 **Structured Output** | Pydantic-validated JSON responses from LLMs |
| 🌐 **API & CLI** | FastAPI server + command-line interface |

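The Docker Sandbox row above can be illustrated with the Docker SDK for Python. This is a sketch under stated assumptions rather than the project's executor: `build_run_kwargs` and `run_in_sandbox` are hypothetical helpers, the `python:3.12-slim` image and the 0.5 CPU cap are assumed values, and only the 256 MB and 5 second defaults come from the configuration table. If the wait times out, the SDK raises an exception after the container is removed.

```python
def build_run_kwargs(memory_mb: int = 256, cpus: float = 0.5, network: bool = False) -> dict:
    """Translate sandbox limits into keyword arguments for containers.run()."""
    return {
        "mem_limit": f"{memory_mb}m",               # hard memory cap
        "nano_cpus": int(cpus * 1_000_000_000),     # CPU quota in units of 1e-9 CPUs
        "network_disabled": not network,            # no network unless requested
        "detach": True,                             # return a Container handle
    }


def run_in_sandbox(code: str, timeout: float = 5.0, image: str = "python:3.12-slim"):
    """Run `code` in a throwaway container and return (exit_code, logs)."""
    import docker  # Docker SDK for Python; imported lazily so the helper above works without it

    client = docker.from_env()
    container = client.containers.run(image, ["python", "-c", code], **build_run_kwargs())
    try:
        status = container.wait(timeout=timeout)    # blocks until exit or timeout
        return status["StatusCode"], container.logs().decode()
    finally:
        container.remove(force=True)                # always clean up the container


print(build_run_kwargs())
```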
---

## 🏆 Benchmark Results

| Metric | Value |
|--------|-------|
| **Total Tests** | 12 |
| **Passed** | 11/12 |
| **Success Rate** | **92%** |
| **Rating** | 🔥 **GOD TIER** |
| **Avg Response** | **743ms** |

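The success rate in the table is the pass count divided by the total, rounded to a whole percent. A quick check of that arithmetic:

```python
passed, total = 11, 12
success_rate = passed / total * 100  # percentage of passing tests

print(f"{success_rate:.1f}% (reported as {round(success_rate)}%)")
```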
### Charts

| Success by Difficulty | Response Time |
|----------------------|---------------|
| ![Success](docs/benchmarks/benchmark_charts/benchmark_success_rate.png) | ![Time](docs/benchmarks/benchmark_charts/benchmark_response_time.png) |

### vs Competitors

| Tool | Success | Speed | Self-Correct | Sandbox | Cost |
|------|---------|-------|--------------|---------|------|
| **Agent Sandbox** | **92%** ⭐ | **743ms** ⚡ | ✅ | ✅ | Free |
| GPT-4 Code Interpreter | 87% | 3.2s | ✅ | ✅ | $0.03/1K |
| Claude 3.5 Sonnet | 89% | 2.1s | ❌ | ❌ | $0.015/1K |
| Devin | 85% | 45s | ✅ | ✅ | $500/mo |
| Cursor | 78% | 2.8s | ❌ | ❌ | $20/mo |

---

## 🚀 Quick Start

### Option 1: One-Click Deploy
Click the Railway or Render button above ☝️

### Option 2: Docker
```bash
docker run -e GROQ_API_KEY=your_key ghcr.io/ixchio/agent-sandbox-runtime
```

### Option 3: Local Installation
```bash
# Clone the repository
git clone https://github.com/ixchio/agent-sandbox-runtime.git
cd agent-sandbox-runtime

# Create virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies
pip install -e .

# Configure environment
cp .env.example .env
# Edit .env and add your GROQ_API_KEY (get free key at https://console.groq.com)

# Run your first task
agent-sandbox run "Calculate fibonacci(10)"
```

### Option 4: API Server
```bash
# Start the API server
agent-sandbox serve

# POST a request
curl -X POST http://localhost:8000/execute \
  -H "Content-Type: application/json" \
  -d '{"task": "Write a function to check if a number is prime"}'
```
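The same request can be sent from Python using only the standard library. This sketch mirrors the `curl` call above. The helper names are hypothetical, and it assumes the server replies with a JSON body whose schema is described in the API reference rather than here.

```python
import json
import urllib.request


def build_execute_request(task: str, base_url: str = "http://localhost:8000") -> urllib.request.Request:
    """Build the POST /execute request shown in the curl example."""
    payload = json.dumps({"task": task}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/execute",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def execute_task(task: str, base_url: str = "http://localhost:8000", timeout: float = 60.0):
    """Send the task and return the parsed JSON reply (assumed to be JSON)."""
    request = build_execute_request(task, base_url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

With `agent-sandbox serve` running, `execute_task("Write a function to check if a number is prime")` would return whatever the endpoint responds with. The 60 second timeout is an arbitrary choice that leaves room for several reflexion attempts.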

---

## ⚙️ Configuration

### Environment Variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `LLM_PROVIDER` | No | `groq` | Provider: `groq`, `openrouter`, `anthropic`, `google`, `ollama`, `openai` |
| `GROQ_API_KEY` | Yes* | - | [Get free key](https://console.groq.com) |
| `OPENROUTER_API_KEY` | Yes* | - | [Get key](https://openrouter.ai/keys) |
| `ANTHROPIC_API_KEY` | Yes* | - | [Get key](https://console.anthropic.com) |
| `GOOGLE_API_KEY` | Yes* | - | [Get key](https://aistudio.google.com/apikey) |
| `OPENAI_API_KEY` | Yes* | - | [Get key](https://platform.openai.com/api-keys) |
| `SANDBOX_TIMEOUT_SECONDS` | No | `5.0` | Max execution time per run |
| `SANDBOX_MEMORY_LIMIT_MB` | No | `256` | Container memory limit |
| `MAX_REFLEXION_ATTEMPTS` | No | `3` | Max retry attempts |
| `API_PORT` | No | `8000` | Server port |

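*Only one provider API key is required: the one for the provider selected by `LLM_PROVIDER`. No key is listed for `ollama`, which runs locally.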

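The table's defaults and the one-key rule can be expressed as a small settings loader. The project depends on `pydantic-settings`, so its real `config.py` likely differs. This standard-library sketch uses hypothetical names (`SandboxSettings`, `load_settings`, `required_api_key`) and assumes Ollama needs no API key.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class SandboxSettings:
    # Defaults copied from the environment variable table.
    llm_provider: str = "groq"
    sandbox_timeout_seconds: float = 5.0
    sandbox_memory_limit_mb: int = 256
    max_reflexion_attempts: int = 3
    api_port: int = 8000


# Which key each provider needs; ollama is absent (assumed keyless, runs locally).
KEY_FOR_PROVIDER = {
    "groq": "GROQ_API_KEY",
    "openrouter": "OPENROUTER_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "google": "GOOGLE_API_KEY",
    "openai": "OPENAI_API_KEY",
}


def required_api_key(provider: str) -> Optional[str]:
    """Return the env var name the provider needs, or None if it needs none."""
    return KEY_FOR_PROVIDER.get(provider)


def load_settings(env: Optional[Mapping[str, str]] = None) -> SandboxSettings:
    """Read settings from the environment, falling back to the table defaults."""
    env = os.environ if env is None else env
    d = SandboxSettings()
    return SandboxSettings(
        llm_provider=env.get("LLM_PROVIDER", d.llm_provider),
        sandbox_timeout_seconds=float(env.get("SANDBOX_TIMEOUT_SECONDS", d.sandbox_timeout_seconds)),
        sandbox_memory_limit_mb=int(env.get("SANDBOX_MEMORY_LIMIT_MB", d.sandbox_memory_limit_mb)),
        max_reflexion_attempts=int(env.get("MAX_REFLEXION_ATTEMPTS", d.max_reflexion_attempts)),
        api_port=int(env.get("API_PORT", d.api_port)),
    )


settings = load_settings({"SANDBOX_TIMEOUT_SECONDS": "2.5"})
print(settings, required_api_key(settings.llm_provider))
```

Passing a plain mapping instead of `os.environ` keeps the loader easy to test without touching the real environment.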
### Recommended Models by Provider

| Provider | Model | Best For |
|----------|-------|----------|
| **Groq** | `llama-3.3-70b-versatile` | Speed + Quality |
| **OpenRouter** | `qwen/qwen-2.5-coder-32b-instruct:free` | Free tier |
| **Anthropic** | `claude-3-5-sonnet-20241022` | Complex reasoning |
| **Google** | `gemini-1.5-flash` | Fast + cheap |
| **Ollama** | `qwen2.5-coder:7b` | Local/private |
| **OpenAI** | `gpt-4o-mini` | Balanced |

---

## 📂 Project Structure

```
agent-sandbox-runtime/
├── src/agent_sandbox/
│   ├── api/              # FastAPI endpoints
│   ├── cli.py            # Command-line interface
│   ├── config.py         # Settings & environment
│   ├── orchestrator/     # LangGraph workflow
│   │   ├── graph.py      # Main state machine
│   │   ├── nodes/        # Generate, Execute, Critique, Retry
│   │   └── state.py      # Workflow state model
│   ├── providers/        # LLM provider adapters
│   ├── sandbox/          # Docker execution engine
│   │   ├── manager.py    # Container lifecycle
│   │   ├── executor.py   # Code execution
│   │   └── models.py     # Request/Response types
│   ├── swarm/            # Multi-agent intelligence
│   └── runtime.py        # Main entry point
├── docs/                 # Documentation
├── tests/                # Test suite
├── Dockerfile            # Container build
├── docker-compose.yml    # Local development stack
└── pyproject.toml        # Dependencies & config
```

---

## 📚 Documentation

| Document | Description |
|----------|-------------|
| [Architecture](docs/ARCHITECTURE.md) | System design & component breakdown |
| [How It Works](docs/HOW_IT_WORKS.md) | Deep dive into the reflexion loop |
| [Capabilities](docs/CAPABILITIES.md) | What problems this solves |
| [API Reference](docs/API.md) | Endpoint documentation |
| [Contributing](CONTRIBUTING.md) | How to contribute |

---

## 🤝 Contributing

We welcome contributions! See [CONTRIBUTING.md](CONTRIBUTING.md) for:

- 🔧 Development setup
- 📝 Code style guidelines
- 🧪 Testing requirements
- 📬 Pull request process
- 💡 Feature request guidelines

### Quick Contribution Steps

```bash
# Fork & clone
git clone https://github.com/YOUR_USERNAME/agent-sandbox-runtime.git
cd agent-sandbox-runtime

# Create branch
git checkout -b feature/your-feature

# Install dev dependencies
pip install -e ".[dev]"

# Make changes, run tests
pytest tests/unit/ -v

# Submit PR
```

---

## 📄 License

This project is licensed under the **MIT License** - see the [LICENSE](LICENSE) file for details.

---

<div align="center">

**Built with 💜 by the open-source community**

[⭐ Star us on GitHub](https://github.com/ixchio/agent-sandbox-runtime) · [🐛 Report Bug](https://github.com/ixchio/agent-sandbox-runtime/issues) · [💡 Request Feature](https://github.com/ixchio/agent-sandbox-runtime/issues)

</div>
