Metadata-Version: 2.4
Name: agent-playground
Version: 0.1.0
Summary: Local-first playground for experimenting with simple AI agents and sandboxed Python execution.
Author: Agent Playground contributors
License-Expression: MIT
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Education
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Education
Classifier: Topic :: Software Development :: Debuggers
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: build>=1; extra == "dev"
Requires-Dist: twine>=5; extra == "dev"
Dynamic: license-file

# Agent Playground

[中文文档](README.zh-CN.md)

Agent Playground is a local-first, lightweight, educational framework for
understanding and debugging simple AI agents.

It combines four focused directions:

- fixed multi-agent collaboration patterns
- readable agent framework code
- structured trace and timeline debugging
- sandboxed Python execution for model-generated code

The public concepts are intentionally small: `Agent`, `ModelAdapter`, `Tool`,
`Sandbox`, `Trace`, `Run`, and `ReviewTeam`.

## What This Project Is

Agent Playground is built for two vertical slices:

- **Single Agent + Python Sandbox + Trace**: ask an agent to produce Python code,
  execute it in an isolated subprocess sandbox, and inspect the trace.
- **Generate -> Review -> Refine**: run a fixed three-agent collaboration pattern
  with a shared trace and a structured reviewer decision.

It is not a RAG system, long-term memory layer, arbitrary agent graph runtime,
workflow DSL, plugin marketplace, hosted service, or web dashboard.

## Architecture

```mermaid
flowchart TD
    U["User / examples / CLI"] --> A["Agent.run(task)"]

    A --> MA["ModelAdapter"]
    FM["FakeModelAdapter"] -. tests .-> MA
    OA["OpenAI-compatible adapter"] -. real model .-> MA

    A --> PT["PythonTool"]
    PT --> SB["Sandbox"]
    SB --> SS["SubprocessSandbox"]
    SB --> DS["DockerSandbox"]
    PT --> OUT["ToolResult<br/>stdout / stderr / exit code / artifacts"]

    A --> TR["TraceRecorder"]
    MA --> TR
    OUT --> TR
    TR --> TJ["Trace JSON"]
    TR --> TV["Timeline / events view"]

    RT["ReviewTeam<br/>Generate -> Review -> Refine"] --> G["generator Agent"]
    RT --> R["reviewer Agent<br/>JSON decision"]
    RT --> RF["refiner Agent"]
    G --> TR
    R --> TR
    RF --> TR
```

`ReviewTeam` is a fixed collaboration pattern over normal `Agent` instances.
It uses one shared trace so model calls, reviewer decisions, tool execution,
and sandbox policy metadata can be inspected together.

## Installation

The package name is `agent-playground`; the Python import name is
`agent_playground`.

For development inside this repository, use either `uv` or `pip`.

With `uv`:

```bash
uv sync --extra dev
uv run pytest -q
uv run python examples/01_single_agent_python.py
uv run python examples/03_generate_review_refine.py
uv run python examples/06_docker_sandbox.py
uv run python examples/07_trace_readability.py
```

With `pip`:

```powershell
python -m venv .venv
.\.venv\Scripts\Activate.ps1
python -m pip install -e ".[dev]"
python -m pytest -q
python examples/01_single_agent_python.py
python examples/03_generate_review_refine.py
python examples/06_docker_sandbox.py
python examples/07_trace_readability.py
```

Until a PyPI release exists, another local project can install this package
from a local checkout (replace the example path with your own):

```powershell
# uv
uv add --editable G:\Agent-Playground

# pip
python -m pip install -e G:\Agent-Playground
```

After a PyPI release exists:

```bash
# uv
uv add agent-playground

# pip
python -m pip install agent-playground
```

## Minimal Single Agent

```python
from agent_playground import Agent, FakeModelAdapter, PythonTool, SubprocessSandbox

model = FakeModelAdapter("""```python
print("hello from sandbox")
```""")

agent = Agent(
    name="coder",
    model=model,
    tools=[PythonTool(SubprocessSandbox(timeout_seconds=3))],
)

run = agent.run("Write and run a tiny Python program.")

print(run.output)
run.trace.print_timeline()
run.trace.export_json("trace.json")
```

## Generate-Review-Refine

```python
from agent_playground import Agent, FakeModelAdapter, ReviewTeam

generator = Agent("generator", FakeModelAdapter("initial answer"))
reviewer = Agent(
    "reviewer",
    FakeModelAdapter(
        '{"status": "pass", "issues": [], "suggestions": [], "reason": "ok"}'
    ),
)
refiner = Agent("refiner", FakeModelAdapter("refined answer"))

team = ReviewTeam(generator=generator, reviewer=reviewer, refiner=refiner)
run = team.run("Create a small solution and review it.")

print(run.status)
print(run.output)
run.trace.print_timeline()
```
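
The reviewer in the example above answers with a JSON decision containing `status`, `issues`, `suggestions`, and `reason`. The following is a minimal standalone sketch of how such a decision could be validated before choosing whether to refine. `ReviewDecision` and `parse_decision` are hypothetical names used for illustration, not part of `agent_playground`, and treating every status other than `"pass"` as a request for refinement is an assumption.

```python
import json
from dataclasses import dataclass, field


@dataclass
class ReviewDecision:
    """Hypothetical container mirroring the reviewer JSON shown above."""

    status: str
    issues: list[str] = field(default_factory=list)
    suggestions: list[str] = field(default_factory=list)
    reason: str = ""

    @property
    def needs_refinement(self) -> bool:
        # Assumption: any status other than "pass" hands off to the refiner.
        return self.status != "pass"


def parse_decision(raw: str) -> ReviewDecision:
    """Parse reviewer output; raise ValueError on malformed or incomplete JSON."""
    data = json.loads(raw)  # json.JSONDecodeError is a ValueError subclass
    if not isinstance(data, dict) or "status" not in data:
        raise ValueError("reviewer decision must be a JSON object with a 'status' key")
    return ReviewDecision(
        status=str(data["status"]),
        issues=list(data.get("issues", [])),
        suggestions=list(data.get("suggestions", [])),
        reason=str(data.get("reason", "")),
    )


decision = parse_decision(
    '{"status": "pass", "issues": [], "suggestions": [], "reason": "ok"}'
)
print(decision.needs_refinement)
```

Failing loudly on a missing `status` keeps a malformed reviewer reply from being silently treated as a pass.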

## Provider Config

Real model access is configured with a local provider JSON file. The project does
not read `.env` files automatically; set the key in the `api_key` field of the
provider config.

Create a local config from the template:

```powershell
Copy-Item config/providers.example.json config/providers.local.json
notepad config/providers.local.json
```

Expected structure:

```json
{
  "providers": {
    "bailian": {
      "name": "bailian",
      "api_type": "openai",
      "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
      "api_key": "replace-with-your-api-key",
      "models": {
        "default": "qwen3.6-plus",
        "single_agent": "qwen3.6-plus",
        "generator": "qwen3.6-plus",
        "reviewer": "qwen3.6-plus",
        "refiner": "qwen3.6-plus"
      }
    }
  }
}
```
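
To make the alias mapping concrete, here is a small standalone sketch that resolves a model alias from a config of this shape. `resolve_model` is a hypothetical helper, and falling back to the `default` entry for an unknown alias is an assumption about sensible behavior, not a documented rule of the project.

```python
import json


def resolve_model(config: dict, provider: str, alias: str) -> str:
    """Return the model name for an alias (hypothetical helper).

    Assumption: an unknown alias falls back to the provider's "default" entry.
    """
    models = config["providers"][provider]["models"]
    return models.get(alias, models["default"])


# Inline copy of the structure above. A real script would instead load
# config/providers.local.json with json.load().
config = json.loads(
    """
    {
      "providers": {
        "bailian": {
          "name": "bailian",
          "api_type": "openai",
          "base_url": "https://dashscope.aliyuncs.com/compatible-mode/v1",
          "api_key": "replace-with-your-api-key",
          "models": {
            "default": "qwen3.6-plus",
            "reviewer": "qwen3.6-plus"
          }
        }
      }
    }
    """
)

print(resolve_model(config, "bailian", "reviewer"))
print(resolve_model(config, "bailian", "unknown-alias"))
```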

Run Qwen/Bailian examples:

```bash
# uv
uv run python examples/04_qwen_single_agent.py
uv run python examples/05_qwen_generate_review_refine.py

# pip
python examples/04_qwen_single_agent.py
python examples/05_qwen_generate_review_refine.py
```

Or call a configured model from the CLI:

```bash
agent-playground run --provider bailian --model-alias single_agent "Write Python code that prints 1 + 1."
agent-playground run --provider bailian --model-alias single_agent --sandbox docker "Write Python code that prints 1 + 1."
```

## Trace

Every run produces a structured `Trace`. It records model calls, tool calls,
stdout, stderr, exit codes, errors, artifacts, timeline events, team rounds,
agent roles, reviewer decisions, and non-secret sandbox policy metadata.

```python
run.trace.print_timeline()
run.trace.export_json("trace.json")
```

Static HTML viewer:

```python
from agent_playground import export_trace_html

export_trace_html(run.trace, "trace.html")
```

CLI trace views:

```bash
agent-playground trace traces/example.json
agent-playground trace traces/example.json --timeline
agent-playground trace traces/example.json --events
agent-playground trace traces/example.json --html trace.html
```

For a trace-focused example:

```bash
python examples/07_trace_readability.py
```

The HTML viewer is a single local file with inline CSS. It does not start a
server and does not require a JavaScript framework.
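
As a rough illustration of what "structured trace" means here, the sketch below records events in order and renders them as a timeline and as JSON. It is a toy model only: `MiniTrace`, `Event`, and their fields are hypothetical and do not reflect the actual schema or API of the project's `Trace`.

```python
import json
import time
from dataclasses import asdict, dataclass, field


@dataclass
class Event:
    """One recorded step (hypothetical shape, not the library's schema)."""

    kind: str  # for example "model_call" or "tool_call"
    agent: str
    detail: dict
    timestamp: float = field(default_factory=time.time)


@dataclass
class MiniTrace:
    events: list[Event] = field(default_factory=list)

    def record(self, kind: str, agent: str, **detail) -> None:
        # Append in call order so the list doubles as the timeline.
        self.events.append(Event(kind, agent, detail))

    def timeline(self) -> list[str]:
        # One numbered line per event: index, agent name, event kind.
        return [
            f"{i:02d} {e.agent:<10} {e.kind}"
            for i, e in enumerate(self.events, start=1)
        ]

    def to_json(self) -> str:
        return json.dumps([asdict(e) for e in self.events], indent=2)


trace = MiniTrace()
trace.record("model_call", "coder", prompt="print 1 + 1")
trace.record("tool_call", "coder", exit_code=0, stdout="2\n")
print("\n".join(trace.timeline()))
```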

## Sandbox

`SubprocessSandbox` runs generated Python in a separate process with an isolated
workspace, timeout, minimal environment, stdout/stderr capture, exit code
capture, and common write guards. Its policy metadata is recorded into trace
tool calls so the execution boundary is visible during debugging.

`SubprocessSandbox` provides isolation for local experimentation; it is not a
production-grade security sandbox.
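
The general technique described above (separate process, throwaway workspace, timeout, reduced environment, captured output) can be sketched with the standard library alone. This is an illustration under stated assumptions, not the `SubprocessSandbox` implementation: `run_isolated`, the environment allowlist, and the result dictionary are all hypothetical, and the sketch includes no write guards.

```python
import os
import subprocess
import sys
import tempfile
from pathlib import Path

# Assumption: a small allowlist is enough for the child interpreter to start.
ENV_ALLOWLIST = ("PATH", "SYSTEMROOT", "LD_LIBRARY_PATH")


def run_isolated(code: str, timeout_seconds: float = 3.0) -> dict:
    """Run code in a child interpreter inside a temporary working directory."""
    env = {k: os.environ[k] for k in ENV_ALLOWLIST if k in os.environ}
    with tempfile.TemporaryDirectory() as workspace:
        script = Path(workspace) / "main.py"
        script.write_text(code, encoding="utf-8")
        try:
            proc = subprocess.run(
                # -I runs Python in isolated mode (ignores PYTHON* variables
                # and the user site directory).
                [sys.executable, "-I", str(script)],
                cwd=workspace,
                env=env,
                capture_output=True,
                text=True,
                timeout=timeout_seconds,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the child before raising on timeout.
            return {"stdout": "", "stderr": "timed out", "exit_code": None}
    return {
        "stdout": proc.stdout,
        "stderr": proc.stderr,
        "exit_code": proc.returncode,
    }


result = run_isolated('print("hello from sandbox")')
print(result["exit_code"], result["stdout"].strip())
```

A process boundary like this limits accidents, not a determined attacker: the child can still read any file the current user can read.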

`DockerSandbox` is available for local Docker-based experiments when Docker is
installed. It runs Python in `python:3.11-slim` by default, disables container
network access, applies memory/CPU/pids limits, and uses a read-only container
root filesystem with `/workspace` mounted for the task.

```python
from agent_playground import DockerSandbox, PythonTool

tool = PythonTool(DockerSandbox(timeout_seconds=5))
```
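
For orientation, the restrictions listed above correspond to standard `docker run` flags. The sketch below only builds such a command line and does not execute it. `docker_argv` is a hypothetical helper, the limit values are placeholders, and the exact arguments `DockerSandbox` passes are not documented here.

```python
def docker_argv(workspace: str, image: str = "python:3.11-slim") -> list[str]:
    """Build a docker run command with the restrictions described above.

    The limit values are illustrative placeholders, not the project's defaults.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",          # no container network access
        "--memory", "256m",           # memory limit (placeholder value)
        "--cpus", "1",                # CPU limit (placeholder value)
        "--pids-limit", "64",         # cap on process count (placeholder value)
        "--read-only",                # read-only container root filesystem
        "-v", f"{workspace}:/workspace",  # task workspace stays writable
        "-w", "/workspace",
        image, "python", "main.py",
    ]


print(" ".join(docker_argv("/tmp/task-workspace")))
```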

## Build Checks

```bash
# uv
uv run python -m build
uv run python -m twine check dist/*

# pip
python -m build
python -m twine check dist/*
```
