Metadata-Version: 2.4
Name: evolution-mcp
Version: 0.1.1
Summary: Modal-backed MCP server for parallel evaluation of agent-generated code variants
Author: Deepesh Bansal
License-Expression: MIT
Project-URL: Homepage, https://github.com/deepeshbansal/evolution-mcp
Project-URL: Repository, https://github.com/deepeshbansal/evolution-mcp
Project-URL: Issues, https://github.com/deepeshbansal/evolution-mcp/issues
Keywords: mcp,modal,codex,agents,evaluation,sandbox
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mcp>=1.13.0
Requires-Dist: modal>=1.1.0
Requires-Dist: redis>=5.0.0
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: ruff>=0.8; extra == "dev"
Dynamic: license-file

# Evolution MCP

Modal-backed MCP server for evaluating multiple agent-generated code variants in
parallel.

Codex runs locally and generates candidate patches or full-file writes. Modal
only evaluates those variants:

1. Sync the dirty local repo into a Modal image with `Image.add_local_dir`.
2. Start a setup sandbox and run setup commands once.
3. Snapshot the prepared filesystem with `Sandbox.snapshot_filesystem`.
4. Start `run_experiment`; it returns immediately with `status: running`.
5. Enqueue a durable Redis Stream job.
6. A worker process claims the job and forks one Modal sandbox per variant.
7. Apply each patch or full-file write inside its sandbox.
8. Run the same eval command in each sandbox.
9. Persist result metadata and artifacts to Redis.
10. Poll `get_experiment` until the status is `completed`, `failed`, or
    `no_passing_variant`.
11. Optionally call `apply_winner` to apply the winning diff locally.
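
Steps 6 to 8 amount to a bounded fan-out: every variant gets the same eval
command, with a cap on how many runs are in flight at once. The sketch below
mimics that shape locally with a thread pool. `evaluate_variants` and
`fake_eval` are hypothetical names, and `fake_eval` is a stand-in for the Modal
sandbox run, so treat this as an illustration of the pattern rather than the
server's actual code.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable


def evaluate_variants(
    variants: list[dict],
    run_eval: Callable[[dict], int],
    parallelism: int = 2,
) -> list[dict]:
    """Run run_eval on every variant, at most `parallelism` at a time."""
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # pool.map yields results in input order, whatever the finish order.
        exit_codes = list(pool.map(run_eval, variants))
    return [
        {"name": variant["name"], "exit_code": code, "passed": code == 0}
        for variant, code in zip(variants, exit_codes)
    ]


def fake_eval(variant: dict) -> int:
    # Hypothetical stand-in for "apply the variant, then run the eval command".
    return 1 if variant["name"] == "broken" else 0


results = evaluate_variants(
    [{"name": "minimal-fix"}, {"name": "boundary-fix"}, {"name": "broken"}],
    fake_eval,
    parallelism=2,
)
passing = [r["name"] for r in results if r["passed"]]
```

How a winner is chosen among passing variants is not specified here, so the
sketch stops at collecting per-variant results.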

## Install

```bash
uv sync
```

Once the package is published, the intended invocation uses `uvx`:

```bash
uvx --from evolution-mcp evolution-mcp-doctor
```

## Configure Modal

```bash
uv run modal setup
```

The controller uses your local Modal credentials to create images, sandboxes,
and snapshots. Eval sandboxes do not need OpenAI credentials. Only pass
`secret_names` when the repo's own setup or tests need secrets.

## Configure Redis

V2 uses Redis for durable workspace records, experiment records, statuses, and
small artifacts.

```bash
export REDIS_URL=redis://localhost:6379/0
```

Redis must be running before normal use. On macOS with Homebrew:

```bash
brew install redis
brew services start redis
```

For dev-only local JSON persistence:

```bash
export EVOLUTION_MCP_STORAGE=local
```

## Doctor

```bash
uv run evolution-mcp-doctor
```

This checks storage, checks the active Modal profile, and prints a Codex MCP
config snippet.

## Run The MCP Server

```bash
uvx --from evolution-mcp evolution-mcp
```

## Run A Worker

Run at least one worker process alongside the MCP server:

```bash
uvx --from evolution-mcp evolution-mcp-worker
```

Workers consume Redis Stream jobs. They acknowledge a job only after writing the
completed or failed experiment record, so unacked jobs can be reclaimed by
another worker after `--reclaim-after-ms`.
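
The acknowledge-after-write ordering can be sketched as follows. This is a
minimal illustration, not the worker's actual implementation: `work_once`,
`run_job`, and `save_record` are hypothetical names, and `client` is assumed to
behave like a redis-py client (`xautoclaim`, `xreadgroup`, and `xack` are real
redis-py methods). The stream, group, and consumer names are whatever the caller
passes in.

```python
from typing import Any, Callable, Optional


def work_once(
    client: Any,
    stream: str,
    group: str,
    consumer: str,
    run_job: Callable[[dict], dict],
    save_record: Callable[[str, dict], None],
    reclaim_after_ms: int = 60_000,
) -> Optional[str]:
    """Process at most one job; return its stream ID, or None if idle."""
    # 1. Prefer jobs that another consumer read but never acknowledged.
    reply = client.xautoclaim(
        stream, group, consumer,
        min_idle_time=reclaim_after_ms, start_id="0-0", count=1,
    )
    entries = reply[1]
    if not entries:
        # 2. Otherwise read a job that has never been delivered (">").
        fresh = client.xreadgroup(group, consumer, {stream: ">"}, count=1, block=1000)
        entries = fresh[0][1] if fresh else []
    if not entries:
        return None

    job_id, fields = entries[0]
    try:
        record = {"status": "completed", **run_job(fields)}
    except Exception as exc:
        record = {"status": "failed", "error": str(exc)}

    # 3. Write the record first, then acknowledge. If the process dies before
    #    xack, the job stays pending and a later xautoclaim can pick it up.
    save_record(job_id, record)
    client.xack(stream, group, job_id)
    return job_id
```

With a real redis-py client, the consumer group has to exist before reading
(for example via `xgroup_create`), and field names arrive as bytes unless the
client was created with `decode_responses=True`.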

## Add To Codex

Example MCP config:

```json
{
  "mcpServers": {
    "evolution-mcp": {
      "command": "uvx",
      "args": ["--from", "evolution-mcp", "evolution-mcp"],
      "env": {
        "REDIS_URL": "redis://localhost:6379/0"
      }
    }
  }
}
```

## Tool Flow

Prepare once with `prepare_workspace`:

```json
{
  "repo_path": "/Users/me/project",
  "setup_commands": ["pytest --version"]
}
```

Start an async experiment with `run_experiment`:

```json
{
  "workspace_id": "ws_...",
  "variants": [
    {
      "name": "minimal-fix",
      "patch": "diff --git a/app.py b/app.py\n..."
    },
    {
      "name": "boundary-fix",
      "files": [
        {"path": "app.py", "content": "...full file content..."}
      ]
    }
  ],
  "eval_command": "pytest -q",
  "parallelism": 2
}
```
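
A client that has the old and new file contents in memory could build the
`patch` string with the standard library. The sketch below is one way to do it:
`make_patch` is a hypothetical helper, the `diff --git` header line is added
only to mirror the example above, and whether the server requires that header
is an assumption. The inputs are assumed to end with a newline.

```python
import difflib


def make_patch(path: str, old_text: str, new_text: str) -> str:
    """Return a unified diff for one file, with git-style a/ and b/ prefixes."""
    body = difflib.unified_diff(
        old_text.splitlines(keepends=True),
        new_text.splitlines(keepends=True),
        fromfile=f"a/{path}",
        tofile=f"b/{path}",
    )
    # Assumption: a leading "diff --git" line, as in the example request.
    return f"diff --git a/{path} b/{path}\n" + "".join(body)


old = "def add(a, b):\n    return a - b\n"
new = "def add(a, b):\n    return a + b\n"
patch = make_patch("app.py", old, new)

# Request arguments in the same shape as the JSON example; the workspace ID
# is left elided because it comes from the prepare step.
request = {
    "workspace_id": "ws_...",
    "variants": [
        {"name": "minimal-fix", "patch": patch},
        {"name": "boundary-fix", "files": [{"path": "app.py", "content": new}]},
    ],
    "eval_command": "pytest -q",
    "parallelism": 2,
}
```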

`run_experiment` returns immediately:

```json
{
  "experiment_id": "exp_...",
  "status": "running"
}
```

Poll with `get_experiment`:

```json
{
  "experiment_id": "exp_..."
}
```

Completed result:

```json
{
  "status": "completed",
  "winner": "minimal-fix",
  "results": []
}
```
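
On the client side, the start-then-poll flow reduces to a loop that stops on a
terminal status. The sketch below is illustrative only: `poll_experiment` is a
hypothetical helper, `get_experiment` is passed in as a plain callable standing
in for the MCP tool call, and the interval and timeout defaults are arbitrary.
The terminal statuses are the ones named in the example agent prompt.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"completed", "failed", "no_passing_variant"}


def poll_experiment(
    get_experiment: Callable[[str], dict],
    experiment_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
) -> dict:
    """Call get_experiment until the status is terminal or the timeout passes."""
    deadline = time.monotonic() + timeout_s
    while True:
        record = get_experiment(experiment_id)
        if record.get("status") in TERMINAL_STATUSES:
            return record
        if time.monotonic() >= deadline:
            raise TimeoutError(f"{experiment_id} still {record.get('status')!r}")
        time.sleep(interval_s)


# Demo with a fake tool call that reports "running" twice, then completes.
_responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "completed", "winner": "minimal-fix", "results": []},
])


def fake_get_experiment(experiment_id: str) -> dict:
    return {"experiment_id": experiment_id, **next(_responses)}


final = poll_experiment(fake_get_experiment, "exp_demo", interval_s=0.0)
```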

## Example Agent Prompt

```text
Generate 2-3 meaningfully different candidate fixes as unified diffs or
full-file variants. Call prepare_workspace once, then call run_experiment with
those variants and the exact eval command. Poll get_experiment until the status
is completed, failed, or no_passing_variant. Do not call apply_winner unless the
user explicitly asks to apply the selected diff.
```

## Smoke Run

```bash
EVOLUTION_MCP_STORAGE=local uv run evolution-modal-smoke /path/to/repo "pytest -q"
```

The smoke command runs a local worker loop for the queued smoke job. In normal
usage, keep `evolution-mcp-worker` running separately.

## Tests

```bash
PYTHONPATH=src python3 -m unittest discover -s tests
uv run --extra dev ruff check .
```

## Known Limitations

- Full logs and diffs are stored in Redis for V2. Large artifacts should move to
  blob storage or a Modal Volume later.
- Modal SDK `1.4.3` does not expose snapshot image deletion; cleanup removes
  local/Redis records and tracks the snapshot ID.
- `apply_winner` applies a stored diff to the local repo with `git apply`; if
  the repo changed since the experiment, the patch can fail.
