Metadata-Version: 2.4
Name: anneal-cli
Version: 0.3.0
Summary: Domain-agnostic autonomous optimization framework
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=2.0
Requires-Dist: scipy>=1.14
Requires-Dist: matplotlib>=3.9
Requires-Dist: openai>=2.28.0
Requires-Dist: rich>=14.3.3
Requires-Dist: pyyaml>=6.0.3
Requires-Dist: filelock>=3.25.2
Requires-Dist: httpx>=0.28.1
Requires-Dist: pydantic>=2.11
Requires-Dist: tomli-w>=1.1
Requires-Dist: tiktoken>=0.9
Provides-Extra: dashboard
Requires-Dist: aiohttp>=3.13.3; extra == "dashboard"
Provides-Extra: mcp
Requires-Dist: mcp>=1.9; extra == "mcp"
Provides-Extra: ml
Requires-Dist: scikit-learn>=1.5; extra == "ml"
Provides-Extra: dev
Requires-Dist: scikit-learn>=1.8.0; extra == "dev"
Dynamic: license-file

# anneal

Let an AI agent improve your code, prompts, and configs — overnight, unattended.

<picture>
  <img alt="anneal infographic — autonomous optimization for any measurable artifact" src="assets/infographic.png" width="100%">
</picture>

<video src="https://github.com/user-attachments/assets/06f8848d-a404-4b58-817a-8c0bf8bb71e3" width="100%" autoplay loop muted playsinline></video>

Point anneal at any text file in a git repo, tell it how to measure "better," and walk away. The agent generates hypotheses, runs experiments, keeps winners, discards losers, and compounds learnings — all while you sleep.

## How It Works

1. **Register** a target — the file to improve, how to score it, what's off-limits
2. **Run** the loop — the agent mutates → evaluates → keeps or reverts → learns → repeats
3. **Review** — every experiment is a git commit; check the history, scores, and cost

```bash
anneal register \
  --name my-target \
  --artifact path/to/file.py \
  --eval-mode deterministic \
  --run-cmd "python benchmark.py" \
  --parse-cmd "grep 'score' | awk '{print \$2}'" \
  --direction maximize \
  --scope scope.yaml

anneal run --target my-target --experiments 20
```

## Install

```bash
uv tool install anneal-cli

# Or with pip
pip install anneal-cli
```

Requires Python 3.12+.

## Examples

### [Prompt Optimization](examples/prompt-optimizer/) — stochastic eval

Improve an article summarizer prompt. The agent rewrites `system_prompt.md`, generates summaries from 5 test articles, and an LLM judge scores each against 4 binary criteria (key points captured? concise? plain language? factually accurate?).

```bash
anneal register \
  --name prompt-optimizer \
  --artifact examples/prompt-optimizer/system_prompt.md \
  --eval-mode stochastic \
  --criteria examples/prompt-optimizer/eval_criteria.toml \
  --direction maximize \
  --scope examples/prompt-optimizer/scope.yaml

anneal run --target prompt-optimizer --experiments 10
```

### [Test Coverage](examples/test-coverage/) — deterministic eval, maximize

Improve pytest test coverage of a Python module. The agent adds tests to cover untested code paths. `pytest --cov` provides the score.

```bash
anneal register \
  --name test-coverage \
  --artifact examples/test-coverage/tests/test_calculator.py \
  --eval-mode deterministic \
  --run-cmd "bash examples/test-coverage/eval.sh" \
  --parse-cmd "cat" \
  --direction maximize \
  --scope examples/test-coverage/scope.yaml

anneal run --target test-coverage --experiments 10
```

### [Code Golf](examples/code-golf/) — deterministic eval, minimize

Shrink a verbose Python file while preserving byte-identical output. In a test run: **3,592 → 228 characters (93.7% reduction)** in 7 experiments.

```bash
anneal register \
  --name code-golf \
  --artifact examples/code-golf/app.py \
  --eval-mode deterministic \
  --run-cmd "bash examples/code-golf/eval.sh" \
  --parse-cmd "cat" \
  --direction minimize \
  --scope examples/code-golf/scope.yaml

anneal run --target code-golf --experiments 10
```

### Local Artifacts (no git tracking required)

Artifact files don't need to be committed to git. If they're untracked, anneal copies them into the worktree automatically during registration. For files you don't want in version control at all, use `--in-place` to skip worktree isolation entirely:

```bash
anneal register \
  --name local-skill \
  --artifact SKILL.md \
  --eval-mode stochastic \
  --criteria eval_criteria.toml \
  --direction maximize \
  --scope scope.yaml \
  --in-place
```

## Two Eval Modes

**Deterministic** — a shell command produces a number. Run code, parse output, compare. Use for: performance benchmarks, test coverage, file size, build time.

**Stochastic** — an LLM judges N samples against K binary (YES/NO) criteria. Use for: prompt quality, documentation clarity, content optimization — anything where output varies between runs.

## Documentation

| Doc                                    | What's in it                                             |
| -------------------------------------- | -------------------------------------------------------- |
| [Overview](docs/overview.md)           | Motivation, lineage, and the core idea                   |
| [Eval Guide](docs/eval-guide.md)       | Writing good binary evaluation criteria                  |
| [Recipes](docs/recipes.md)             | Copy-paste registration commands for common targets      |
| [Use Cases](docs/use-cases.md)         | Where anneal works, where it doesn't, and why            |
| [Features](docs/features.md)           | Search strategies, statistical methods, knowledge system |
| [Architecture](docs/architecture.md)   | Module map and design principles                         |
| [System Design](docs/system-design.md) | Full technical design document                           |
| [CI Integration](docs/ci-integration.md) | GitHub Actions workflow and status JSON output          |

## Testing

```bash
uv run pytest tests/ -x -q          # 820 tests
uv run pytest tests/ --cov=anneal    # With coverage
```

## License

MIT
