# aevals

From Zero to Evals.

Go from zero evals to running evals in three commands. Scan your agent code, generate test scenarios, and score runs against deterministic constraints and LLM-judged rubrics.

## Install

```bash
uv add aevals
# or
pip install aevals
```

## Quick start

```bash
aevals scan    # detect SDK + entrypoint
aevals init    # generate aevals.yaml
aevals run     # execute scenarios
```

## What is aevals?

Most teams building agents know they should run evals, yet few do. The problem isn't motivation; it's knowing where to start. aevals closes that gap. Point it at your codebase and it works out the rest: which SDKs you use, where your entrypoint is, and what tools your agent has.

- **Zero to eval** — Scans your codebase, detects SDKs and entrypoints, generates config. You edit scenarios, not scaffolding.
- **Deterministic constraints** — Duration, step count, tool ordering, loop detection. Zero LLM cost. Your CI gate, for free.
- **LLM-as-judge rubrics** — Natural-language assertions scored against the full trajectory — every LLM call and tool invocation, not just the final answer.
- **OTel tracing** — Auto-instruments 6 LLM SDKs via OpenTelemetry. Pipe traces to Langfuse, Phoenix, Jaeger, or any OTel backend.
- **Subprocess isolation** — Each scenario runs in its own process with independent tracing. No shared state, no crosstalk.
- **No platform** — pip install, BYO API keys, all data stays local. Same command on your laptop and in CI.
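
To make the deterministic constraints above concrete, here is a minimal Python sketch of the four checks (duration, step count, tool ordering, loop detection). It is illustrative only and is not the aevals API or schema: `Step`, `check_constraints`, and every parameter name are hypothetical, and loop detection is assumed here to mean the same tool call repeated back to back.

```python
from dataclasses import dataclass


@dataclass
class Step:
    """One recorded step of an agent run (hypothetical shape, not the aevals schema)."""
    tool: str
    args: str = ""


def check_constraints(steps, duration_s, *, max_duration_s, max_steps,
                      required_order=(), max_repeats=3):
    """Return a list of violation messages; an empty list means the run passes."""
    violations = []

    # Duration: wall-clock time of the whole run.
    if duration_s > max_duration_s:
        violations.append(f"duration {duration_s:.1f}s exceeds {max_duration_s:.1f}s")

    # Step count: total number of recorded steps.
    if len(steps) > max_steps:
        violations.append(f"{len(steps)} steps exceeds {max_steps}")

    # Tool ordering: required_order must appear as a subsequence of the tools called.
    # `name in tools` advances the shared iterator, so order is enforced.
    tools = iter(step.tool for step in steps)
    if not all(name in tools for name in required_order):
        violations.append(f"tools not called in order {list(required_order)}")

    # Loop detection (assumed definition): identical call repeated consecutively.
    run, previous = 0, None
    for step in steps:
        key = (step.tool, step.args)
        run = run + 1 if key == previous else 1
        previous = key
        if run > max_repeats:
            violations.append(f"loop: {step.tool} repeated more than {max_repeats} times")
            break

    return violations


trajectory = [Step("search", "q=refund policy"), Step("fetch", "doc=42"), Step("answer")]
print(check_constraints(trajectory, 2.4, max_duration_s=10, max_steps=5,
                        required_order=("search", "answer")))  # prints []
```

Checks like these need no model call, which is why they are cheap enough to run on every commit.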

## Links

- [GitHub](https://github.com/satyaborg/aevals)
- [PyPI](https://pypi.org/project/aevals/)
- [Docs](https://aevals.dev/docs)

---

MIT License
