Metadata-Version: 2.3
Name: evaluateur
Version: 0.1.0
Summary: synthetic evals for agents
Author: Sasha Aptlin
Author-email: Sasha Aptlin <sasha@aptford.com>
License: MIT
Requires-Dist: dspy>=3.0.4
Requires-Dist: instructor>=1.13.0
Requires-Dist: pydantic>=2.12.5
Requires-Dist: python-dotenv>=1.2.1
Requires-Python: >=3.10, <4.0
Project-URL: documentation, https://github.com/aptford/evaluateur/blob/main/README.md
Project-URL: homepage, https://github.com/aptford/evaluateur
Project-URL: repository, https://github.com/aptford/evaluateur
Description-Content-Type: text/markdown

## Evaluateur

Synthetic evaluation helper for LLM applications, built around the
**dimensions → tuples → queries** flow described in [Hamel Husain's FAQ](https://hamel.dev/blog/posts/evals-faq/what-is-the-best-approach-for-generating-synthetic-data.html).
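
To make the flow concrete, here is a minimal standard-library sketch of the middle step: expanding per-dimension options into tuples with a cross product. It does not show evaluateur's implementation. The `cross_product` helper and the option values are hypothetical, chosen only for illustration.

```python
from itertools import product

# Hypothetical options for three dimensions. In evaluateur these would be
# produced by an LLM; they are hard-coded here to keep the sketch runnable.
options = {
    "payer": ["Cigna", "Aetna"],
    "age": ["adult", "pediatric"],
    "complexity": ["off-label", "comorbidities"],
}


def cross_product(options: dict[str, list[str]]) -> list[dict[str, str]]:
    """Return one dict per combination of dimension values."""
    names = list(options)
    return [dict(zip(names, combo)) for combo in product(*options.values())]


tuples = cross_product(options)
print(len(tuples))  # 2 * 2 * 2 combinations
print(tuples[0])
```

Each resulting tuple would then be turned into a natural language query. The number of tuples grows multiplicatively with the options per dimension, which is why a cap on the number of tuples is useful.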

### Installation

Evaluateur is packaged as a standard Python library. To install it with `uv`:

```bash
uv add evaluateur
```

### Basic usage

Define a Pydantic model that represents the dimensions of your evaluation
space, then use the `Evaluator` to generate options and queries:

```python
from pydantic import BaseModel, Field

from evaluateur import Evaluator, QueryMode, TupleStrategy


class Query(BaseModel):
    payer: str = Field(..., description="insurance payer, like Cigna")
    age: str = Field(..., description="patient age category, like 'adult' or 'pediatric'")
    complexity: str = Field(
        ...,
        description="complexity of the query to account for the edge cases, like 'off-label', 'comorbidities', etc",
    )
    geography: str = Field(..., description="geography indicator, like a zip code, specific state or county")


evaluator = Evaluator(Query, context="Healthcare prior authorization")

# Step 1: generate options for each dimension using Instructor
options = evaluator.generate_options(
    instructions="Focus on common US payers and edge-case clinical scenarios.",
)

# Step 2: turn options into tuples and natural language queries
output = evaluator.generate_queries(
    options=options,
    mode=QueryMode.HYBRID,
    tuple_strategy=TupleStrategy.CROSS_PRODUCT,
    tuple_count=50,
)

for q in output.queries:
    print(q.source_tuple.values, "->", q.query)
```

The evaluator reads provider credentials from environment variables (for
example `OPENAI_API_KEY`) and works with any provider that `instructor`
supports. To use a different provider or model, configure it through the
`LLMClient` helper.
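
Because credentials come from the environment, it can help to fail fast when a key is missing rather than wait for the first LLM call to error out. The sketch below uses only the standard library. `require_env` is a hypothetical helper, not part of evaluateur. The package also lists `python-dotenv` as a dependency, but whether it loads a `.env` file for you is an assumption to verify.

```python
import os


def require_env(name: str) -> str:
    """Return the value of an environment variable or raise a clear error."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(
            f"{name} is not set; export it in your shell or add it to a .env file"
        )
    return value


# Example: call this before constructing the evaluator.
# api_key = require_env("OPENAI_API_KEY")
```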

If your input model already has list-typed fields (for example
`payer: list[str] = ["Cigna", "Aetna"]`), those lists are treated as fixed
options and are not modified by `generate_options()`. For scalar fields of
any basic type (`str`, `int`, `float`, and so on), `generate_options()`
generates a list of options automatically.
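
The fixed-versus-generated distinction can be pictured with a small standard-library sketch that separates list-typed annotations from scalar ones. This only illustrates the rule described above and is not evaluateur's actual logic. The plain `Query` class and the `split_fields` helper are hypothetical.

```python
from typing import get_origin, get_type_hints


class Query:
    # List-typed field: treated as a fixed set of options.
    payer: list[str] = ["Cigna", "Aetna"]
    # Scalar fields: options would be generated for these.
    age: str
    visits: int


def split_fields(model: type) -> tuple[list[str], list[str]]:
    """Split field names into (fixed list fields, scalar fields)."""
    fixed: list[str] = []
    generated: list[str] = []
    for name, annotation in get_type_hints(model).items():
        if get_origin(annotation) is list:
            fixed.append(name)
        else:
            generated.append(name)
    return fixed, generated


fixed, generated = split_fields(Query)
print("fixed:", fixed)
print("generated:", generated)
```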
