Metadata-Version: 2.4
Name: zeroeval
Version: 0.8.1
Summary: ZeroEval SDK
Project-URL: Repository, https://github.com/zeroeval
Author-email: Sebastian Crossa <seb@zeroeval.com>, Jonathan Chavez <jona@zeroeval.com>
Keywords: LLM,evaluation,observability
Requires-Python: <4.0,>=3.9
Requires-Dist: opentelemetry-api>=1.26.0
Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.26.0
Requires-Dist: opentelemetry-sdk>=1.26.0
Requires-Dist: pyarrow>=21.0.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: requests>=2.32.2
Requires-Dist: rich>=13.7.1
Provides-Extra: all
Requires-Dist: google-genai>=1.21.1; extra == 'all'
Requires-Dist: langchain-core>=0.1.0; extra == 'all'
Requires-Dist: langgraph>=0.0.20; extra == 'all'
Requires-Dist: openai>=1.59.6; extra == 'all'
Requires-Dist: pyarrow>=14.0.0; extra == 'all'
Provides-Extra: datasets
Requires-Dist: pyarrow>=14.0.0; extra == 'datasets'
Provides-Extra: gemini
Requires-Dist: google-genai>=1.21.1; extra == 'gemini'
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.1.0; extra == 'langchain'
Provides-Extra: langgraph
Requires-Dist: langgraph>=0.0.20; extra == 'langgraph'
Provides-Extra: openai
Requires-Dist: openai>=1.59.6; extra == 'openai'
Description-Content-Type: text/markdown

# ZeroEval SDK

The Python SDK and CLI for [ZeroEval](https://zeroeval.com) -- monitoring, prompt management, judges, and optimization for AI products.

```bash
pip install zeroeval
```

---

## Quick Start

### 1. Setup

```bash
zeroeval setup
```

This opens the ZeroEval dashboard, prompts for your project API key, and saves it along with your project context. Every command after this just works.

### 2. Trace your AI calls

```python
import zeroeval as ze
import openai

ze.init()
client = openai.OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is ZeroEval?"}],
)
```

OpenAI, Gemini, LangChain, and LangGraph calls are automatically traced. No extra code needed.

### 3. Manage prompts

```python
import zeroeval as ze

ze.init()

prompt = ze.prompt(
    name="support-triage",
    content="Classify this support ticket: {{ticket_text}}",
    variables={"ticket_text": "I can't log in to my account"},
)
```

Prompts are versioned, tagged, and linked to traces automatically.

### 4. Inspect from the CLI

```bash
zeroeval traces list --start-date 2025-01-01
zeroeval judges list
zeroeval prompts list
```

---

## Installation

```bash
pip install zeroeval            # Core SDK
pip install zeroeval[openai]    # OpenAI auto-instrumentation
pip install zeroeval[gemini]    # Google Gemini
pip install zeroeval[langchain] # LangChain
pip install zeroeval[langgraph] # LangGraph
pip install zeroeval[all]       # Everything
```

Integrations are detected and instrumented automatically.

---

## Authentication

**Interactive (recommended):**

```bash
zeroeval setup
```

Saves your API key and resolves your project automatically. You're ready to go.

**Non-interactive (CI, agents):**

```bash
zeroeval auth set --api-key sk_ze_...
```

**In code:**

```python
import zeroeval as ze
ze.init(api_key="sk_ze_...")
```

Resolution order: explicit flag > environment variable (`ZEROEVAL_API_KEY`, `ZEROEVAL_PROJECT_ID`) > CLI config (`~/.config/zeroeval/config.json`).

---

## Tracing

The SDK traces AI calls with a `Session > Trace > Span` hierarchy.

```python
import zeroeval as ze

ze.init()

# Decorator-based tracing
@ze.span(name="process-ticket")
def process_ticket(ticket_text: str):
    # your logic here
    return result

# Context manager
with ze.span(name="embedding-step"):
    embeddings = get_embeddings(text)

# Manual spans
span = ze.start_span(name="retrieval")
results = search(query)
span.end()

# Tags for filtering in the dashboard
ze.set_tag("trace", {"environment": "production", "model": "gpt-4o"})
```

See [INTEGRATIONS.md](./INTEGRATIONS.md) for automatic OpenAI, Gemini, LangChain, and LangGraph tracing.

---

## Prompts

Prompts are versioned and tagged. Use `ze.prompt()` to fetch and render them:

```python
prompt = ze.prompt(
    name="support-triage",
    content="Classify: {{ticket_text}}",
    variables={"ticket_text": ticket},
)

# Fetch a specific version or tag
prompt = ze.get_prompt("support-triage", version=3)
prompt = ze.get_prompt("support-triage", tag="production")
```

Prompts are automatically linked to traces and available for optimization.

---

## Feedback

Submit feedback on prompt completions to build training data for optimization:

```python
# Thumbs up/down
ze.send_feedback(
    prompt_slug="support-triage",
    completion_id="completion-uuid",
    thumbs_up=True,
    reason="Correct classification",
)

# Scored judge feedback
ze.send_feedback(
    prompt_slug="quality-scorer",
    completion_id="completion-uuid",
    thumbs_up=False,
    judge_id="judge-uuid",
    expected_score=3.5,
    score_direction="too_high",
    criteria_feedback={
        "accuracy": {"expected_score": 4.0, "reason": "Mostly correct"},
        "tone": {"expected_score": 1.0, "reason": "Too formal"},
    },
)
```

Feedback with `reason` and `expected_output` creates stronger training examples for prompt optimization.

---

## Datasets & Evals

```python
import zeroeval as ze

ze.init()

# Pull a dataset
dataset = ze.Dataset.pull("my-dataset")

for row in dataset:
    print(row.question, row.answer)

# Run an evaluation
@ze.task(outputs=["prediction"])
def solve(row):
    return {"prediction": llm_call(row["question"])}

@ze.evaluation(mode="row", outputs=["correct"])
def check(answer, prediction):
    return {"correct": int(answer == prediction)}

run = dataset.eval(solve, workers=8)
run = run.score([check], column_map={"answer": "answer", "prediction": "prediction"})
```

---

## CLI

The `zeroeval` CLI covers monitoring, judges, prompts, optimization, datasets, and evals. Designed for both humans and automation (`--output json`).

### Setup

```bash
zeroeval setup                              # Interactive setup
zeroeval auth set --api-key sk_ze_...       # Non-interactive
zeroeval auth show --redact                 # Show config
```

### Global flags

| Flag | Default | Description |
|------|---------|-------------|
| `--output text\|json` | `text` | Output format. `json` emits stable JSON to stdout. |
| `--project-id` | auto from setup | Override project context. |
| `--api-base-url` | `https://api.zeroeval.com` | Override API URL. |
| `--timeout` | `60.0` | Request timeout in seconds. |
| `--quiet` | off | Suppress non-essential logs. |

### Monitoring

```bash
zeroeval sessions list --start-date 2025-01-01
zeroeval sessions get <session_id>
zeroeval traces list --start-date 2025-01-01
zeroeval traces get <trace_id>
zeroeval traces spans <trace_id>
zeroeval spans list --start-date 2025-01-01
zeroeval spans get <span_id>
```

### Judges

```bash
zeroeval judges list
zeroeval judges get <judge_id>
zeroeval judges evaluations <judge_id> --limit 100
zeroeval judges criteria <judge_id>
zeroeval judges insights <judge_id>
zeroeval judges performance <judge_id>
zeroeval judges calibration <judge_id>
zeroeval judges versions <judge_id>

# Create a judge
zeroeval judges create \
  --name "answer-quality" \
  --prompt-file judge.txt \
  --evaluation-type binary \
  --sample-rate 1.0

# Submit feedback on a judge evaluation
zeroeval judges feedback create \
  --span-id <span_id> \
  --thumbs-up \
  --reason "Correct evaluation"
```

### Prompts

```bash
zeroeval prompts list
zeroeval prompts get <slug> --version 3
zeroeval prompts versions <slug>
zeroeval prompts tags <slug>

# Submit feedback on a completion
zeroeval prompts feedback create \
  --prompt-slug support-triage \
  --completion-id <id> \
  --thumbs-down \
  --reason "Wrong classification"
```

### Optimization

```bash
# Prompt optimization
zeroeval optimize prompt list <task_id>
zeroeval optimize prompt start <task_id> --optimizer-type quick_refine
zeroeval optimize prompt promote <task_id> <run_id> --yes

# Judge optimization
zeroeval optimize judge list <judge_id>
zeroeval optimize judge start <judge_id>
zeroeval optimize judge promote <judge_id> <run_id> --yes
```

### Datasets & Evals

```bash
zeroeval datasets list
zeroeval datasets get <name>
zeroeval datasets versions <name>
zeroeval datasets rows <name> --version 3 --limit 200

zeroeval evals list --status completed
zeroeval evals get <eval_id>
zeroeval evals summary <eval_id>
zeroeval evals results <eval_id>
zeroeval evals scores <eval_id> <scorer_id>
```

### Querying

List commands support `--where`, `--select`, and `--order`:

```bash
zeroeval judges list --where "name~quality" --select "id,name" --order "name:asc"
```

### Machine-readable spec

```bash
zeroeval spec cli --format json
zeroeval spec command "judges create"
```

---

## Development

```bash
uv run --group dev pytest tests/cli/ -v   # CLI tests
uv run --group dev pytest                  # Full suite
```
