Metadata-Version: 2.4
Name: revefi-agent-test
Version: 0.2.0
Summary: End-to-end agent testing — assert an agent's tools fired, via Revefi LLM observability
Author-email: Revefi <support@revefi.com>
License: MIT
Project-URL: Homepage, https://github.com/revefi/revefi-agent-test
Project-URL: Repository, https://github.com/revefi/revefi-agent-test
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31
Requires-Dist: pyyaml>=6.0
Dynamic: license-file

# revefi-agent-test

A small Python harness that **asserts an AI agent actually called the tools it was supposed to** —
verified against Revefi's LLM observability spans, not the answer text. Works for any agent
instrumented with `revefi-llm-sdk`.

Per case it: generates a per-run `test_prompt_id` → POSTs the case `body` to the agent's `url` with a
`test-prompt-id` header → polls Revefi's test-span-data API by that id → checks the run's span for the
`required_tools` → optionally scores the answer against a `baseline_answer` via Revefi's
baseline-similarity API → prints the result. Exit code is non-zero if any case fails.

## Two keys (least-privilege)

The harness uses two distinct API keys so each can be least-privilege:

- **`config.api_key`** — the **read-only** key for Revefi observability reads (test-span-data and
  baseline-similarity). `config.base_url` is the Revefi public API base.
- **`cases[].api_key`** — the key used to call **that case's** agent `url` (e.g. `/raden/ask`).

Any key may be a literal token or a **`${ENV_VAR}`** reference expanded from the environment at load
time, so CI injects secrets via env and the YAML never commits them. The read and agent keys must be
the same tenant, or the span read won't find the agent run's spans.

## Layout

```
revefi_agent_test/__init__.py   # all the logic + the CLI (run_test, AgentTestCase, RevefiConfig, main)
revefi_agent_test/__main__.py   # lets you do `python -m revefi_agent_test`
examples/raden.yaml             # a worked example config you copy and edit
pyproject.toml                  # package metadata (the `revefi-agent-test` command)
```

## Run it

```bash
pip install -e .                          # from this repo (editable; no PyPI needed)
cp examples/raden.yaml my.yaml            # copy the example, then edit the config block + cases
export REVEFI_OBS_API_KEY=…  REVEFI_AGENT_API_KEY=…   # if the YAML uses ${REVEFI_OBS_API_KEY} / ${REVEFI_AGENT_API_KEY}
revefi-agent-test --config my.yaml
```

## The config (one generic format)

A single YAML file: a `config` block (the read-only observability connection) plus `cases`, each a
self-contained agent request (`url` + `api_key` + `body`).

```yaml
config:
  base_url: https://your-revefi-instance.com/v1   # Revefi public API base — observability reads
  api_key: ${REVEFI_OBS_API_KEY}                       # READ-ONLY key; literal or a ${ENV_VAR} reference

cases:
  - name: web search
    url: https://your-agent.example.com/run        # the agent endpoint to drive
    api_key: ${REVEFI_AGENT_API_KEY}                    # key for THIS agent call; literal or ${ENV_VAR}
    body:                                           # POSTed to `url` verbatim — shape is up to the agent
      input: "Who won the 2024 IPL final? Use web search."
    required_tools:
      - web_search_tool

  - name: answer quality                            # optional semantic check via baseline-similarity
    url: https://your-agent.example.com/run
    api_key: ${REVEFI_AGENT_API_KEY}
    body:
      input: "What is the configuration of warehouse REVEFI_DEV_WH?"
    baseline_answer: "REVEFI_DEV_WH is X-SMALL with auto-suspend 60s."
    min_similarity: 0.8                             # fail if cosine(answer, baseline) < this (default 0.8)
```

`body` is opaque — it's whatever the agent under test expects. A case can assert `required_tools`
(verified via spans) and/or set `baseline_answer` + `min_similarity` (verified via baseline-similarity).
The similarity candidate is the **answer text parsed from the run's span** (the LLM completion
content — cleaner than the agent's JSON/markup HTTP body); if the span can't be read (not ingested),
it falls back to the raw agent response.

## As a library

```python
from revefi_agent_test import RevefiConfig, AgentTestCase, run_test

cfg = RevefiConfig(base_url="https://your-revefi-instance.com/v1", api_key="<read-only obs key>")
cases = [
    AgentTestCase(
        name="web search",
        url="https://your-agent.example.com/run",
        api_key="<agent key>",
        body={"input": "Who won the 2024 IPL final? Use web search."},
        required_tools=["web_search_tool"],
    )
]
assert all(r.passed for r in run_test(cfg, cases))
```

## How `test_prompt_id` reaches the agent

The harness attaches a per-run `test-prompt-id` **header** to every agent call. The agent reads it
off its inbound request and forwards it to `revefi_llm_sdk.set_request_test_prompt_id(...)` so the
run's spans get tagged — in the agent's own request handler, or in whatever gateway fronts it. The
harness never touches the `body`, so adopting this needs no change to your agent's request schema.

## How tools are detected

Verification reads the run's latest `SPAN_KIND_CLIENT` span from Revefi's test-span-data API and collects
tool names from `extractedData.promptsList[*].toolCallsList[*].name` (and `completionsList[*]`),
comparing them against `required_tools`.

## CI

Run it in CI by pointing `--config` at a YAML committed to your repo whose `api_key` values are
`${ENV_VAR}` references, and inject the matching secrets via env — never commit the tokens. For
example, with `config.api_key: ${PUBLIC_API_TOKEN_RO}` and `cases[].api_key: ${PUBLIC_API_TOKEN}`:

```yaml
env:
  PUBLIC_API_TOKEN_RO: ${{ secrets.PUBLIC_API_TOKEN_RO }}   # read-only observability key
  PUBLIC_API_TOKEN:    ${{ secrets.PUBLIC_API_TOKEN }}      # agent-call key (same tenant)
```
