Metadata-Version: 2.4
Name: revefi-agent-test
Version: 0.1.0
Summary: End-to-end agent testing — assert an agent's tools fired, via Revefi LLM observability
Author-email: Revefi <support@revefi.com>
License: MIT
Project-URL: Homepage, https://github.com/revefi/revefi-agent-test
Project-URL: Repository, https://github.com/revefi/revefi-agent-test
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.31
Requires-Dist: pyyaml>=6.0
Dynamic: license-file

# revefi-agent-test

A small Python harness that **asserts an AI agent actually called the tools it was supposed to** —
verified against Revefi's LLM observability spans, not the answer text. Works for any agent
instrumented with `revefi-llm-sdk`.

Per case it: generates a per-run `test_prompt_id` → POSTs the case `body` to the agent's `url` with a
`test-prompt-id` header → polls Revefi's test-span-data API by that id → checks the run's span for the
`required_tools` → prints the result. Exit code is non-zero if any case fails.

## Layout

```
revefi_agent_test/__init__.py   # all the logic + the CLI (run_test, AgentTestCase, RevefiConfig, main)
revefi_agent_test/__main__.py   # lets you do `python -m revefi_agent_test`
examples/raden.yaml             # a worked example config you copy and edit
pyproject.toml                  # package metadata (the `revefi-agent-test` command)
```

## Run it

```bash
pip install -e .                 # from this repo (editable; no PyPI needed)
cp examples/raden.yaml my.yaml   # copy the example, then edit the config block + cases
export REVEFI_API_KEY=…          # or put api_key in the YAML (env wins over the file)
revefi-agent-test --config my.yaml
```

## The config (one generic format)

A single YAML file: a `config` block (how to reach Revefi to read spans back) plus `cases`, each
POSTing a `body` to a `url`.

```yaml
config:
  base_url: https://your-revefi-instance.com/v1   # Revefi public API base — used to read spans back
  # api_key: "…"                                   # or set REVEFI_API_KEY (env wins); never commit secrets

cases:
  - name: web search
    url: https://your-agent.example.com/run        # the agent endpoint to drive
    body:                                           # POSTed to `url` verbatim — shape is up to the agent
      input: "Who won the 2024 IPL final? Use web search."
    required_tools:
      - web_search_tool
```

`body` is opaque — it's whatever the agent under test expects. `required_tools` is the whole point of
the test.

## As a library

```python
from revefi_agent_test import RevefiConfig, AgentTestCase, run_test

cfg = RevefiConfig(base_url="https://your-revefi-instance.com/v1", api_key="…")
cases = [
    AgentTestCase(
        name="web search",
        url="https://your-agent.example.com/run",
        body={"input": "Who won the 2024 IPL final? Use web search."},
        required_tools=["web_search_tool"],
    )
]
assert all(r.passed for r in run_test(cfg, cases))
```

## How `test_prompt_id` reaches the agent

The harness attaches a per-run `test-prompt-id` **header** to every agent call. The agent reads it
off its inbound request and forwards it to `revefi_llm_sdk.set_request_test_prompt_id(...)` so the
run's spans get tagged — in the agent's own request handler, or in whatever gateway fronts it. The
harness never touches the `body`, so adopting this needs no change to your agent's request schema.

## How tools are detected

Verification reads the run's latest `SPAN_KIND_CLIENT` span from Revefi's test-span-data API and collects
tool names from `extractedData.promptsList[*].toolCallsList[*].name` (and `completionsList[*]`),
comparing them against `required_tools`.

## CI

Run it in CI by pointing `--config` at a YAML committed to your repo and injecting the API token via
a secret (the `REVEFI_API_KEY` env var) — never commit the token.
