Metadata-Version: 2.4
Name: proofagent-sdk
Version: 0.1.5
Summary: Official Python SDK for ProofAgent™
Author: ProofAgent
License: MIT License
        
        Copyright (c) 2026 ProofAgent
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://www.proofagent.ai
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx>=0.27.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0.0; extra == "dev"
Requires-Dist: ruff>=0.6.0; extra == "dev"
Requires-Dist: mypy>=1.10.0; extra == "dev"
Requires-Dist: build>=1.2.0; extra == "dev"
Requires-Dist: mkdocs>=1.6.0; extra == "dev"
Requires-Dist: mkdocs-material>=9.5.0; extra == "dev"
Requires-Dist: ipykernel>=6.29.0; extra == "dev"
Dynamic: license-file

<p align="center">
  <img src="assets/proofagent-logo.svg" alt="ProofAgent™" width="380">
</p>

<p align="center">
  <a href="https://pypi.org/project/proofagent-sdk/">PyPI</a> •
  <a href="https://github.com/ProofAgent-ai/proofagent-sdk">GitHub</a> •
  <a href="https://www.proofagent.ai/">Website</a> •
  <a href="https://www.proofagent.ai/docs">Documentation</a>
</p>

# ProofAgent™ Python SDK

Official Python SDK for [ProofAgent™](https://www.proofagent.ai/), the AI agent evaluation and certification platform.

This SDK is the **supported Python client** for running evaluations, retrieving reports, and integrating ProofAgent™ into production workflows.

## Evaluation modes

ProofAgent supports two evaluation tiers. **Judge-Led Evaluation** is the SDK's default mode.

| Tier | Name | What it does | Best for |
|------|------|----------------|----------|
| **1** | **Judge-Led Evaluation** | The AI Judge initiates and drives the conversation; your agent answers turn by turn (simulated user, multi-turn scenarios). | Pre-production validation and certification |
| **2** | **Log-Based Evaluation** | You submit **historical** customer↔agent conversation logs in one request for scoring. | Post-production validation, regression testing, and back-testing |

**In one line:** Judge-Led **simulates** interactions; Log-Based **evaluates** interactions you already recorded.

## Platform status (beta)

ProofAgent™ is in **beta**, and new accounts are on the **free tier** for now. **Judge evaluations run on models from your own LLM provider:** pass `llm_api_key`, `llm_provider`, and `llm_model` to `evaluate`, `evaluate_logs`, or `start_run` so the ProofAgent AI Judge runs on your account. **Model usage is charged by your provider** and is not bundled into the free platform tier. APIs, limits, and pricing may change as we move toward general availability.

## Links

- **Website:** https://www.proofagent.ai
- **Documentation:** https://www.proofagent.ai/docs
- **GitHub:** https://github.com/ProofAgent-ai/proofagent-sdk
- **PyPI:** https://pypi.org/project/proofagent-sdk/

## Installation

**Package naming**

| | |
|---|---|
| **PyPI distribution** | `proofagent-sdk` |
| **Import package** | `proofagent` |

### From PyPI (recommended)

```bash
pip install proofagent-sdk
```

### From GitHub (latest `main` without cloning)

```bash
pip install "git+https://github.com/ProofAgent-ai/proofagent-sdk.git"
```

### From a local clone (editable)

```bash
git clone https://github.com/ProofAgent-ai/proofagent-sdk.git
cd proofagent-sdk
pip install -e .
```

Development install with extras (lint/tests/docs):

```bash
pip install -e ".[dev]"
```

After any install:

```python
from proofagent import ProofAgent, TestedAgent  # recommended
from proofagent import ProofAgentClient  # low-level REST client
```
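
Because the distribution and import names differ, version lookups go through the PyPI name rather than the import name. The following is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper, not part of the SDK.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "proofagent-sdk") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        # importlib.metadata expects the distribution (PyPI) name,
        # not the import package name ("proofagent").
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None when the SDK is not installed.
print(installed_version())
```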

## ProofAgent AI Agent Judge (domain scoring)

The **ProofAgent AI Agent Judge** is more than a generic LLM chat score. It combines:

- **Domain scoring techniques** — rubrics and pipelines aligned to your project (tier, mode, configured metrics).
- **Domain vertical knowledge** — evaluation context grounded in your project’s **domain** (e.g. customer support, finance, cybersecurity) so judge questions, traps, and scoring stay **relevant** to real workflows.
- **Structured Tier 1 metrics** — every completed run can surface scores across dimensions such as:

| Metric key | What it captures |
|------------|------------------|
| `task_success` | Completion of the intended objective |
| `relevance` | Response appropriateness to the user and context |
| `hallucination_factuality` | Accuracy and groundedness of claims |
| `safety` | Harmful or unsafe content |
| `policy_compliance` | Adherence to business / policy rules |
| `tone_and_empathy` | Communication quality and empathy |
| `reasoning_quality` | Logic and coherence |
| `drift_memory_stability` | Consistency and context retention across turns |
| `manipulation_resistance` | Resistance to prompt injection and coercion |
| `coordination_quality` | Multi-agent coordination (when applicable) |
| `tool_picking_quality` | Appropriate tool selection (when tools are in scope) |

Exact keys and aliases in API responses may vary slightly by API version; see your run report’s `summary_scores` / `metric_evaluations`.

ProofAgent’s **proprietary domain scoring layer** sits on top of whichever LLM provider you bring (BYO, "bring your own"): the Judge applies its domain rubrics and metrics regardless of the provider support status listed below.

## Supported BYO LLMs for the Judge

When you pass `llm_api_key`, `llm_provider`, and `llm_model` into `evaluate` / `evaluate_logs` / `start_run`, the Judge uses that model for planning, conducting, and scoring for that run. **During beta, expect to supply BYO credentials**; model usage is billed by **your** provider. Fully managed Judge hosting may be limited while we are in beta.

| LLM / provider | BYO in this SDK | Example models | Notes |
|------------------|-----------------|----------------|-------|
| **OpenAI** | **Supported** | `gpt-4o-mini`, `gpt-4o`, `gpt-4-turbo`, `gpt-3.5-turbo` | Use `llm_provider="openai"` and an [OpenAI API key](https://platform.openai.com/api-keys). |
| Anthropic (Claude) | Coming soon | — | Roadmap |
| Google (Gemini) | Coming soon | — | Roadmap |
| Mistral | Coming soon | — | Roadmap |
| Azure OpenAI | Coming soon | — | Roadmap |

**Today, only OpenAI is supported for BYO** through the public API/SDK; additional providers are on the roadmap.
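
As a rough illustration, the three BYO parameters can be assembled and checked in one place before a run starts. This sketch assumes the key lives in `OPENAI_API_KEY`; `byo_llm_kwargs` and `SUPPORTED_BYO_PROVIDERS` are hypothetical names for this example, not SDK symbols.

```python
import os
from collections.abc import Mapping
from typing import Optional

# Per the table above, only OpenAI is supported for BYO today.
SUPPORTED_BYO_PROVIDERS = frozenset({"openai"})


def byo_llm_kwargs(
    model: str = "gpt-4o-mini",
    provider: str = "openai",
    env: Optional[Mapping[str, str]] = None,
) -> dict[str, str]:
    """Build the llm_* keyword arguments, failing early on obvious mistakes."""
    if provider not in SUPPORTED_BYO_PROVIDERS:
        raise ValueError(f"Unsupported BYO provider: {provider!r}")
    source = os.environ if env is None else env
    api_key = source.get("OPENAI_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("OPENAI_API_KEY is not set")
    return {"llm_api_key": api_key, "llm_provider": provider, "llm_model": model}


# Demo with an explicit mapping so the example does not depend on your shell.
print(byo_llm_kwargs(env={"OPENAI_API_KEY": "sk-example"}))
```

The resulting dictionary is intended to be unpacked into `evaluate`, `evaluate_logs`, or `start_run` alongside their other arguments; check the SDK reference for the exact signatures.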

## Quick Start — Judge-Led Evaluation (default)

**Mental model:** **your tested agent** (the product you ship) vs **the AI Judge** (ProofAgent’s evaluation system).

1. Describe the tested agent as **JSON** (`role`, `description`, `tools`).
2. Wire a small **handler** `def your_agent_handler(message: str) -> str` (or an **HTTP endpoint** instead).
3. Run **`ProofAgent.evaluate_sync`** (or `evaluate` in async code).

Use a **Judge-Led** project API key.

```bash
export PROOFAGENT_API_KEY="apk_live_..."
export OPENAI_API_KEY="sk-..."   # optional BYO — reasoning/Judge LLM on your account
```

With **`verbose=True`**, the example below prints lines like:

```text
[ProofAgent] Starting judge-led evaluation...
[Turn 1] AI Judge: ...
[Turn 1] Your Agent: ...
```

```python
from proofagent import ProofAgent, TestedAgent

tested_agent_config = {
    "role": "customer_support",
    "description": "Helpful, policy-grounded support assistant",
    "tools": [
        {"name": "policy_lookup", "description": "Retrieve policy clauses"},
        {"name": "ticket_status", "description": "Ticket and escalation status"},
    ],
}

def your_agent_handler(message: str) -> str:
    return "I can help with that. Let me check the policy and status."

your_agent = TestedAgent.from_json(tested_agent_config, handler=your_agent_handler)

pa = ProofAgent.from_env(reasoning_provider="openai", reasoning_model="gpt-4o-mini")

result = pa.evaluate_sync(your_agent=your_agent, turns=3, verbose=True)
print(result.label, result.score)
```
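
The judge drives several turns, so a handler that keeps state between calls can be useful. The following is a minimal sketch, assuming the SDK invokes the handler once per judge turn and accepts any callable with the `(message: str) -> str` shape; `SupportAgent` is a hypothetical class, not part of the SDK.

```python
class SupportAgent:
    """Hypothetical stateful agent: remembers every judge message it has seen."""

    def __init__(self) -> None:
        self.history: list[tuple[str, str]] = []

    def __call__(self, message: str) -> str:
        # Number the turn so the reply shows that earlier context was retained.
        turn = len(self.history) + 1
        reply = f"(turn {turn}) Noted: {message}"
        self.history.append((message, reply))
        return reply


agent = SupportAgent()
print(agent("I was charged twice"))
print(agent("Can you escalate this?"))
```

An instance like `agent` would then take the place of `your_agent_handler` in `TestedAgent.from_json(tested_agent_config, handler=agent)`, under the assumption stated above.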

**Endpoint instead of a function:** use `TestedAgent.from_json(tested_agent_config, endpoint="https://api.myagent.com/chat")`. The SDK sends a POST request with the JSON body `{"message": "<judge question>"}` and reads the agent's answer from the `reply`, `response`, `text`, `answer`, or `agent_answer` field of the JSON response.
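
To make the endpoint contract concrete, here is a sketch of a tiny standard-library server that accepts `{"message": ...}` and answers with `{"reply": ...}`. The `answer`, `handle_chat`, `ChatHandler`, and `serve` names, the port, and the echo logic are all assumptions for illustration; a production agent would sit behind a real web framework with authentication.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def answer(message: str) -> str:
    """Hypothetical stand-in for your real agent logic."""
    return f"You asked: {message}"


def handle_chat(raw_body: bytes) -> bytes:
    """Map a request body {"message": ...} to a response body {"reply": ...}."""
    payload = json.loads(raw_body or b"{}")
    reply = answer(str(payload.get("message", "")))
    return json.dumps({"reply": reply}).encode("utf-8")


class ChatHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read exactly Content-Length bytes, then answer with JSON.
        length = int(self.headers.get("Content-Length", 0))
        body = handle_chat(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(host: str = "127.0.0.1", port: int = 8000) -> None:
    """Block and serve requests until interrupted (Ctrl+C)."""
    HTTPServer((host, port), ChatHandler).serve_forever()


# Check the request/response contract without opening a socket.
print(handle_chat(b'{"message": "Where is my refund?"}').decode("utf-8"))
```

Calling `serve()` starts the server locally; the URL passed as `endpoint=` would then point at wherever this handler is reachable.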

`evaluate_sync` / `evaluate` wrap `start_run` → `poll_until_ready` → turns → `finalize` → `get_report`. **`EvaluationResult`** exposes `run_id`, `report`, and shortcuts **`score`** / **`label`**.

Reports also appear in the app: **[https://www.proofagent.ai/dashboard](https://www.proofagent.ai/dashboard)**.

---

## Log-Based Evaluation

**Log-Based Evaluation** scores **historical** transcripts. Use a **Log-Based** project API key. The tested agent uses the same JSON config, but **no handler** is needed because the config serves only as metadata.

```python
from proofagent import ProofAgent, TestedAgent

tested_agent_config = {
    "role": "billing_support",
    "description": "Billing assistant",
    "tools": [{"name": "invoice_lookup", "description": "Find invoices"}],
}

logs = [
    {"turn_index": 1, "user_message": "I was charged twice", "agent_answer": "Let me verify."},
]

your_agent = TestedAgent.from_json(tested_agent_config)
pa = ProofAgent.from_env(reasoning_provider="openai", reasoning_model="gpt-4o-mini")
result = pa.evaluate_logs_sync(logs, your_agent, verbose=True)
print(result.label, result.score)
```

`evaluate_logs` / `evaluate_logs_sync` call **`assert_project_supports_logs`** first. See **`LOG_BASED_PROJECT_MODES`** if your key is the wrong project type.
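
If your transcripts are stored as plain (user, agent) pairs, a small helper can shape them into the `turn_index` / `user_message` / `agent_answer` records used in the example above. This is a sketch: `to_log_entries` is a hypothetical helper, and it assumes 1-based turn indices as in that example.

```python
from collections.abc import Iterable


def to_log_entries(pairs: Iterable[tuple[str, str]]) -> list[dict]:
    """Convert (user_message, agent_answer) pairs into log records."""
    entries = []
    for index, (user_message, agent_answer) in enumerate(pairs, start=1):
        entries.append(
            {
                "turn_index": index,  # 1-based, matching the example above
                "user_message": user_message.strip(),
                "agent_answer": agent_answer.strip(),
            }
        )
    return entries


logs = to_log_entries(
    [
        ("I was charged twice", "Let me verify."),
        ("It was on March 3", "Thanks, I found the duplicate charge."),
    ]
)
print(logs)
```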

---

## CLI

```bash
proofagent init
```

Creates a starter `proofagent.yaml`. The Python client reads **`PROOFAGENT_API_KEY`** from the environment (the YAML file is onboarding only unless you load it yourself).

To write the file to a different path, pass `--output`:

```bash
proofagent init --output custom-proofagent.yaml
```

## Example report shape (`GET /api/v1/runs/:id/report`)

Exact fields depend on backend version and domain; typical **`data`** looks like:

```json
{
  "result": {
    "final_score": 8.4,
    "certification_label": "CERTIFIED",
    "summary_scores": {
      "task_success": 8.5,
      "safety": 9.0,
      "policy_compliance": 8.0
    },
    "flags": [],
    "text_summary": "Short narrative from the AI Judge…"
  },
  "transcript": [
    {
      "turn": 1,
      "judge_question": "…",
      "agent_answer": "…"
    }
  ],
  "metadata": {
    "total_turns": 3,
    "evaluated_at": "2026-03-24T12:00:00Z"
  }
}
```
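
A report in this shape can be reduced to a short summary for CI gates or dashboards. The sketch below relies only on the field names in the example above, which may differ by backend version; `summarize_report` and the 8.5 threshold are illustrative assumptions, not SDK behavior.

```python
def summarize_report(data: dict, threshold: float = 8.5) -> dict:
    """Pick out the headline fields and any metrics scoring below a threshold."""
    result = data.get("result", {})
    scores = result.get("summary_scores", {})
    # Strictly below the threshold; a score equal to it is not flagged.
    weak = sorted(name for name, value in scores.items() if value < threshold)
    return {
        "score": result.get("final_score"),
        "label": result.get("certification_label"),
        "below_threshold": weak,
        "turns": len(data.get("transcript", [])),
    }


sample = {
    "result": {
        "final_score": 8.4,
        "certification_label": "CERTIFIED",
        "summary_scores": {"task_success": 8.5, "safety": 9.0, "policy_compliance": 8.0},
    },
    "transcript": [{"turn": 1, "judge_question": "...", "agent_answer": "..."}],
}
print(summarize_report(sample))
```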

**View reports in the product:** [https://www.proofagent.ai/dashboard](https://www.proofagent.ai/dashboard)

Example report:

![Example evaluation report in the ProofAgent dashboard](assets/report.png)

Runnable copies: [`examples/judge_led_quickstart.py`](examples/judge_led_quickstart.py), [`examples/log_based_evaluation.py`](examples/log_based_evaluation.py). Minimal notebooks are under [`notebooks/`](notebooks/) (see [`docs/examples.md`](docs/examples.md)).

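The underlying client is **asynchronous**: `await` methods such as `evaluate` and `evaluate_logs` in async code, or call the `evaluate_sync` / `evaluate_logs_sync` wrappers shown above from synchronous code.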

## Why ProofAgent™?

ProofAgent™ is built to help teams evaluate AI agents before deployment by supporting:

- **Correctness** and response quality checks
- **Refusal** and safety validation
- **Tool usage** and execution verification
- **Multi-turn** evaluation flows
- **Production-oriented** reporting and integration

## Official SDK

This repository publishes the official **`proofagent-sdk`** package on PyPI.

Use this SDK when you want a maintained Python client aligned with the ProofAgent™ platform and API.

## Documentation and examples

| Resource | Description |
|----------|-------------|
| [Documentation portal](https://www.proofagent.ai/docs) | Main product and SDK documentation |
| [docs/python-sdk-guide.md](docs/python-sdk-guide.md) | Python SDK guide |
| [docs/quickstart.md](docs/quickstart.md) | Quickstart snippets |
| [examples/](examples/) | Runnable examples |

Build docs locally:

```bash
make docs-serve
```

## Configuration

| Variable | Description |
|----------|-------------|
| `PROOFAGENT_API_KEY` | API key read by `ProofAgent.from_env()` and `ProofAgentClient.from_env()` |
| `PROOFAGENT_BASE_URL` | API base URL. Defaults to `https://api.proofagent.ai` |

For advanced configuration such as retries and timeouts, see `ProofAgentConfig`.
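
A preflight check along these lines can catch a missing key before any run starts. This is a sketch of the lookup described in the table, not the SDK's own resolution logic; `resolve_settings` is a hypothetical helper.

```python
import os
from collections.abc import Mapping
from typing import Optional

DEFAULT_BASE_URL = "https://api.proofagent.ai"  # default from the table above


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Read the two documented variables, applying the documented default."""
    source = os.environ if env is None else env
    api_key = source.get("PROOFAGENT_API_KEY", "").strip()
    if not api_key:
        raise RuntimeError("PROOFAGENT_API_KEY is not set")
    base_url = source.get("PROOFAGENT_BASE_URL", "").strip() or DEFAULT_BASE_URL
    return {"api_key": api_key, "base_url": base_url.rstrip("/")}


# Demo with an explicit mapping so the example does not depend on your shell.
print(resolve_settings({"PROOFAGENT_API_KEY": "apk_live_example"}))
```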

## Package layout

**`src/proofagent/`** — main SDK package

| Module | Role |
|--------|------|
| `proof_agent.py` | `ProofAgent` facade (`evaluate_sync`, reasoning defaults) |
| `tested_agent.py` | `TestedAgent` (JSON + handler or endpoint) |
| `client.py` | `ProofAgentClient` (`evaluate`, `evaluate_logs`, REST) |
| `evaluation.py` | `EvaluationResult` (`score`, `label`) and helpers |
| `project_support.py` | Log-Based project checks (`assert_project_supports_logs`) |
| `config.py` | Configuration handling |
| `exceptions.py` | SDK exceptions |
| `types.py` | Shared SDK types |
| `cli.py` | CLI entrypoint for the `proofagent` command |

**Runtime requirements:** Python 3.10+, **httpx** for async HTTP.

## License

See the [LICENSE](LICENSE) file for details.

## Support

- **Website:** https://www.proofagent.ai
- **Documentation:** https://www.proofagent.ai/docs
- **GitHub Issues:** https://github.com/ProofAgent-ai/proofagent-sdk/issues
