Metadata-Version: 2.4
Name: correctover-patronus
Version: 0.1.0
Summary: Correctover 6-Dimensional Verification adapter for Patronus AI evaluation framework
Project-URL: Homepage, https://github.com/Correctover/correctover-patronus
Project-URL: Standards, https://github.com/Correctover/correctover-crewai/blob/main/STANDARDS.md
Project-URL: Conformance, https://api.babyblueviper.com/conformance
Author-email: Guigui Wang <wanggui.gui@neuralbridge.cn>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: agent-evaluation,ai-safety,correctover,evaluation,llm,patronus,verification
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Requires-Dist: correctover-crewai>=0.1.0
Requires-Dist: patronus>=0.1.10
Provides-Extra: dev
Requires-Dist: pytest-asyncio>=0.21; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Description-Content-Type: text/markdown

# Correctover Patronus Adapter

> **6-Dimensional Verification for Patronus AI Evaluation Framework**

Bridges [Correctover](https://github.com/Correctover/correctover-crewai)'s deterministic verification engine into the [Patronus AI](https://patronus.ai) evaluation framework. Run Correctover's 87 rules across 6 dimensions on any LLM output, with recomputable proof hashes.

**Failover ≠ Correctover.**™

---

## What This Does

Patronus AI evaluates LLM outputs. Correctover verifies them. This adapter lets you run Correctover's verification as Patronus evaluators — so you can combine Correctover's deterministic checks with Patronus's experiment tracking, tracing, and benchmarking.

| Patronus Evaluators | Correctover Verification |
|---|---|
| Hallucination detection (Lynx) | Structure + Schema validation |
| Toxicity / PII detection | Integrity checks (forbidden patterns) |
| Context relevance (Lynx) | Identity verification (semantic relevance) |
| Custom judges (Glider) | Full 6-dim deterministic verification |
| **This adapter** | **Adds: recomputable proof hashes, drift detection, failover signals** |

## Quick Start

### Installation

```bash
pip install correctover-patronus
```

### Full 6-Dimension Verification (Recommended)

```python
from correctover_patronus import CorrectoverEvaluator, CorrectoverConfig

# Configure verification rules
config = CorrectoverConfig(
    structure_rules={"format": "json", "required_keys": ["answer", "confidence"]},
    schema_rules={
        "require_json": True,
        "fields": {
            "answer": {"type": "string", "required": True},
            "confidence": {"type": "number", "required": True},
        },
    },
    integrity_rules={
        "forbidden_patterns": [r"Traceback", r"Error:\s", r"Exception:"]
    },
    latency_rules={"max_ms": 5000},
    cost_rules={"max_tokens": 2000},
    identity_rules={"min_similarity": 0.3},
)

evaluator = CorrectoverEvaluator(config=config)

# Evaluate an LLM output
result = evaluator.evaluate(
    task_input="What is 2+2?",
    task_output='{"answer": "4", "confidence": 0.95}',
)

print(f"Verdict: {result.text_output}")     # "pass"
print(f"Score: {result.score:.2f}")          # 1.00
print(f"Passed: {result.pass_}")             # True
print(f"Proof: {result.metadata['correctover_proof_hash'][:16]}...")
```

### Individual Dimensions

Each dimension is also available as a standalone Patronus evaluator:

```python
from correctover_patronus import (
    correctover_structure,
    correctover_schema,
    correctover_identity,
    correctover_integrity,
)

# Check structure only
result = correctover_structure(
    evaluated_model_input="What is 2+2?",
    evaluated_model_output="The answer is 4.",
)

# Check integrity (forbidden patterns)
result = correctover_integrity(
    evaluated_model_input="Process data",
    evaluated_model_output="Traceback (most recent call last): Error",
)
print(f"Passed: {result.pass_}")  # False
```

### Using in Patronus Experiments

```python
from patronus import init
from patronus.experiments import run_experiment
from correctover_patronus import CorrectoverEvaluator, CorrectoverConfig

init(api_key="your-patronus-api-key")

config = CorrectoverConfig(
    structure_rules={"format": "json"},
    integrity_rules={"forbidden_patterns": ["Traceback", "Error:"]},
)
correctover_eval = CorrectoverEvaluator(config=config)

def my_task(row, **kwargs):
    # Your LLM call here
    return call_my_llm(row["task_input"])

experiment = run_experiment(
    project_name="My Agent Evaluation",
    dataset=[
        {"task_input": "What is 2+2?", "gold_answer": "4"},
        {"task_input": "Capital of France?", "gold_answer": "Paris"},
    ],
    task=my_task,
    evaluators=[correctover_eval],
)
experiment.to_csv("./results.csv")
```

### The `correctover_full` Shorthand

For quick integration without configuration:

```python
from correctover_patronus import correctover_full

result = correctover_full(
    evaluated_model_input="Tell me about AI",
    evaluated_model_output="AI is a field of computer science focused on building intelligent systems.",
)
print(f"Verdict: {result.text_output}, Score: {result.score:.2f}")
```

## The 6 Dimensions

| # | Dimension | What It Checks | Failure Signal |
|---|-----------|---------------|----------------|
| D1 | **Structure** | JSON parseability, required keys, minimum length | Output is malformed or missing required structure |
| D2 | **Schema** | Field types, constraints, required values | Type mismatch (e.g., `"confidence": "high"` instead of `0.95`) |
| D3 | **Latency** | Response time within acceptable bounds | Provider degradation, network issues |
| D4 | **Cost** | Token consumption within budget | Runaway token usage, inefficient prompting |
| D5 | **Identity** | Output semantically relevant to input | Hallucination drift, provider confusion |
| D6 | **Integrity** | No forbidden patterns (errors, stack traces, PII) | Leaked internals, unsafe content |

Each dimension produces a score (0.0–1.0) and a pass/fail verdict. The 6 dimensions aggregate into an overall verdict:

- **pass** — all 6 dimensions pass (confidence = 1.0)
- **partial** — confidence ≥ threshold (default 0.6) but some dimensions fail
- **fail** — confidence < threshold, or critical dimensions fail catastrophically

## Recomputable Proof

Every evaluation produces a **proof hash** — a SHA-256 digest binding the input, rules, and verdict. Anyone can recompute the same hash independently:

```python
# The proof hash is in the result metadata
proof_hash = result.metadata["correctover_proof_hash"]

# Anyone with the same inputs + rules gets the same hash
# This proves the verdict was computed correctly
```

This is the foundation of the Correctover standard: **verification is reproducible, not just reported**.

- **Standards**: [STANDARDS.md](https://github.com/Correctover/correctover-crewai/blob/main/STANDARDS.md)
- **Conformance Board**: [api.babyblueviper.com/conformance](https://api.babyblueviper.com/conformance)
- **Independent Verification**: All verdicts independently verified by [babyblueviper1](https://github.com/babyblueviper1)

## Configuration Reference

```python
from correctover_patronus import CorrectoverConfig

config = CorrectoverConfig(
    # D1: Structure rules
    structure_rules={
        "format": "json",           # "json" or "text"
        "required_keys": ["answer"], # Required top-level keys (JSON mode)
        "min_length": 1,            # Minimum output length (text mode)
    },

    # D2: Schema rules
    schema_rules={
        "require_json": True,        # Fail if output isn't valid JSON
        "fields": {
            "answer": {"type": "string", "required": True},
            "confidence": {"type": "number", "required": True},
        },
    },

    # D3: Latency rules
    latency_rules={
        "max_ms": 5000,              # Maximum acceptable latency
    },

    # D4: Cost rules
    cost_rules={
        "max_tokens": 2000,          # Maximum token budget per call
    },

    # D5: Identity rules
    identity_rules={
        "min_similarity": 0.3,       # Minimum input-output similarity
    },

    # D6: Integrity rules
    integrity_rules={
        "forbidden_patterns": [      # Regex patterns that must NOT appear
            r"Traceback \(most recent call last\)",
            r"Error:\s",
            r"Exception:\s",
            r"Internal Server Error",
        ],
    },

    # Aggregate settings
    min_confidence=0.6,              # Below this → verdict="fail"

    # Metadata (for proof package)
    provider_name="openai",
    model_name="gpt-4",
    agent_role="Researcher",
    task_description="Research task",
)
```

## Result Metadata

Every `EvaluationResult` includes rich metadata:

| Key | Description |
|-----|-------------|
| `correctover_verdict` | `pass`, `partial`, or `fail` |
| `correctover_confidence` | Aggregate confidence score (0.0–1.0) |
| `correctover_drift_score` | Output drift magnitude (1.0 - confidence) |
| `correctover_proof_hash` | SHA-256 recomputable proof |
| `correctover_input_hash` | SHA-256 of canonical input |
| `correctover_verification_latency_ms` | Verification time (typically < 1ms) |
| `correctover_should_failover` | Whether failover is recommended |
| `correctover_{dim}_passed` | Per-dimension pass/fail |
| `correctover_{dim}_score` | Per-dimension score (0.0–1.0) |
| `correctover_{dim}_detail` | Per-dimension human-readable detail |

## Requirements

- Python 3.10+
- `correctover-crewai >= 0.1.0`
- `patronus >= 0.1.10`

## Performance

| Metric | Value |
|--------|-------|
| P50 verification latency | 22 μs |
| SDK size | 586 KB |
| Rules evaluated | 87 |
| Dimensions | 6 |
| Supported providers | 7 |

## Links

- **Correctover Standards**: [STANDARDS.md](https://github.com/Correctover/correctover-crewai/blob/main/STANDARDS.md)
- **Conformance Board**: [api.babyblueviper.com/conformance](https://api.babyblueviper.com/conformance)
- **PyPI**: [correctover-patronus](https://pypi.org/project/correctover-patronus/)
- **GitHub**: [Correctover/correctover-patronus](https://github.com/Correctover/correctover-patronus)

## License

Apache 2.0

---

*Correctover — Failover ≠ Correctover™*
