Metadata-Version: 2.4
Name: orca-verify
Version: 0.1.0
Summary: Your agent proposes. Orca verifies. You decide. A drop-in verification layer for LLM/agent outputs.
Author: Carlos
License: MIT
Project-URL: Homepage, https://github.com/aisona-lab/OrcaI
Project-URL: Source, https://github.com/aisona-lab/OrcaI
Project-URL: Issues, https://github.com/aisona-lab/OrcaI/issues
Keywords: llm,agents,verification,guardrails,grounding,ai,rag
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Quality Assurance
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pydantic>=2
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.39; extra == "anthropic"
Provides-Extra: openai
Requires-Dist: openai>=1.40; extra == "openai"
Provides-Extra: local
Requires-Dist: httpx>=0.27; extra == "local"
Provides-Extra: dev
Requires-Dist: pytest>=8; extra == "dev"
Requires-Dist: pytest-cov>=5; extra == "dev"
Requires-Dist: ruff>=0.6; extra == "dev"
Dynamic: license-file

<div align="center">

# Orca

**Your agent proposes. Orca verifies. You decide.**

A drop-in, framework-agnostic verification layer for LLM and agent outputs in Python.

[![CI](https://github.com/aisona-lab/OrcaI/actions/workflows/ci.yml/badge.svg)](https://github.com/aisona-lab/OrcaI/actions)
![Python](https://img.shields.io/badge/python-3.11%2B-blue)
![License](https://img.shields.io/badge/license-MIT-green)

</div>

## Why Orca

Most agent projects ship only the happy path: prompt, model, output, use it. Orca is the layer they tend to skip, the one that verifies an output before it leaves your system.

You declare *checks*. An output ships only if it passes. Otherwise Orca retries with feedback, optionally repairs, escalates to a human, or rejects, and it records every decision along the way.

```python
from orcaverify import verify, Schema, Grounded, NoPII

# Report is your Pydantic model; judge and agent are objects you define elsewhere.
@verify(
    Schema(Report),                       # output matches your shape
    Grounded(sources="kb", judge=judge),  # every claim backed by a source
    NoPII(),                              # no leaked personal data
    on_fail="retry(2) -> escalate",       # what to do when it fails
)
def investigate(alert) -> Report:
    return agent.run(alert)               # your agent, any framework
```

On pass you get the value back. On failure Orca retries with the failure reasons as feedback, then escalates to a human. You never ship an unverified output by accident.

## Install

```bash
pip install orca-verify
```

Optional model backends, used by `Grounded`, `Faithful`, `Rubric`, and repair:

```bash
pip install "orca-verify[anthropic]"   # Claude
pip install "orca-verify[openai]"      # OpenAI
pip install "orca-verify[local]"       # Ollama, vLLM, LM Studio (on-prem, air-gapped)
```

## Two concepts

A `Check` is one isolated, testable unit of verification. A `Verifier` runs a list of checks and applies an `on_fail` policy. The `@verify` decorator is sugar over `Verifier`.

```python
from orcaverify import Verifier, Predicate

gate = Verifier([Predicate(lambda out, ctx: (len(out) > 0, "empty output"))])
result = gate.check(output)          # verify a value you already have
if not result.ok:
    for f in result.failures:
        print(f.reason)
```

## Checks

| Check | What it enforces |
|---|---|
| `Schema(Model)` | Output validates against a Pydantic model |
| `Predicate(fn)` | Any custom rule; the universal escape hatch |
| `Grounded(sources, judge)` | Every claim is supported by a retrieved source (cite or reject) |
| `Faithful(sources, judge)` | No claim contradicts the sources (consistency rather than support) |
| `Rubric(criteria, judge)` | LLM-as-judge scoring against named criteria, passes above a threshold |
| `NoPII()` / `NoSecrets()` | Output does not leak personal data or credentials |
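
As an illustration of the `Predicate` escape hatch, the function below returns the same `(ok, reason)` tuple as the lambda in the `Predicate` example earlier in this README. The name `no_email` and the regex are illustrative assumptions, not part of Orca, and the pattern is deliberately loose rather than a complete email detector.

```python
import re

# Hypothetical rule: flag anything that looks like an email address.
# This is an illustration only; it is not how Orca's NoPII is implemented.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

def no_email(out, ctx=None):
    """Return (ok, reason), mirroring the Predicate lambda shown earlier."""
    found = EMAIL_RE.search(str(out))
    return (found is None, "output contains an email address")
```

Assuming `Predicate` accepts a named function the same way it accepts a lambda, this would be wired in as `Predicate(no_email)`. Because the rule is a plain function, it can be unit-tested without a model or a verifier.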

## on_fail policy

The policy is a chain of steps, tried left to right, for example `"retry(2) -> repair -> escalate -> reject"`.

| Step | Behavior |
|---|---|
| `retry(n)` | Re-run the producer with the failure reasons injected as feedback |
| `repair` | A judge rewrites the output to satisfy the failed checks (opt-in, needs a judge) |
| `escalate` | Hand off to a human-in-the-loop callback |
| `reject` | Return `ok=False`, or raise `VerificationError` |
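
To make the chain syntax concrete, here is a minimal sketch of how a policy string could be split into ordered steps. It is not Orca's parser: `Step`, `parse_policy`, and the rule that a bare `retry` means one attempt are assumptions made for this example.

```python
import re
from dataclasses import dataclass

@dataclass(frozen=True)
class Step:
    name: str        # "retry", "repair", "escalate", or "reject"
    count: int = 1   # only meaningful for retry(n)

# One step name, optionally followed by "(n)".
_STEP_RE = re.compile(r"^(retry|repair|escalate|reject)(?:\((\d+)\))?$")

def parse_policy(policy: str) -> list[Step]:
    """Split 'a -> b -> c' into Step objects, preserving left-to-right order."""
    steps = []
    for raw in policy.split("->"):
        token = raw.strip()
        match = _STEP_RE.match(token)
        if match is None:
            raise ValueError(f"unknown on_fail step: {token!r}")
        name, count = match.group(1), match.group(2)
        if count is not None and name != "retry":
            raise ValueError(f"{name} does not take a count")
        steps.append(Step(name, int(count) if count else 1))
    return steps
```

With this sketch, `parse_policy("retry(2) -> reject")` yields a retry step with `count=2` followed by a reject step, and a misspelled step name raises `ValueError` instead of being silently skipped.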

## Judges

`Grounded`, `Faithful`, `Rubric`, and repair use a pluggable `Judge`. Orca ships with `AnthropicJudge`, `OpenAIJudge`, and `LocalJudge`. `LocalJudge` targets any OpenAI-compatible server plus Ollama, so it runs fully on-prem or air-gapped. You can also implement the protocol yourself.
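
This README does not spell out the `Judge` protocol, so the sketch below only illustrates the general idea of a pluggable, structurally typed judge. `JudgeLike`, its `supports` method, and `KeywordJudge` are all hypothetical names; check the source for the real method names and signatures before implementing your own.

```python
from typing import Protocol

class JudgeLike(Protocol):
    # Hypothetical method name and signature; Orca's real protocol may differ.
    def supports(self, claim: str, sources: list[str]) -> bool: ...

def _words(text: str) -> set[str]:
    """Lowercased words with surrounding punctuation stripped."""
    return {w.strip(".,;:!?").lower() for w in text.split()}

class KeywordJudge:
    """Offline stand-in: a claim counts as supported when every word longer
    than three characters appears in one single source."""

    def supports(self, claim: str, sources: list[str]) -> bool:
        key = {w for w in _words(claim) if len(w) > 3}
        return any(key <= _words(src) for src in sources)

def is_grounded(judge: JudgeLike, claims: list[str], sources: list[str]) -> bool:
    """True only if the judge accepts every claim."""
    return all(judge.supports(claim, sources) for claim in claims)
```

`KeywordJudge` never inherits from `JudgeLike`; it satisfies the protocol by having a matching method. A crude judge like this is only useful for offline tests, which is the same role a deterministic stub would play in a demo that runs without an API key.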

## Trace

Every run returns a JSON-serializable `VerifyResult`: the input, which checks passed or failed and why, the retries, and the final decision. Point a `FileSink` or `LoggerSink` at it and every verification is recorded.

## Tamper-evident audit trail

Plug `Provenance` in as the sink and every decision becomes a hash-chained, append-only record. Edit, delete, or reorder any record and the sink's `verify()` method reports where the chain breaks. That gives you an audit trail you can hand to a compliance reviewer or regulator.

```python
from orcaverify import Verifier, NoPII, Provenance

gate = Verifier([NoPII()], sink=Provenance("audit.jsonl"))
# ... run verifications ...

gate.sink.verify()                            # ChainResult(ok=True/False, broken_at=...)
gate.sink.export("audit.json")                # full chain plus integrity summary
gate.sink.record({"event": "data_access"})    # log any auditable event
```

Storage is pluggable (`FileStore`, `InMemoryStore`, or your own `ProvenanceStore`). Integrity is plain SHA-256 hash chaining, with no keys to manage.

## Extend it: registry and plugins

Register your own check, then compose verifiers from config. Built-in checks are registered under `schema`, `predicate`, `grounded`, `faithful`, `rubric`, `no_pii`, and `no_secrets`.

```python
from orcaverify import Check, CheckResult, register, from_config

@register("max_length")
class MaxLength(Check):
    def __init__(self, limit=280):
        self.limit = limit

    def check(self, output, context=None):
        n = len(str(output))
        ok = n <= self.limit
        return CheckResult(ok=ok, reason=None if ok else f"too long: {n}")

gate = from_config({
    "checks": ["no_pii", {"max_length": {"limit": 280}}, "grounded"],
    "on_fail": "retry(2) -> reject",
}, judge=judge)   # the judge is auto-injected into checks that need it
```

Ship checks in your own package and expose them through entry points. `load_plugins()` discovers and registers them automatically:

```toml
[project.entry-points."orcaverify.checks"]
toxicity = "my_pkg.checks:Toxicity"
```

## Run the demos

All demos run offline, with no API key.

```bash
python examples/rag_grounding.py      # catches an ungrounded claim
python examples/aml_investigation.py  # schema, grounding, and no-PII gate
python examples/audit_trail.py        # tamper-evident provenance log
python examples/quality_checks.py     # rubric scoring and faithfulness
python examples/custom_check.py       # register a check and build from config
```

## Roadmap

- More checks: toxicity and safety, JSON repair.
- Provenance backends: Postgres and S3 stores, optional HMAC or Ed25519 signing.
- Gateway mode: language-agnostic HTTP interception.
- TypeScript port.

Contributions are welcome. Each check and judge is a small, isolated module.

## License

MIT
