Metadata-Version: 2.4
Name: director-ai
Version: 3.15.1
Summary: Real-time LLM hallucination guardrail — NLI + RAG fact-checking with token-level streaming halt
Author-email: Miroslav Šotek <protoscience@anulum.li>
License: AGPL-3.0-or-later
Project-URL: Homepage, https://www.anulum.li
Project-URL: Repository, https://github.com/anulum/director-ai
Project-URL: Issues, https://github.com/anulum/director-ai/issues
Project-URL: Changelog, https://github.com/anulum/director-ai/blob/main/CHANGELOG.md
Project-URL: Documentation, https://anulum.github.io/director-ai
Project-URL: Discussions, https://discord.gg/JvMdKv49
Keywords: llm,hallucination,guardrail,nli,rag,fact-checking,streaming,coherence,deberta,openai,anthropic,langchain
Classifier: License :: OSI Approved :: GNU Affero General Public License v3 or later (AGPLv3+)
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE.md
Requires-Dist: numpy>=1.24
Requires-Dist: requests>=2.32
Provides-Extra: nli
Requires-Dist: torch<3,>=2.8; extra == "nli"
Requires-Dist: transformers<6,>=5.0.0rc3; extra == "nli"
Provides-Extra: vector
Requires-Dist: chromadb<2,>=0.4.0; extra == "vector"
Requires-Dist: sentence-transformers<6,>=4; extra == "vector"
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20; extra == "anthropic"
Provides-Extra: langchain
Requires-Dist: langchain-core>=0.3; extra == "langchain"
Requires-Dist: langsmith>=0.8.0; extra == "langchain"
Provides-Extra: llamaindex
Requires-Dist: llama-index-core>=0.10; extra == "llamaindex"
Provides-Extra: server
Requires-Dist: fastapi<1,>=0.100; extra == "server"
Requires-Dist: uvicorn<1,>=0.23; extra == "server"
Requires-Dist: pydantic<3,>=2.0; extra == "server"
Requires-Dist: httpx<1,>=0.27; extra == "server"
Requires-Dist: python-multipart<1,>=0.0.7; extra == "server"
Requires-Dist: slowapi<1,>=0.1.9; extra == "server"
Provides-Extra: minicheck
Provides-Extra: voice
Requires-Dist: elevenlabs>=1.0; extra == "voice"
Requires-Dist: openai>=1.0; extra == "voice"
Requires-Dist: deepgram-sdk>=3.0; extra == "voice"
Provides-Extra: onnx
Requires-Dist: onnx<2,>=1.21; extra == "onnx"
Requires-Dist: onnxruntime<2,>=1.15; extra == "onnx"
Provides-Extra: tensorrt
Requires-Dist: onnx<2,>=1.21; extra == "tensorrt"
Requires-Dist: onnxruntime-gpu<2,>=1.15; extra == "tensorrt"
Provides-Extra: grpc
Requires-Dist: grpcio>=1.60; extra == "grpc"
Requires-Dist: grpcio-tools>=1.60; extra == "grpc"
Requires-Dist: protobuf<7,>=4.25; extra == "grpc"
Provides-Extra: physical
Requires-Dist: mujoco<4,>=3.2; extra == "physical"
Provides-Extra: formal
Requires-Dist: z3-solver<5,>=4.12; extra == "formal"
Provides-Extra: finetune
Requires-Dist: torch<3,>=2.8; extra == "finetune"
Requires-Dist: transformers<6,>=5.0.0rc3; extra == "finetune"
Requires-Dist: datasets>=2.14; extra == "finetune"
Requires-Dist: accelerate>=0.21; extra == "finetune"
Requires-Dist: scikit-learn>=1.3; extra == "finetune"
Provides-Extra: quantize
Requires-Dist: bitsandbytes>=0.41; extra == "quantize"
Requires-Dist: accelerate>=0.21; extra == "quantize"
Provides-Extra: pinecone
Requires-Dist: pinecone>=5.0; extra == "pinecone"
Provides-Extra: weaviate
Requires-Dist: weaviate-client>=4.0; extra == "weaviate"
Provides-Extra: qdrant
Requires-Dist: qdrant-client>=1.7; extra == "qdrant"
Provides-Extra: faiss
Requires-Dist: faiss-cpu>=1.7; extra == "faiss"
Provides-Extra: elasticsearch
Requires-Dist: elasticsearch<9,>=8.0; extra == "elasticsearch"
Provides-Extra: reranker
Requires-Dist: sentence-transformers<6,>=4; extra == "reranker"
Provides-Extra: embeddings
Requires-Dist: sentence-transformers<6,>=4; extra == "embeddings"
Provides-Extra: license
Requires-Dist: polar-sdk==0.31.3; extra == "license"
Provides-Extra: embed
Requires-Dist: sentence-transformers<6,>=4; extra == "embed"
Provides-Extra: nli-lite
Requires-Dist: onnx<2,>=1.21; extra == "nli-lite"
Requires-Dist: onnxruntime<2,>=1.15; extra == "nli-lite"
Requires-Dist: transformers<6,>=5.0.0rc3; extra == "nli-lite"
Provides-Extra: langgraph
Requires-Dist: langgraph>=0.2; extra == "langgraph"
Requires-Dist: langsmith>=0.8.0; extra == "langgraph"
Provides-Extra: haystack
Requires-Dist: haystack-ai>=2.0; extra == "haystack"
Provides-Extra: crewai
Requires-Dist: crewai>=0.50; extra == "crewai"
Provides-Extra: guardrails
Provides-Extra: docs
Requires-Dist: mkdocs>=1.5; extra == "docs"
Requires-Dist: mkdocs-material>=9.5; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.24; extra == "docs"
Requires-Dist: mkdocs-jupyter>=0.25; extra == "docs"
Requires-Dist: nbconvert>=7.17.1; extra == "docs"
Provides-Extra: otel
Requires-Dist: opentelemetry-api>=1.20; extra == "otel"
Provides-Extra: langfuse
Requires-Dist: langfuse>=2.0; extra == "langfuse"
Provides-Extra: presidio
Requires-Dist: presidio-analyzer>=2.2; extra == "presidio"
Provides-Extra: toxicity
Requires-Dist: detoxify>=0.5; extra == "toxicity"
Provides-Extra: moderation
Requires-Dist: presidio-analyzer>=2.2; extra == "moderation"
Requires-Dist: detoxify>=0.5; extra == "moderation"
Provides-Extra: demo
Requires-Dist: gradio>=4.0; extra == "demo"
Provides-Extra: ingestion
Requires-Dist: pypdf>=3.0; extra == "ingestion"
Requires-Dist: python-docx>=1.0; extra == "ingestion"
Requires-Dist: beautifulsoup4>=4.12; extra == "ingestion"
Provides-Extra: ingestion-s3
Requires-Dist: boto3>=1.26; extra == "ingestion-s3"
Provides-Extra: ingestion-notion
Requires-Dist: notion-client>=2.0; extra == "ingestion-notion"
Provides-Extra: ingestion-gdrive
Requires-Dist: google-api-python-client>=2.100; extra == "ingestion-gdrive"
Provides-Extra: auto-kb
Requires-Dist: boto3>=1.26; extra == "auto-kb"
Requires-Dist: notion-client>=2.0; extra == "auto-kb"
Requires-Dist: google-api-python-client>=2.100; extra == "auto-kb"
Provides-Extra: colbert
Requires-Dist: ragatouille>=0.0.8; extra == "colbert"
Provides-Extra: rust
Requires-Dist: backfire-kernel<0.2,>=0.1.0; extra == "rust"
Provides-Extra: enterprise
Requires-Dist: redis<8,>=4.5; extra == "enterprise"
Requires-Dist: pyjwt<3,>=2.8; extra == "enterprise"
Requires-Dist: argon2-cffi<26,>=23.1; extra == "enterprise"
Requires-Dist: psycopg2-binary<3,>=2.9; extra == "enterprise"
Provides-Extra: ui
Requires-Dist: gradio<7,>=4.0; extra == "ui"
Provides-Extra: reports
Requires-Dist: weasyprint>=60; extra == "reports"
Requires-Dist: jinja2>=3.1; extra == "reports"
Provides-Extra: autogen
Provides-Extra: research
Provides-Extra: train
Requires-Dist: transformers<6,>=5.0.0rc3; extra == "train"
Requires-Dist: datasets>=2.14; extra == "train"
Requires-Dist: accelerate>=0.21; extra == "train"
Requires-Dist: peft>=0.6; extra == "train"
Requires-Dist: pillow<13,>=10; extra == "train"
Provides-Extra: managed-training
Requires-Dist: google-cloud-aiplatform>=1.133; extra == "managed-training"
Requires-Dist: google-cloud-storage>=2.14; extra == "managed-training"
Provides-Extra: security
Requires-Dist: cyclonedx-bom>=4.0; extra == "security"
Requires-Dist: hypothesis>=6.0; extra == "security"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.21; extra == "dev"
Requires-Dist: ruff<1,>=0.5; extra == "dev"
Requires-Dist: mypy>=1.10; extra == "dev"
Requires-Dist: types-requests>=2.31; extra == "dev"
Requires-Dist: types-PyYAML>=6.0; extra == "dev"
Requires-Dist: hypothesis>=6.0; extra == "dev"
Requires-Dist: grpcio>=1.60; extra == "dev"
Requires-Dist: grpcio-tools>=1.60; extra == "dev"
Requires-Dist: protobuf<7,>=4.25; extra == "dev"
Requires-Dist: bandit>=1.7; extra == "dev"
Requires-Dist: pyyaml>=6.0; extra == "dev"
Requires-Dist: build>=1.0; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Requires-Dist: fastapi>=0.100; extra == "dev"
Requires-Dist: httpx>=0.27; extra == "dev"
Requires-Dist: pydantic>=2.0; extra == "dev"
Requires-Dist: python-multipart>=0.0.7; extra == "dev"
Requires-Dist: pypdf>=3.0; extra == "dev"
Requires-Dist: python-docx>=1.0; extra == "dev"
Requires-Dist: beautifulsoup4>=4.12; extra == "dev"
Dynamic: license-file

<p align="center">
  <img src="docs/assets/header.png" width="1280" alt="Director-AI — Real-time LLM Hallucination Guardrail">
</p>

<h1 align="center">Director-AI</h1>

<p align="center">
  <strong>Real-time LLM hallucination guardrail — NLI + RAG fact-checking with token-level streaming halt</strong>
</p>

<p align="center">
  <a href="https://github.com/anulum/director-ai/actions/workflows/ci.yml"><img src="https://github.com/anulum/director-ai/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://github.com/anulum/director-ai/actions/workflows/pre-commit.yml"><img src="https://github.com/anulum/director-ai/actions/workflows/pre-commit.yml/badge.svg" alt="Pre-commit"></a>
  <a href="https://github.com/anulum/director-ai/actions/workflows/codeql.yml"><img src="https://github.com/anulum/director-ai/actions/workflows/codeql.yml/badge.svg" alt="CodeQL"></a>
  <a href="https://pypi.org/project/director-ai/"><img src="https://img.shields.io/pypi/v/director-ai.svg" alt="PyPI"></a>
  <a href="https://pypi.org/project/director-ai/"><img src="https://img.shields.io/pypi/dm/director-ai.svg" alt="Downloads"></a>
  <a href="https://pepy.tech/projects/director-ai"><img src="https://img.shields.io/pepy/dt/director-ai.svg" alt="Total downloads"></a>
  <a href="https://codecov.io/gh/anulum/director-ai"><img src="https://codecov.io/gh/anulum/director-ai/branch/main/graph/badge.svg" alt="Coverage"></a>
  <a href="https://www.python.org/downloads/"><img src="https://img.shields.io/pypi/pyversions/director-ai.svg" alt="Python"></a>
  <a href="https://www.gnu.org/licenses/agpl-3.0"><img src="https://img.shields.io/badge/License-AGPL_v3-blue.svg" alt="License: AGPL v3"></a>
  <a href="https://doi.org/10.5281/zenodo.18822167"><img src="https://zenodo.org/badge/doi/10.5281/zenodo.18822167.svg" alt="DOI"></a>
  <a href="https://anulum.github.io/director-ai"><img src="https://img.shields.io/badge/docs-mkdocs-blue.svg" alt="Docs"></a>
  <a href="https://www.bestpractices.dev/projects/12102"><img src="https://www.bestpractices.dev/projects/12102/badge" alt="OpenSSF Best Practices"></a>
  <a href="https://securityscorecards.dev/viewer/?uri=github.com/anulum/director-ai"><img src="https://api.securityscorecards.dev/projects/github.com/anulum/director-ai/badge" alt="OpenSSF Scorecard"></a>
  <a href="https://api.reuse.software/info/github.com/anulum/director-ai"><img src="https://api.reuse.software/badge/github.com/anulum/director-ai" alt="REUSE"></a>
</p>

---

## About

Director-AI is an internal research tool developed at [ANULUM Institute](https://www.anulum.li) as part of the [God of the Math Collection](https://www.anulum.li) (GOTM) — a multi-project scientific computing ecosystem spanning neuroscience, plasma physics, stochastic computing, and AI safety.

The system was built to solve a specific internal need: **real-time hallucination detection for LLM outputs used in scientific pipelines**, where a single fabricated number or citation can invalidate downstream analysis. It is now commercially offered under dual licensing.

**Team:** ANULUM maintains a research team (intentionally undisclosed). GitHub automation and repository maintenance are handled by the owner. Contributions are welcome under AGPL v3 terms.

> **Active Development:** APIs may evolve. The core guardrail engine, 5-tier scoring (rules → embeddings → NLI), 7-SDK guard, FastAPI middleware, REST/gRPC servers, injection detection, SaaS middleware (API keys + rate limiting), advanced RAG (6 pluggable retrieval strategies), multi-agent swarm guardian (4 framework adapters), config wizard, and compliance reports are functional and tested (5300+ passing tests). Rust-accelerated compute paths shipped in the v3.12 line and remain part of the current v3.15 release surface.

---

## What It Does

Director-AI sits between your LLM and the user. It scores every output for hallucination — and can halt generation mid-stream when coherence drops.

```mermaid
graph LR
    LLM["LLM<br/>(any provider)"] --> D["Director-AI"]
    D --> S["Scorer<br/>NLI + RAG"]
    D --> K["StreamingKernel<br/>token-level halt"]
    S --> V{Approved?}
    K --> V
    V -->|Yes| U["User"]
    V -->|No| H["HALT + evidence"]
```

### Core capabilities

- **Token-level streaming halt** — severs output mid-generation when coherence degrades. Not post-hoc review.
- **Dual-entropy scoring** — NLI contradiction detection (0.4B DeBERTa) + RAG fact-checking against your knowledge base.
- **Selectable scorer models** — choose a benchmarked local scorer profile for the latency/accuracy trade-off you need, without changing the guarded LLM provider.
- **Customer Model Factory** — validate customer-owned guardrail traces, bind training/benchmark/deployment evidence, add sector packs such as Banking, and export runtime packages for private customer implementation.
- **Structured output verification** — JSON schema validation, numeric consistency, reasoning chain verification, temporal freshness scoring. Stdlib-only, zero dependencies.
- **Intent-grounded injection detection** — two-stage pipeline: regex pattern matching (fast) + bidirectional NLI divergence scoring (semantic). Detects the *effect* of injection in the output.
- **12 Rust-accelerated compute functions** — 9.4× geometric mean speedup over Python paths. Transparent fallback when Rust kernel is not installed.
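
The token-level halt can be pictured as a generator that stops yielding once the score of the accumulated text falls below a threshold. The sketch below is illustrative only: `guarded_stream` and the toy `toy_score` function are hypothetical names, not Director-AI's `StreamingKernel` API.

```python
from typing import Callable, Iterable, Iterator


def guarded_stream(
    tokens: Iterable[str],
    score_fn: Callable[[str], float],
    threshold: float = 0.3,
) -> Iterator[str]:
    """Yield tokens until the score of the accumulated text drops below threshold."""
    text = ""
    for token in tokens:
        candidate = text + token
        if score_fn(candidate) < threshold:
            # Halt before the offending token reaches the user.
            return
        text = candidate
        yield token


# Toy scorer (assumption): coherence collapses once "forever" appears.
def toy_score(text: str) -> float:
    return 0.1 if "forever" in text else 0.9


emitted = list(
    guarded_stream(["Refunds ", "within ", "30 days, ", "forever ", "free"], toy_score)
)
print(emitted)
```

Because the check runs before each `yield`, the token that triggers the drop is never emitted, which is the difference from post-hoc review of a finished response.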

<!-- capability-snapshot:start -->
<!-- SPDX-License-Identifier: AGPL-3.0-or-later -->
<!-- Generated by tools/capability_manifest.py; do not edit counts by hand. -->

### Director-AI Capability Inventory

| Surface | Current inventory |
|---|---:|
| Package version | 3.15.1 |
| Public API exports | 214 |
| Python capability source modules | 310 |
| Python capability classes | 701 |
| API documentation pages | 49 |
| Rust PyO3 bindings | 59 |
| Optional extras | 53 |
| Python test files | 420 |
| Public documentation pages | 138 |
| GitHub Actions workflows | 11 |

Evidence boundary: this snapshot is a static inventory. Performance, coverage, hardware, and scientific-fidelity claims require their own committed evidence artifacts.
<!-- capability-snapshot:end -->

### Selectable scorer models

Director-AI guards any upstream LLM, but the guardrail scorer itself is
configurable. Stable runtime choices are exposed through
`GET /v1/scorer/models` and selected with `DIRECTOR_SCORER_MODEL`:

| Alias | Runtime source | Status | General BA | Use when |
|-------|----------------|--------|-----------:|----------|
| `balanced-default` | managed FactCG DeBERTa v3 large artefact | stable | 0.752 | default balanced accuracy/latency profile |
| `deberta-small` | managed DeBERTa v3 small artefact | stable | 0.747 | lower-cost deployments close to default accuracy |
| `deberta-large-nli` | managed DeBERTa v3 large NLI artefact | stable | 0.740 | alternate large-NLI baseline |

```bash
DIRECTOR_SCORER_MODEL=balanced-default director-ai serve
DIRECTOR_SCORER_MODEL=deberta-small director-ai serve
```

Domain-only and custom scorer models require explicit operator opt-in:
`DIRECTOR_ALLOW_DOMAIN_ONLY_SCORER_MODEL=true` or
`DIRECTOR_ALLOW_CUSTOM_SCORER_MODEL=true`. Each selectable scorer has a
per-model benchmark package plan in
[`benchmarks/model_benchmark_packages.toml`](benchmarks/model_benchmark_packages.toml);
full external benchmark packages are required before public model-specific claims.
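
As a rough sketch of how a launcher script might enforce the opt-in rule above, the helper below builds the environment for a chosen alias. `scorer_env` is a hypothetical helper, not part of Director-AI; only the environment variable names and the three stable aliases come from this section, and the domain-only flag is left out for brevity.

```python
import os

# Stable aliases copied from the table above.
STABLE_ALIASES = {"balanced-default", "deberta-small", "deberta-large-nli"}


def scorer_env(alias: str, allow_custom: bool = False) -> dict[str, str]:
    """Return the environment variables that select a scorer model."""
    env = {"DIRECTOR_SCORER_MODEL": alias}
    if alias not in STABLE_ALIASES:
        if not allow_custom:
            raise ValueError(f"{alias!r} is not a stable alias; explicit opt-in required")
        env["DIRECTOR_ALLOW_CUSTOM_SCORER_MODEL"] = "true"
    return env


# Merge with the current environment before launching the server process.
env = {**os.environ, **scorer_env("deberta-small")}
print(env["DIRECTOR_SCORER_MODEL"])
```

The resulting `env` can be passed to `subprocess.run(["director-ai", "serve"], env=env)`.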

### Customer Model Factory

Director-AI can package customer-specific guardrail scorers without changing
the guarded application provider. The implemented factory primitives cover:

- customer trace validation with split, leakage, tenant-boundary, severity,
  reference, and secrets/redaction checks;
- training manifests with immutable base-model provenance and Vertex,
  customer-cloud, on-prem, or local-pilot lanes;
- benchmark selection across five objective profiles: conservative, balanced,
  low-latency, high-recall, and zero-silent-unsafe-passes;
- deployment, evidence-pack, and runtime-package manifests with deterministic
  hashes, audit-log URIs, rollback URIs, customer-controlled telemetry, and no
  external callback by default;
- Banking as the first vertical pack, including regulated-category taxonomy,
  citation evidence, numeric evidence, escalation, and regulation mapping.
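
One common way to obtain the deterministic hashes mentioned in the list above is to hash a canonical JSON encoding of the manifest. The sketch below shows that general technique under assumed field names; it is not the factory's actual hashing scheme or manifest schema.

```python
import hashlib
import json


def manifest_digest(manifest: dict) -> str:
    """SHA-256 over a canonical JSON encoding (sorted keys, no extra whitespace)."""
    canonical = json.dumps(
        manifest, sort_keys=True, separators=(",", ":"), ensure_ascii=True
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Hypothetical manifest fields, for illustration only.
a = {"pack": "banking", "lane": "on-prem", "objective": "conservative"}
b = {"objective": "conservative", "lane": "on-prem", "pack": "banking"}

print(manifest_digest(a) == manifest_digest(b))  # key order does not change the hash
```

Sorting keys and fixing separators means two manifests with the same content always produce the same digest, regardless of how the dictionary was built.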

Customer examples are local helpers that consume the generated runtime package
shape without opening network connections:

```bash
python examples/customer_model_factory_runtime.py
python examples/customer_model_factory_rest_payload.py
```

The runtime package schema is
[`schemas/customer-model-factory-runtime-package.schema.json`](schemas/customer-model-factory-runtime-package.schema.json).
Customer-specific accuracy claims require package-specific benchmark evidence;
the factory exposes the controls needed to pursue high-assurance deployments
without making unscoped accuracy promises.

### Advanced RAG (6 pluggable retrieval strategies)

All independently toggleable via config, composable as a decorator stack:

| Strategy | What it does | Config field |
|----------|-------------|--------------|
| **Parent-child chunking** | Index small chunks, return large parents for context | `parent_child_enabled` |
| **Adaptive retrieval** | Skip KB lookup for creative/conversational queries | `adaptive_retrieval_enabled` |
| **HyDE** | LLM generates pseudo-answer, embeds that for retrieval | `hyde_enabled` |
| **Query decomposition** | Split compound queries, retrieve for each, merge via RRF | `query_decomposition_enabled` |
| **Contextual compression** | Keep only query-relevant sentences from retrieved passages | `contextual_compression_enabled` |
| **Multi-vector** | Index content + summary + title representations per doc | `multi_vector_enabled` |

These strategies layer on top of the existing hybrid retrieval (BM25 + dense), cross-encoder reranking, ColBERT, and 11 vector backends (Chroma, Pinecone, Qdrant, FAISS, Weaviate, Elasticsearch, and others).
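
The query-decomposition row merges per-sub-query results via Reciprocal Rank Fusion (RRF). A minimal sketch of standard RRF follows; the function name, document ids, and the constant `k = 60` are illustrative assumptions, not Director-AI's internal implementation.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge ranked id lists; each doc scores sum(1 / (k + rank)), rank starting at 1."""
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score (descending), then by id for a stable tie-break.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Two sub-queries from one compound question, each with its own retrieval result.
sub_query_hits = [["doc_a", "doc_b", "doc_c"], ["doc_b", "doc_d"]]
merged = reciprocal_rank_fusion(sub_query_hits)
print(merged)  # ['doc_b', 'doc_a', 'doc_d', 'doc_c']
```

`doc_b` ranks first because it appears in both lists; RRF only needs ranks, so scores from different retrievers never have to be put on a common scale.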

### Multi-agent swarm guardian

Guard entire agent swarms — not just individual LLM calls:

- **SwarmGuardian**: central registry with cross-agent contradiction detection + cascade halt
- **AgentProfile**: per-agent thresholds (researcher vs summariser vs coder)
- **HandoffScorer**: score inter-agent messages before handoff
- **Framework adapters**: LangGraph, CrewAI, OpenAI Swarm, AutoGen — zero framework deps

### Additional modules

Meta-confidence estimation, online calibration from feedback, contradiction tracking across turns, agentic loop monitoring, adversarial robustness testing (25 patterns), EU AI Act audit trails, domain presets (medical/finance/legal/creative), cross-model consensus, conformal prediction intervals, token cost analyser, compliance report templates (HTML/Markdown), config wizard (Gradio UI + CLI).

### Agent safety hooks

Opt-in modules that plug into `CoherenceAgent` without changing
existing behaviour — configured together or not at all.

- **Cyber-physical grounding** (`core.cyber_physical`) — pre-action
  AABB / sphere collision and two-link analytical IK; lazy-loaded
  ROS 2 / MuJoCo / CARLA adapters.
- **Simulation containment** (`core.containment`) — HMAC-signed
  `RealityAnchor` binding a session to a `sandbox` / `simulator` /
  `shadow` / `production` scope, with a rule-based breakout
  detector (production-host calls, anti-anchor prompt injection,
  scope mismatch).
- **Cross-org passports** (`core.zk_attestation`) — `PassportIssuer`
  and `PassportVerifier` with an HMAC Merkle commitment backend
  plus a `ZkSnarkBackend` plug-in Protocol for real zero-knowledge
  adapters.

See the [API reference](docs-site/api/cyber-physical.md) pages for
the full surface.
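
To make the containment idea concrete, here is a minimal sketch of binding a session to a scope with an HMAC tag, using only the standard library. `sign_anchor` and `verify_anchor` are hypothetical names and the message layout is an assumption; this is not the `RealityAnchor` implementation.

```python
import hashlib
import hmac

# Scope names taken from the containment bullet above.
SCOPES = {"sandbox", "simulator", "shadow", "production"}


def sign_anchor(key: bytes, session_id: str, scope: str) -> str:
    """Return a hex HMAC-SHA256 tag binding a session to a scope."""
    if scope not in SCOPES:
        raise ValueError(f"unknown scope: {scope}")
    message = f"{session_id}|{scope}".encode("utf-8")
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_anchor(key: bytes, session_id: str, scope: str, tag: str) -> bool:
    """Constant-time check that the tag matches the claimed session and scope."""
    expected = sign_anchor(key, session_id, scope)
    return hmac.compare_digest(expected, tag)


key = b"demo-key-not-for-production"
tag = sign_anchor(key, "session-42", "sandbox")
print(verify_anchor(key, "session-42", "sandbox", tag))     # True
print(verify_anchor(key, "session-42", "production", tag))  # False: scope mismatch
```

A tag issued for `sandbox` fails verification when replayed against `production`, which is the kind of scope mismatch a breakout detector can flag.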

### Multi-language components (all optional)

| Component | Path | Purpose |
|-----------|------|---------|
| **Rust `backfire-kernel`** | `backfire-kernel/` | 28 hot-path compute functions via PyO3 — scorer / injection / safety-hook primitives with pure-Python fallbacks |
| **Go gateway** | `gateway/go/` | High-concurrency HTTP front door with auth, rate limit, audit, optional scoring sidecar |
| **`director.v1` wire schema** | `schemas/proto/` | Frozen protobuf messages shared by Python and Go |
| **CoherenceScoring gRPC** | `src/director_ai/grpc_scoring.py` | `ScoreClaim` unary + `ScoreStream` bidi RPCs over `director.v1` |
| **Julia threshold tuner** | `tools/julia_tuner/` | Offline bootstrap + Bayesian threshold analysis with uncertainty bands |
| **Lean 4 formal proof** | `formal/HaltMonitor/` | Machine-checked guarantee that sub-threshold tokens cannot be emitted |

Python stands on its own — every non-Python component is additive and
toggled by an env var, flag, or optional dependency. See
[`ARCHITECTURE.md`](ARCHITECTURE.md) for the full layout and
[`gateway/go/README.md`](gateway/go/README.md),
[`tools/julia_tuner/README.md`](tools/julia_tuner/README.md),
[`formal/README.md`](formal/README.md),
[`schemas/README.md`](schemas/README.md) for per-component details.

Full documentation: [anulum.github.io/director-ai](https://anulum.github.io/director-ai)

---

## Quick Start

### Wrap your SDK (6 lines)

```python
from director_ai import guard
from openai import OpenAI

client = guard(
    OpenAI(),
    facts={"refund_policy": "Refunds within 30 days only"},
)
response = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "What is the refund policy?"}],
)
```

### One-shot check (4 lines)

```python
from director_ai import score

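# Stand-in for the text returned by your LLM.
response_text = "Refunds are available within 30 days of purchase."

cs = score("What is the refund policy?", response_text,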
           facts={"refund": "Refunds within 30 days only"},
           threshold=0.3)
print(f"Coherence: {cs.score:.3f}  Approved: {cs.approved}")
```

### Proxy (2 lines, zero code changes)

```bash
pip install director-ai[server]
director-ai proxy --port 8080 --facts kb.txt --threshold 0.3
```

Set `OPENAI_BASE_URL=http://localhost:8080/v1` in your app. Every response gets scored.

### FastAPI middleware (3 lines)

```python
from fastapi import FastAPI

from director_ai.integrations.fastapi_guard import DirectorGuard

app = FastAPI()  # or your existing application instance

app.add_middleware(DirectorGuard,
    facts={"policy": "Refunds within 30 days only"},
    on_fail="reject",
)
```

Also available: LangChain, LlamaIndex, LangGraph, Haystack, CrewAI, Semantic Kernel, DSPy integrations.

---

## Installation

```bash
pip install "director-ai[nli]"                    # recommended — NLI model scoring (75.6% BA)
pip install "director-ai[embed]"                   # embedding scorer (~65% BA, CPU-only, 3ms)
pip install director-ai                            # rule-based + heuristic (zero ML deps, <1ms)
pip install "director-ai[nli,vector,server]"       # production stack with RAG + REST API
pip install "director-ai[ui]"                      # config wizard (Gradio web UI)
pip install "director-ai[reports]"                 # PDF/HTML compliance reports
pip install "director-ai[physical]"                # MuJoCo physical adapter runtime
```

For reproducible installs the repo ships a `uv.lock` at the root;
`uv sync` installs the exact resolved versions.
Heavy optional extras use the policy in
`requirements/OPTIONAL_EXTRA_LOCKS.md`.
ROS 2 and CARLA are vendor/distribution installs; keep them in the same
isolated runtime as `[physical]`, not in the default quickstart environment.
ZK prover adapters are also isolated operator runtimes: pin the prover,
verifier, circuit artefacts, and proving key by immutable release or digest,
and keep `CommitmentBackend` enabled as the fallback.

The MiniCheck backend is opt-in and not on PyPI — install it manually
alongside any other extras:

```bash
pip install "minicheck @ git+https://github.com/Liyan06/MiniCheck.git"
```

### 5-tier scoring backends

| Tier | Backend | Accuracy | Latency | Install |
|------|---------|----------|---------|---------|
| **5** | NLI (FactCG) | **75.6% BA** | 14.6 ms | `[nli]` |
| **4** | Distilled NLI (preview) | validation required | measured per artefact | `[nli-lite]` |
| **3** | Embedding (bge-small) | ~65% BA | 3 ms | `[embed]` |
| **2** | Rules engine (8 rules) | rule-based | <1 ms | — (base) |
| **1** | Heuristic (lite) | ~55% BA | <1 ms | — (base) |

Select via config: `scorer_backend="rules"`, `"embed"`, `"deberta"`, or `"lite"`.

| Layer | What you get | Install extra |
|-------|-------------|---------------|
| **Core** (zero heavy deps) | `CoherenceScorer`, `StreamingKernel`, `GroundTruthStore`, rules engine | — |
| **Embeddings** | Sentence-transformer cosine-similarity scorer | `[embed]` |
| **NLI models** | DeBERTa, FactCG, MiniCheck, ONNX Runtime | `[nli]` |
| **Vector DBs** | Chroma, Pinecone, Weaviate, Qdrant | `[vector]` / `[pinecone]` / etc. |
| **Server** | FastAPI + Uvicorn REST/gRPC | `[server]` |
| **Rust kernel** | 12 accelerated compute functions | `[rust]` (requires maturin) |
| **Voice** | ElevenLabs, OpenAI TTS, Deepgram adapters | `[voice]` |

Python 3.11+. Full guide: [docs/installation](https://anulum.github.io/director-ai/installation/).

---

## Benchmarks

### Accuracy — LLM-AggreFact (29,320 samples)

Two judges ship with this release.

**Default — `yaxili96/FactCG-DeBERTa-v3-Large`** (0.4B params, MIT). The fast NLI baseline.

| Rank | Model | Per-dataset mean BA | Params | Latency | Streaming |
|------|-------|---------------------|--------|---------|-----------|
| #1 | Bespoke-MiniCheck-7B | **77.4%** | 7B | ~100 ms | No |
| **#6** | **Director-AI (FactCG)** | **75.6%** | 0.4B | **14.6 ms** | **Yes** |
| #8 | MiniCheck-Flan-T5-L | 75.0% | 0.8B | ~120 ms | No |

With per-dataset threshold tuning (no retraining), FactCG reaches **77.76%** — ahead of Bespoke-MiniCheck-7B (#1 at 77.4%). This is the same 0.4B model, single `pip install`, 14.6 ms latency.
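
Per-dataset threshold tuning can be sketched as a sweep that keeps, for each dataset, the threshold with the highest balanced accuracy. The data, dataset names, and candidate grid below are toy assumptions, not the benchmark's scores or the project's tuning script.

```python
def balanced_accuracy(labels: list[int], preds: list[int]) -> float:
    """Mean of recall on the positive class and recall on the negative class."""
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    tpr = sum(pos) / len(pos)
    tnr = sum(1 - p for p in neg) / len(neg)
    return (tpr + tnr) / 2


def best_threshold(
    scores: list[float], labels: list[int], candidates: list[float]
) -> float:
    """Pick the candidate threshold with the highest balanced accuracy."""
    def ba_at(t: float) -> float:
        return balanced_accuracy(labels, [int(s >= t) for s in scores])
    return max(candidates, key=ba_at)


# Toy scores (higher = more supported) and labels (1 = supported).
datasets = {
    "dataset_a": ([0.2, 0.4, 0.6, 0.8], [0, 0, 1, 1]),
    "dataset_b": ([0.1, 0.25, 0.35, 0.9], [0, 1, 1, 1]),
}
candidates = [0.2, 0.3, 0.5, 0.7]
thresholds = {
    name: best_threshold(s, y, candidates) for name, (s, y) in datasets.items()
}
print(thresholds)  # {'dataset_a': 0.5, 'dataset_b': 0.2}
```

The model weights stay fixed; only the decision cut-off moves per dataset, which is why no retraining is involved.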

Latency: 14.6 ms/pair on GTX 1060 6GB (ONNX GPU, 16-pair batch). Full comparison: [`benchmarks/comparison/COMPETITOR_COMPARISON.md`](benchmarks/comparison/COMPETITOR_COMPARISON.md).

> **Note on metrics.** The numbers in the table above use the
> AggreFact leaderboard convention — **per-dataset mean balanced
> accuracy across the 11 datasets** ([source: llm-aggrefact.github.io](https://llm-aggrefact.github.io/)).
> Sample-pooled balanced accuracy is a different metric and is
> systematically higher on heterogeneous benchmarks. Both numbers
> are reported in `training/EXPERIMENT_RESULTS.md` for
> traceability.
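
The difference between the two metrics is easy to see on toy data. The sketch below computes both for two made-up datasets of unequal size; the labels and predictions are illustrative and unrelated to the AggreFact results.

```python
def balanced_accuracy(labels: list[int], preds: list[int]) -> float:
    """Mean of recall on the positive class and recall on the negative class."""
    pos = [p for y, p in zip(labels, preds) if y == 1]
    neg = [p for y, p in zip(labels, preds) if y == 0]
    tpr = sum(pos) / len(pos)
    tnr = sum(1 - p for p in neg) / len(neg)
    return (tpr + tnr) / 2


# (labels, predictions) per dataset; the large one is easy, the small one is hard.
datasets = {
    "large": ([1, 1, 1, 1, 0, 0, 0, 0], [1, 1, 1, 1, 0, 0, 0, 0]),
    "small": ([1, 0], [0, 0]),
}

# Leaderboard convention: score each dataset, then average the scores.
per_dataset = [balanced_accuracy(y, p) for y, p in datasets.values()]
mean_ba = sum(per_dataset) / len(per_dataset)

# Sample-pooled: concatenate all samples, then score once.
all_labels = [v for y, _ in datasets.values() for v in y]
all_preds = [v for _, p in datasets.values() for v in p]
pooled_ba = balanced_accuracy(all_labels, all_preds)

print(f"per-dataset mean BA: {mean_ba:.2f}")  # 0.75
print(f"sample-pooled BA:    {pooled_ba:.2f}")  # 0.90
```

Pooling lets the large, easy dataset dominate, while the per-dataset mean gives each dataset equal weight.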

**Optional — Gemma 4 E4B Q6 with per-task-family routing.** A zero-training LLM-as-judge alternative for users who prefer LLM-as-judge architectures over NLI. Per-task-family prompts (`summ` / `rag` / `claim`) bring the routed Gemma judge to 75.55% per-dataset mean BA on the AggreFact 29K test set, comparable to the FactCG default. The routed judge is opt-in (`--backend llama-cpp`); FactCG remains the default.

### Rust compute acceleration (shipped in v3.12, current in v3.15)

12 functions, 5000 iterations each. Geometric mean: **9.4× speedup**.

| Function | Python (µs) | Rust (µs) | Speedup |
|----------|------------|-----------|---------|
| sanitizer_score | 57 | 2.1 | 27× |
| temporal_freshness | 53 | 2.5 | 21× |
| probs_to_confidence (200×3) | 486 | 15 | 33× |
| lite_score | 47 | 26 | 1.8× |

Full results: [`benchmarks/results/rust_compute_bench.json`](benchmarks/results/rust_compute_bench.json).
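
For readers unfamiliar with the summary statistic, the sketch below computes per-function speedups and their geometric mean from the four rows shown in the table. Since it covers 4 of the 12 benchmarked functions and uses the rounded table values, it is not expected to reproduce the 9.4× figure or every rounded speedup exactly.

```python
import math

# (python_us, rust_us) pairs copied from the four rows shown above.
timings = {
    "sanitizer_score": (57, 2.1),
    "temporal_freshness": (53, 2.5),
    "probs_to_confidence": (486, 15),
    "lite_score": (47, 26),
}

speedups = {name: py / rs for name, (py, rs) in timings.items()}

# Geometric mean = exp(mean(log(x))); suited to averaging ratios.
geo_mean = math.exp(sum(math.log(s) for s in speedups.values()) / len(speedups))

for name, s in speedups.items():
    print(f"{name}: {s:.1f}x")
print(f"geometric mean over these 4 rows: {geo_mean:.1f}x")
```

The geometric mean is the usual choice for speedup ratios because a 2× gain and a 0.5× loss average to 1×, which an arithmetic mean would not report.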

### Cross-platform NLI latency (p99, 16-pair batch)

| Platform | Type | Per-pair p99 | Batch p99 (16p) | Notes |
|----------|------|-------------|-----------------|-------|
| GTX 1060 6GB | CUDA 12.6 | **17.9 ms** | 287 ms | PyTorch FP32, 100 iterations |
| RX 6600 XT 8GB | ROCm 6.2 | 80.1 ms | 1,282 ms | hipBLAS fallback |
| EPYC 9575F 4C | CPU | 118.9 ms | 1,903 ms | UpCloud cloud, Zen 5 |
| Xeon E5-2640 2×6C | CPU | 207.3 ms | 3,317 ms | ML350 Gen8, 128 GB RAM |

Heuristic-only (no NLI): p99 < 0.5 ms on all platforms.
Raw data: [`benchmarks/results/`](benchmarks/results/).
Reproduction manifest:
[`benchmarks/PUBLIC_BENCHMARKS.md`](benchmarks/PUBLIC_BENCHMARKS.md).
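
In the table above, the per-pair figure equals the batch p99 divided by the batch size of 16 (for example, 287 ms / 16 ≈ 17.9 ms). The sketch below shows one way such a number could be derived from raw timings; the nearest-rank percentile method and the synthetic samples are assumptions, not the project's benchmark harness.

```python
import math


def percentile(samples: list[float], pct: int) -> float:
    """Nearest-rank percentile: smallest sample with at least pct% at or below it."""
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


BATCH_SIZE = 16
# Synthetic batch latencies in ms for 100 iterations; real runs would record timings.
batch_latencies_ms = [200.0 + i for i in range(100)]

batch_p99 = percentile(batch_latencies_ms, 99)
per_pair_p99 = batch_p99 / BATCH_SIZE
print(f"batch p99 = {batch_p99:.1f} ms, per-pair p99 = {per_pair_p99:.1f} ms")
```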

---

## Known Limitations

Be aware of these before deploying:

- **Heuristic fallback is weak**: Without `[nli]`, scoring falls back to a word-overlap heuristic (~55% BA). Not recommended for production.
- **Summarisation FPR is 10.5%**: Reduced from 95% via bidirectional NLI + baseline calibration (v3.5). Still too high for some use cases — tune thresholds per domain.
- **NLI needs KB grounding**: Without a knowledge base, stock regulated-domain profiles over-reject badly in checked artifacts (PubMedQA FPR=100%, FinanceBench FPR=100% at t=0.30). Treat them as calibration starting points.
- **ONNX CPU is slow**: 383 ms/pair without GPU. Use `onnxruntime-gpu` for production.
- **Long documents need ≥16 GB VRAM**: Chunked NLI on legal/financial docs exceeds 6 GB.
- **LLM-as-judge sends data externally**: When enabled, truncated prompt+response (500 chars) go to the configured provider. Off by default.
- **Domain presets are starting points**: Default thresholds need tuning for your data. Domain benchmark scripts exist but results are not yet validated.

---

## Docker

```bash
docker build -t director-ai .                          # CPU
docker build -f Dockerfile.gpu -t director-ai:gpu .    # GPU
docker run -p 8080:8080 director-ai                    # run
```

Kubernetes: [Helm chart](deploy/helm/director-ai/) with GPU toggle, HPA, Sigstore-signed releases.

---

## Citation

```bibtex
@software{sotek2026director,
  author    = {{\v{S}}otek, Miroslav},
  title     = {Director-AI: Real-time LLM Hallucination Guardrail},
  year      = {2026},
  url       = {https://github.com/anulum/director-ai},
  version   = {3.15.1},
  license   = {AGPL-3.0-or-later}
}
```

## License

Dual-licensed:

1. **Open-Source**: [GNU AGPL v3.0](LICENSE) — research, personal use, open-source projects.
2. **Commercial**: [Proprietary license](https://www.anulum.li/licensing) — removes copyleft for closed-source and SaaS.

Contact: [anulum.li](https://www.anulum.li) | [director.class.ai@anulum.li](mailto:director.class.ai@anulum.li)

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). By contributing, you agree to AGPL v3 terms.

---

<p align="center">
  <a href="https://www.anulum.li">
    <img src="docs/assets/anulum_logo_company.jpg" width="180" alt="ANULUM">
  </a>
  &nbsp;&nbsp;&nbsp;&nbsp;
  <a href="https://www.anulum.li">
    <img src="docs/assets/fortis_studio_logo.jpg" width="180" alt="Fortis Studio">
  </a>
  <br>
  <em>Developed by <a href="https://www.anulum.li">ANULUM Institute</a> / Fortis Studio — Marbach SG, Switzerland</em>
</p>
