Metadata-Version: 2.4
Name: llm-guard-kit
Version: 0.2.0
Summary: Real-time reliability monitoring, failure diagnosis, and self-repair for LLM agents. AUROC 0.879–0.895.
Author: Avighan Majumder
License: MIT
Project-URL: Repository, https://github.com/avighan/qppg
Project-URL: Issues, https://github.com/avighan/qppg/issues
Keywords: llm,reliability,failure-prediction,anomaly-detection,knn,claude,openai
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.20
Requires-Dist: scikit-learn>=0.24
Requires-Dist: sentence-transformers>=2.2.0
Requires-Dist: anthropic>=0.7.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: matplotlib>=3.4; extra == "dev"
Provides-Extra: qara
Requires-Dist: torch>=1.13; extra == "qara"
Provides-Extra: server
Requires-Dist: fastapi>=0.100; extra == "server"
Requires-Dist: uvicorn[standard]>=0.23; extra == "server"
Requires-Dist: pydantic>=2.0; extra == "server"
Requires-Dist: streamlit>=1.28; extra == "server"
Requires-Dist: pandas>=1.3; extra == "server"

# llm-guard-kit

**Real-time reliability monitoring, failure diagnosis, and self-repair for LLM agents.**

[![PyPI](https://img.shields.io/pypi/v/llm-guard-kit.svg)](https://pypi.org/project/llm-guard-kit/)
[![Python](https://img.shields.io/pypi/pyversions/llm-guard-kit.svg)](https://pypi.org/project/llm-guard-kit/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

---

## What it does

`llm-guard-kit` wraps any ReAct / tool-calling LLM agent with a four-tier reliability stack — no labels required on day one:

| Tier | Component | What it does |
|------|-----------|--------------|
| 0 | `LabelFreeScorer` | Risk score per query in <15 ms. Zero cold-start using behavioral signals. |
| 1 | `QppgMonitor` | Drop-in agent monitor. Auto-calibrates, fires alerts, exports reports. |
| 3 | `FailureTaxonomist` | Diagnoses *why* a chain failed (retrieval failure, excessive search, hallucination, …). |
| 4 | `SelfHealer` | Converts failure diagnosis into prompt injections that repair the agent mid-run. |
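
All four tiers ship as plain Python classes in the `qppg_service` package; each is demonstrated in the sections below:

```python
from qppg_service import LabelFreeScorer, QppgMonitor, FailureTaxonomist, SelfHealer
```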

**Validated AUROC (HotpotQA multi-hop QA, 200 chains):**

| Condition | Within-domain | Cross-domain |
|-----------|--------------|--------------|
| n=0 chains — behavioral signals only (SC2) | **0.879** | 0.570 |
| n≥5 chains — + GMM density (SC8) | **0.883** | 0.664 |
| n≥50 labeled — + QARA obs-pool adapter | 0.742 | **0.675** |
| + LLM judge (gpt-4o-mini, J2=SC8+judge) | **0.895** | 0.660 |

---

## Install

```bash
pip install llm-guard-kit                    # core (no API key needed)
pip install "llm-guard-kit[qara]"            # + QARA supervised adapter (torch)
pip install "llm-guard-kit[server]"          # + FastAPI HTTP server
```

Requires Python 3.9+. No API key required for zero-label monitoring.

---

## Quick start — zero labels, zero cold-start

```python
from qppg_service import QppgMonitor

monitor = QppgMonitor(threshold=0.65)   # fires above this risk score

# Call after every agent run
alert = monitor.track(
    question    = "Which city is older, Rome or Athens?",
    steps       = agent_steps,           # list of {thought, action_type, action_arg, observation}
    final_answer= agent.final_answer,
    finished    = True,
)

if alert:
    print(f"HIGH RISK ({alert.risk_score:.2f}): {alert.recommendation}")

# Export a full monitoring report
print(monitor.export_report())
monitor.export_csv("agent_risk_log.csv")
```

Works on query 1. No training. No labels. AUROC 0.879 within-domain.

---

## Full pipeline — detect → diagnose → repair

```python
from qppg_service import QppgMonitor, FailureTaxonomist, SelfHealer

monitor = QppgMonitor(threshold=0.65)
tx      = FailureTaxonomist()
healer  = SelfHealer()

alert = monitor.track(question, steps, final_answer, finished=True)

if alert:
    # Diagnose WHY it failed
    failure = tx.classify(question, steps, final_answer, finished=True)
    print(failure.primary_mode)   # "EXCESSIVE_SEARCH" | "RETRIEVAL_FAILURE" | ...
    print(failure.explanation)    # human-readable explanation

    # Get a repair prompt to inject into the agent
    action = healer.suggest(failure, question, steps, final_answer)
    print(action.action_type)       # "FORCE_FINISH" | "REPHRASE_QUERY" | ...
    print(action.prompt_injection)  # ready to inject as next agent message
    print(action.urgency)           # "HIGH" | "MEDIUM" | "LOW"
```
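
To close the loop, append `action.prompt_injection` to the agent's conversation before its next step. A minimal sketch with the Anthropic SDK (illustrative only: `history` stands in for your agent's message list, and the model name is an assumption):

```python
import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

if alert and action.urgency == "HIGH":
    # Inject the repair prompt as the next user turn, then let the agent continue
    history.append({"role": "user", "content": action.prompt_injection})
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",  # assumed model choice
        max_tokens=1024,
        messages=history,
    )
```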

**Failure modes detected** (the `RETRIEVAL_FAILURE` similarity trigger is sketched after the table):

| Mode | Trigger | Suggested repair |
|------|---------|-----------------|
| `RETRIEVAL_FAILURE` | mean cosine(obs, question) < 0.35 | `REPHRASE_QUERY` |
| `EXCESSIVE_SEARCH` | > 4 search steps | `CONSOLIDATE` or `FORCE_FINISH` |
| `CONFLICTING_EVIDENCE` | high thought variance + high query diversity | `CONSOLIDATE` |
| `INSUFFICIENT_EVIDENCE` | weak retrieval + ≥ 2 searches | `ADDITIONAL_SEARCH` |
| `ANSWER_UNSUPPORTED` | answer words absent from reasoning | `VERIFY_ANSWER` |
| `PREMATURE_STOP` | ≤ 1 search, no clean finish | `ADDITIONAL_SEARCH` (urgent) |
| `LOW_RISK` | no flags | `NO_ACTION` |
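
The similarity triggers above are cosine similarities between sentence embeddings. A standalone sketch of the `RETRIEVAL_FAILURE` check with `sentence-transformers` (illustrative of the signal, not the package's exact implementation; the embedding model is an assumption):

```python
from sentence_transformers import SentenceTransformer
from sklearn.metrics.pairwise import cosine_similarity

model = SentenceTransformer("all-MiniLM-L6-v2")  # assumed embedding model

def mean_obs_similarity(question: str, steps: list) -> float:
    """Mean cosine similarity between the question and each non-empty observation."""
    observations = [s["observation"] for s in steps if s.get("observation")]
    if not observations:
        return 0.0
    q_emb = model.encode([question])
    o_embs = model.encode(observations)
    return float(cosine_similarity(q_emb, o_embs).mean())

# RETRIEVAL_FAILURE trigger from the table above
if mean_obs_similarity(question, steps) < 0.35:
    print("Likely RETRIEVAL_FAILURE -> suggested repair: REPHRASE_QUERY")
```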

---

## Progressive calibration

As you accumulate agent logs, the scorer automatically improves:

```python
# After 5+ chains (any domain) — activates GMM density estimation
monitor.calibrate(chains)                    # list of {question, steps, final_answer, finished}

# After 50+ labeled chains — activates QARA supervised obs-pool adapter
monitor.calibrate(chains, labeled=True)      # chains must have "correct": True/False

# Check current status and expected AUROC
print(monitor.scorer.status())
```
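
Under the hood, the n ≥ 5 tier fits a density model over chain embeddings and treats low-likelihood chains as risky. A toy sketch of GMM density scoring with scikit-learn (illustrates the idea, not the package's internals):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

# Toy stand-in for chain embeddings (the real features are embedding vectors per chain)
rng = np.random.default_rng(0)
calibration_embeddings = rng.normal(size=(100, 16))

# Fit a density model on the calibration chains
gmm = GaussianMixture(n_components=2, covariance_type="diag", random_state=0)
gmm.fit(calibration_embeddings)

# Lower log-likelihood => farther from typical behavior => higher risk
new_chain = rng.normal(loc=3.0, size=(1, 16))  # deliberately off-distribution
print(gmm.score_samples(new_chain)[0])
```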

---

## Retrieval quality diagnostics

A standalone signal that tells you which search steps are failing:

```python
from qppg_service import LabelFreeScorer

scorer = LabelFreeScorer()
rq = scorer.retrieval_quality(question, steps)
# {"mean_sim": 0.41, "min_sim": 0.22, "quality_label": "POOR", "per_step": [...]}
```

Correct agents average `mean_sim = 0.554`; wrong agents `0.458` (Δ = +0.096, p < 0.01).

---

## HTTP API (for multi-language / microservice deployments)

```bash
pip install "llm-guard-kit[server]"
python -m qppg_service.server --host 0.0.0.0 --port 8080
```

```bash
curl -X POST http://localhost:8080/score \
  -H "Content-Type: application/json" \
  -d '{"question": "...", "steps": [...], "final_answer": "..."}'
```
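
The same call from Python with `requests` (the response schema shown is an assumption, mirroring the monitor's fields):

```python
import requests

payload = {
    "question": "Which city is older, Rome or Athens?",
    "steps": steps,  # same step format as the Python API (see below)
    "final_answer": "Athens",
}
resp = requests.post("http://localhost:8080/score", json=payload, timeout=10)
resp.raise_for_status()
print(resp.json())  # e.g. {"risk_score": ...}; exact schema assumed
```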

---

## Legacy API (v0.1.x — still supported)

```python
from llm_guard import LLMGuard

guard = LLMGuard(api_key="sk-ant-...")
guard.fit(correct_questions=["What is the capital of France?", ...])
result = guard.query("What is 15% of 240?")
print(result.risk_score)   # e.g. 0.12; lower score = lower failure risk
```

The original `LLMGuard` class (exp21–23, within-domain AUROC 0.966–1.000 on MATH/HumanEval/TriviaQA) remains fully functional.

---

## Agent step format

```python
steps = [
    {
        "thought":      "I need to find when the Eiffel Tower was built.",
        "action_type":  "Search",
        "action_arg":   "Eiffel Tower construction date",
        "observation":  "The Eiffel Tower was built between 1887 and 1889..."
    },
    {
        "thought":      "I now have the answer.",
        "action_type":  "Finish",
        "action_arg":   "1889",
        "observation":  ""
    }
]
```
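
If your agent logs ReAct-style action strings such as `Search[Eiffel Tower construction date]`, a small adapter can produce this format. A hypothetical helper (the `Action[argument]` convention and `agent_trace` are assumptions about your agent, not part of `llm-guard-kit`):

```python
import re

def to_guard_step(thought: str, action: str, observation: str) -> dict:
    """Split a ReAct action string like 'Search[query]' into the step dict above."""
    match = re.match(r"(\w+)\[(.*)\]$", action.strip(), re.DOTALL)
    action_type, action_arg = match.groups() if match else (action.strip(), "")
    return {
        "thought": thought,
        "action_type": action_type,
        "action_arg": action_arg,
        "observation": observation,
    }

steps = [to_guard_step(t, a, o) for t, a, o in agent_trace]
```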

---

## Research background

Built on experiments exp18–45, validated on HotpotQA, NaturalQuestions, TriviaQA, and GSM8K:

- **Behavioral signals** (step count, completion, answer gap): AUROC 0.879 — zero calibration
- **GMM density estimation** on chain embeddings: +0.004 within, +0.094 cross-domain
- **QARA supervised adapter** on observation-pool embeddings: best cross-domain (0.675, p=0.025)
- **LLM-as-judge** (gpt-4o-mini, exp41): 0.895 within-domain when combined with SC8

Paper draft: `docs/research_paper.md`

---

## License

MIT
