Metadata-Version: 2.4
Name: chaincheck-llm
Version: 0.1.0
Summary: Validate LLM responses with composable, production-ready checks
Author: JALLAD
License-Expression: MIT
Project-URL: Homepage, https://github.com/ES7/chaincheck-llm
Project-URL: Repository, https://github.com/ES7/chaincheck-llm
Project-URL: Issues, https://github.com/ES7/chaincheck-llm/issues
Keywords: llm,validation,ai,openai,anthropic,gemini,pii,safety,hallucination
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.11
Description-Content-Type: text/markdown
Provides-Extra: openai
Requires-Dist: openai>=1.0; extra == "openai"
Provides-Extra: anthropic
Requires-Dist: anthropic>=0.20; extra == "anthropic"
Provides-Extra: gemini
Requires-Dist: google-genai>=0.5; extra == "gemini"
Provides-Extra: all
Requires-Dist: openai>=1.0; extra == "all"
Requires-Dist: anthropic>=0.20; extra == "all"
Requires-Dist: google-genai>=0.5; extra == "all"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"

# chaincheck-llm

> Validate LLM responses with composable, production-ready checks.

```bash
pip install chaincheck-llm
pip install "chaincheck-llm[openai]"  # optional extras: openai, anthropic, gemini, all
```

## The Problem

LLMs return unpredictable output. Sometimes they leak PII. Sometimes they refuse to answer. Sometimes they hallucinate. You need a way to validate responses before they reach your users.

`chaincheck-llm` gives you a composable validation pipeline: run any combination of checks on any LLM response and get back a single pass/fail result with a score.

## Quick Start

```python
from chaincheck import validate
from chaincheck.checks import (
    check_not_empty,
    check_no_pii,
    check_no_toxicity,
    check_min_length,
)

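# Example text; replace with the string returned by your LLM call
llm_output = "Machine learning is a field of AI in which models learn patterns from data."

result = validate(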
    response=llm_output,
    checks=[
        check_not_empty,
        check_no_pii,
        check_no_toxicity,
        lambda r: check_min_length(r, min_chars=50),
    ]
)

print(result.passed)    # True / False
print(result.score)     # 0.0 - 1.0
print(result.summary())
```

## Available Checks

### Format Checks (free, no LLM needed)

| Check | Description |
|-------|-------------|
| `check_not_empty` | Response is not empty |
| `check_min_length(r, min_chars)` | Response meets minimum length |
| `check_max_length(r, max_chars)` | Response is within maximum length |
| `check_json_valid` | Response is valid JSON |
| `check_contains(r, keywords)` | Response contains required keywords |
| `check_not_contains(r, forbidden)` | Response has no forbidden words |
| `check_regex(r, pattern)` | Response matches regex pattern |
| `check_no_placeholder` | No unfilled `{variable}` placeholders |
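
To make the idea of a rule-based format check concrete, here is a minimal sketch of two such checks written from scratch. The function names, the regex, and the plain `bool` return type are assumptions for illustration; the library's own checks may be implemented differently and may return a richer result object.

```python
import re

# Hypothetical stand-in, not the library's implementation: matches a template
# slot such as {name} or {order_id} that was never filled in.
PLACEHOLDER_RE = re.compile(r"\{[A-Za-z_][A-Za-z0-9_]*\}")


def has_unfilled_placeholder(response: str) -> bool:
    """Return True if the response still contains a {variable}-style slot."""
    return PLACEHOLDER_RE.search(response) is not None


def meets_min_length(response: str, min_chars: int) -> bool:
    """Return True if the stripped response has at least min_chars characters."""
    return len(response.strip()) >= min_chars
```

Because the pattern requires an identifier directly after the opening brace, ordinary JSON such as `{"a": 1}` is not flagged as a placeholder.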

### Content Checks (free, rule-based)

| Check | Description |
|-------|-------------|
| `check_no_pii` | No email addresses, phone numbers, SSNs, or credit card numbers |
| `check_no_toxicity` | No toxic/harmful language |
| `check_no_hallucination_markers` | No uncertainty disclaimers |
| `check_no_refusal` | LLM did not refuse to answer |
| `check_language(r, expected)` | Response is in expected language |
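
As a rough illustration of what "rule-based" means for a content check, the sketch below scans a response for two PII categories with regular expressions. The patterns, the `PII_PATTERNS` table, and `find_pii` are hypothetical and deliberately narrow; they are not taken from the library and would miss many real-world formats.

```python
import re

# Illustrative patterns only (assumed, not the library's): real PII detection
# needs broader coverage, e.g. international phone formats and card checksums.
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(response: str) -> list[str]:
    """Return the names of the PII categories detected in the response."""
    return [name for name, pattern in PII_PATTERNS.items() if pattern.search(response)]
```

Returning the matched category names, rather than a bare boolean, makes it easier to produce a useful failure message.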

### LLM-as-Judge Checks (uses tokens)

| Check | Description |
|-------|-------------|
| `check_relevance(r, question, ...)` | Response is relevant to question |
| `check_factual_consistency(r, context, ...)` | Response is consistent with context |
| `check_tone(r, expected_tone, ...)` | Response matches expected tone |
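
The general LLM-as-judge pattern behind checks like these can be sketched as follows: build a grading prompt, send it to a model, and turn the reply into a score. Everything here is an assumption made for illustration, including the prompt wording, the 0 to 10 scale, the `threshold` parameter, and the injected `ask_llm` callable; none of it is taken from the library's internals.

```python
from typing import Callable


def build_relevance_prompt(question: str, response: str) -> str:
    """Build a grading prompt that asks the judge for a single 0-10 score."""
    return (
        "Rate how well the response answers the question on a scale of 0 to 10.\n"
        "Reply with the number only.\n\n"
        f"Question: {question}\n"
        f"Response: {response}"
    )


def judge_relevance(
    response: str,
    question: str,
    ask_llm: Callable[[str], str],  # hypothetical: sends a prompt, returns the reply text
    threshold: float = 0.7,         # hypothetical pass mark on the 0.0 - 1.0 scale
) -> tuple[bool, float]:
    """Return (passed, score), with score normalised to the 0.0 - 1.0 range."""
    reply = ask_llm(build_relevance_prompt(question, response))
    try:
        raw = float(reply.strip())
    except ValueError:
        return False, 0.0  # an unparseable verdict is treated as a failure
    score = min(max(raw / 10.0, 0.0), 1.0)  # clamp out-of-range replies
    return score >= threshold, score
```

Injecting `ask_llm` keeps the sketch independent of any provider SDK and lets you test the parsing logic with a stub such as `lambda prompt: "9"`.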

## Composing Checks

```python
from chaincheck import validate
from chaincheck.checks import (
    check_not_empty, check_no_pii, check_no_toxicity,
    check_relevance, check_min_length,
)

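# Example text; replace with the string returned by your LLM call
llm_output = "Machine learning is a field of AI in which models learn patterns from data."

result = validate(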
    response=llm_output,
    checks=[
        # Fast rule-based checks first
        check_not_empty,
        check_no_pii,
        check_no_toxicity,
        lambda r: check_min_length(r, min_chars=100),
        # LLM judge last (costs tokens)
        lambda r: check_relevance(
            r,
            question="What is machine learning?",
            provider="openai",
            model="gpt-4o-mini",
            api_key="sk-...",
        ),
    ],
    fail_fast=True,  # stop after first failure
)

if not result.passed:
    for check in result.failed_checks:
        print(f"FAILED: {check.name} — {check.message}")
```

## ValidationResult

```python
result.passed           # True if all checks passed
result.score            # Average score (0.0 - 1.0)
result.checks           # List of CheckResult
result.passed_checks    # Checks that passed
result.failed_checks    # Checks that failed
result.summary()        # Dict with full breakdown
```
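
One way to picture how these attributes relate to each other is the small model below. It is a sketch under stated assumptions, not the library's classes: the class names are hypothetical, `name` and `message` come from the usage shown earlier, while the per-check `passed` and `score` fields, the keys returned by `summary()`, and the handling of an empty check list are guesses.

```python
from dataclasses import dataclass, field


@dataclass
class CheckResultSketch:
    name: str
    passed: bool       # assumed field
    score: float       # assumed field, 0.0 - 1.0
    message: str = ""


@dataclass
class ValidationResultSketch:
    checks: list[CheckResultSketch] = field(default_factory=list)

    @property
    def passed(self) -> bool:
        """True only if every check passed."""
        return all(c.passed for c in self.checks)

    @property
    def score(self) -> float:
        """Mean of the per-check scores."""
        if not self.checks:
            return 1.0  # arbitrary choice for the empty case, matching passed == True
        return sum(c.score for c in self.checks) / len(self.checks)

    @property
    def passed_checks(self) -> list[CheckResultSketch]:
        return [c for c in self.checks if c.passed]

    @property
    def failed_checks(self) -> list[CheckResultSketch]:
        return [c for c in self.checks if not c.passed]

    def summary(self) -> dict:
        """Return a plain dict suitable for logging (keys are assumed)."""
        return {
            "passed": self.passed,
            "score": self.score,
            "failed": [c.name for c in self.failed_checks],
        }
```

Deriving `passed` and `score` from the list of per-check results keeps the aggregate values from drifting out of sync with the individual checks.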

## License

MIT
