Metadata-Version: 2.4
Name: semanticheck
Version: 0.2.0
Summary: pytest-native semantic assertions for LLM and generative AI applications. No servers. No SaaS. Works with OpenAI, Anthropic, LiteLLM and any LLM client.
Project-URL: Homepage, https://github.com/semanticheck/semanticheck
Project-URL: Documentation, https://github.com/semanticheck/semanticheck#readme
Project-URL: Repository, https://github.com/semanticheck/semanticheck
Project-URL: Bug Tracker, https://github.com/semanticheck/semanticheck/issues
Project-URL: Changelog, https://github.com/semanticheck/semanticheck/blob/main/CHANGELOG.md
Author: semanticheck contributors
License: MIT
License-File: LICENSE
Keywords: agent testing,ai quality assurance,ai testing,anthropic,claude testing,genai testing,generative ai,golden baseline,gpt testing,hallucination detection,langchain,llm,llm assertions,llm evaluation,llm quality,llm testing,machine learning testing,openai,prompt testing,pytest,pytest plugin,rag testing,regression testing,semantic assertions,semantic testing
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: Pytest
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Topic :: Software Development :: Testing
Classifier: Typing :: Typed
Requires-Python: >=3.9
Provides-Extra: all
Requires-Dist: jsonschema>=4.0.0; extra == 'all'
Requires-Dist: openai>=1.0.0; extra == 'all'
Requires-Dist: pydantic>=2.0.0; extra == 'all'
Requires-Dist: sentence-transformers>=2.7.0; extra == 'all'
Requires-Dist: tiktoken>=0.5.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: openai>=1.0.0; extra == 'dev'
Requires-Dist: pydantic>=2.0.0; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Requires-Dist: sentence-transformers>=2.7.0; extra == 'dev'
Provides-Extra: jsonschema
Requires-Dist: jsonschema>=4.0.0; extra == 'jsonschema'
Provides-Extra: judge
Requires-Dist: torch>=2.0.0; extra == 'judge'
Requires-Dist: transformers>=4.40.0; extra == 'judge'
Provides-Extra: local
Requires-Dist: sentence-transformers>=2.7.0; extra == 'local'
Provides-Extra: openai
Requires-Dist: openai>=1.0.0; extra == 'openai'
Provides-Extra: pydantic
Requires-Dist: pydantic>=2.0.0; extra == 'pydantic'
Provides-Extra: tiktoken
Requires-Dist: tiktoken>=0.5.0; extra == 'tiktoken'
Description-Content-Type: text/markdown

# semanticheck

**pytest-native semantic assertions for LLM and generative AI applications.**

No servers. No SaaS. No config. Works with OpenAI, Anthropic, LiteLLM, or any LLM client.

[![PyPI](https://img.shields.io/pypi/v/semanticheck)](https://pypi.org/project/semanticheck/)
[![Python](https://img.shields.io/pypi/pyversions/semanticheck)](https://pypi.org/project/semanticheck/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)
[![CI](https://github.com/semanticheck/semanticheck/actions/workflows/ci.yml/badge.svg)](https://github.com/semanticheck/semanticheck/actions)

---

## Installation

```bash
pip install semanticheck
```

With local embeddings (recommended, no API cost):

```bash
pip install "semanticheck[local]"
```

With OpenAI embeddings:

```bash
pip install "semanticheck[openai]"
```

---

## Quick Start

```python
from semanticheck import assert_intent, assert_tone, assert_no_hallucination, assert_no_pii

def test_customer_support_reply():
    # my_llm is a placeholder for your own function that calls the model and returns its text.
    response = my_llm("Help me reset my password")

    assert_intent(response, "instructions for resetting a password")
    assert_tone(response, "friendly")
    assert_no_pii(response)
    assert_no_hallucination(response, known_facts=["Password reset links expire after 24 hours"])
```
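
To make the `assert_no_pii` line above more concrete, here is a minimal sketch of pattern-based PII detection. It is not semanticheck's implementation, which this README does not describe. The `PII_PATTERNS` table and the `find_pii` helper are hypothetical names, and the two regexes cover only email addresses and US Social Security numbers.

```python
import re

# Illustrative patterns only (assumption: not the library's actual rules).
PII_PATTERNS = {
    "email": re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"),
    "us_ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_pii(text: str) -> dict[str, list[str]]:
    """Return all matches per PII category, omitting categories with no hits."""
    hits = {name: pattern.findall(text) for name, pattern in PII_PATTERNS.items()}
    return {name: found for name, found in hits.items() if found}


clean = "Click the reset link we just sent you."
leaky = "We emailed jane.doe@example.com; SSN on file is 123-45-6789."

print(find_pii(clean))  # no categories reported for the clean reply
print(find_pii(leaky))  # reports both the email address and the SSN
```

A real checker needs far broader coverage (phone formats, card numbers with checksum validation, and so on), which is the reason to rely on a maintained assertion rather than ad hoc regexes.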

---

## Assertions

| Function | What it checks |
|---|---|
| `assert_intent(response, expected_intent)` | Semantic match to expected meaning |
| `assert_tone(response, tone)` | Tone: professional, casual, friendly, formal, etc. |
| `assert_no_hallucination(response, known_facts)` | No contradiction of known facts |
| `assert_similar_to(response, reference)` | Cosine similarity to a reference text |
| `assert_token_budget(response, max_tokens)` | Response stays within token limit |
| `assert_schema(response, schema)` | Response validates against a JSON Schema or Pydantic model |
| `assert_language(response, language)` | Written in expected language |
| `assert_no_pii(response)` | No emails, SSNs, credit cards, phones, etc. |
| `assert_readability(response, min_score=60)` | Flesch Reading Ease score |
| `assert_sentiment(response, "positive")` | Sentiment polarity |

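For reference, the Flesch Reading Ease score named in the `assert_readability` row is `206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)`. Higher scores mean easier text, and roughly 60 to 70 is conventionally described as plain English. The sketch below computes the score with a rough vowel-group syllable heuristic. `count_syllables` and `flesch_reading_ease` are hypothetical helpers, and semanticheck's own syllable counting may differ, so exact scores need not match.

```python
import re


def count_syllables(word: str) -> int:
    """Rough heuristic: count vowel groups after dropping a trailing silent 'e'."""
    word = word.lower()
    if len(word) > 2 and word.endswith("e") and not word.endswith("le"):
        word = word[:-1]
    return max(1, len(re.findall(r"[aeiouy]+", word)))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease from sentence, word, and estimated syllable counts."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    return (
        206.835
        - 1.015 * (len(words) / len(sentences))
        - 84.6 * (syllables / len(words))
    )


easy = "The cat sat on the mat. It was a sunny day."
hard = "Organizational interoperability necessitates comprehensive infrastructural standardization."

print(flesch_reading_ease(easy) >= 60)  # short words and sentences score high
print(flesch_reading_ease(hard) >= 60)  # long, polysyllabic words score low
```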
---

## Baseline Regression Testing

```python
import pytest
from semanticheck import record_baseline, compare_baseline

@pytest.mark.llm
def test_summarizer_regression(llm_record):
    response = my_summarizer(article)
    if llm_record:
        record_baseline("summarizer_v1", response)
    else:
        compare_baseline("summarizer_v1", response, threshold=0.85)
```

Run with:

```bash
pytest --record-baselines                 # record golden baselines
pytest                                    # compare against them
pytest --baseline-dir ./.baselines/llm    # choose baseline directory (recommended for monorepos)
```

A more ergonomic alternative (recommended) is the `llm_baseline` fixture, which names the baseline automatically from the test's `nodeid`:

```python
import pytest

@pytest.mark.llm
def test_summarizer_regression(llm_baseline):
    response = my_summarizer(article)
    llm_baseline.check(response, threshold=0.85)
```
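
Conceptually, a golden-baseline check stores one approved response and later fails if a new response scores below the threshold against it. The sketch below illustrates that flow only. It assumes one JSON file per baseline and swaps in a bag-of-words cosine for the embedding similarity; `record`, `compare`, and `bow_cosine` are hypothetical helpers, not semanticheck's API or storage format.

```python
import json
import math
import re
import tempfile
from collections import Counter
from pathlib import Path


def bow_cosine(a: str, b: str) -> float:
    """Stand-in similarity: cosine over word counts (not a real embedding)."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va)
    norm = math.sqrt(sum(c * c for c in va.values())) * math.sqrt(
        sum(c * c for c in vb.values())
    )
    return dot / norm if norm else 0.0


def record(baseline_dir: Path, name: str, response: str) -> None:
    """Store the approved response as <name>.json (assumed layout)."""
    baseline_dir.mkdir(parents=True, exist_ok=True)
    (baseline_dir / f"{name}.json").write_text(json.dumps({"response": response}))


def compare(baseline_dir: Path, name: str, response: str, threshold: float) -> float:
    """Fail if the new response drifts below the threshold from the baseline."""
    golden = json.loads((baseline_dir / f"{name}.json").read_text())["response"]
    score = bow_cosine(golden, response)
    if score < threshold:
        raise AssertionError(f"{name}: similarity {score:.2f} is below {threshold}")
    return score


with tempfile.TemporaryDirectory() as tmp:
    baselines = Path(tmp)
    golden = "Revenue grew ten percent in 2023."
    record(baselines, "summarizer_v1", golden)

    # An unchanged response passes the check.
    same_score = compare(baselines, "summarizer_v1", golden, threshold=0.85)

    # An unrelated response is flagged as a regression.
    try:
        compare(baselines, "summarizer_v1", "Pizza is a popular Italian dish.", threshold=0.85)
        drift_detected = False
    except AssertionError:
        drift_detected = True

print(f"identical: {same_score:.2f}, drift detected: {drift_detected}")
```

With real embeddings, a faithful paraphrase of the golden response would typically score high where this word-count stand-in might not, so the choice of similarity function matters more than the bookkeeping.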

---

## LocalJudge

Use a local model as a judge, with zero API cost in CI:

```python
from semanticheck import LocalJudge

judge = LocalJudge()  # uses Qwen2.5-0.5B by default
result = judge.evaluate(
    response="Paris is the capital of France.",
    criterion="Correctly answers a geography question about European capitals.",
)
assert result.passed
print(result.score, result.reasoning)
```

---

## pytest Plugin

semanticheck auto-registers as a pytest plugin. Available flags:

```bash
pytest --skip-llm           # skip all @pytest.mark.llm tests
pytest --llm-threshold 0.8  # override similarity threshold globally
pytest --record-baselines   # record golden baselines
pytest --baseline-dir ./.baselines/llm  # choose baseline directory
```

Available markers:

```python
@pytest.mark.llm        # LLM semantic test
@pytest.mark.llm_slow   # slow test using local judge inference
```

---

## Embedding Backends

| Backend | How to activate | Cost |
|---|---|---|
| sentence-transformers | `pip install "semanticheck[local]"` | Free |
| OpenAI | Set `OPENAI_API_KEY` | Per token |
| Hash fallback | `SEMANTICHECK_EMBED_BACKEND=fallback` | Free (smoke tests only) |

Override via env vars:

```bash
export SEMANTICHECK_EMBED_BACKEND=openai
export SEMANTICHECK_EMBED_MODEL=text-embedding-3-large
```
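
The "smoke tests only" note on the hash fallback is worth unpacking. This README does not say how the fallback is implemented, so the following is a generic illustration, built around a hypothetical `hash_embed` function, of why vectors derived from a hash are deterministic but carry no semantic signal: identical strings match exactly, while a paraphrase gets an essentially arbitrary score.

```python
import hashlib
import math


def hash_embed(text: str, dim: int = 16) -> list[float]:
    """Derive a unit-length pseudo-embedding from a SHA-256 digest (dim <= 32)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()  # 32 bytes
    raw = [byte - 127.5 for byte in digest[:dim]]  # center byte values around zero
    norm = math.sqrt(sum(x * x for x in raw))
    return [x / norm for x in raw]


def cosine(u: list[float], v: list[float]) -> float:
    """Dot product; equals cosine similarity because inputs are unit length."""
    return sum(a * b for a, b in zip(u, v))


original = "Reset links expire after 24 hours."
paraphrase = "Password reset links are valid for one day."

identical_score = cosine(hash_embed(original), hash_embed(original))
paraphrase_score = cosine(hash_embed(original), hash_embed(paraphrase))

print(f"identical:  {identical_score:.3f}")   # same text always hashes to the same vector
print(f"paraphrase: {paraphrase_score:.3f}")  # value is unrelated to the shared meaning
```

That is enough to exercise test plumbing in CI without downloading a model, but thresholds tuned against such vectors say nothing about semantic quality.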

---

## License

MIT
