Metadata-Version: 2.4
Name: pytest-llm
Version: 0.1.0
Summary: pytest-llm: A pytest plugin for testing LLM outputs with success rate thresholds.
Author-email: Johannes Maron <johannes@maron.family>
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Classifier: Development Status :: 3 - Alpha
Classifier: Framework :: Pytest
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: BSD License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Software Development :: Testing
License-File: LICENSE
Requires-Dist: pytest>=7.0.0
Requires-Dist: pre-commit ; extra == "lint"
Requires-Dist: ruff ; extra == "lint"
Requires-Dist: pytest-cov ; extra == "test"
Requires-Dist: httpx ; extra == "test"
Requires-Dist: ollama ; extra == "test"
Project-URL: Homepage, https://github.com/codingjoe/pytest-llm
Project-URL: Repository, https://github.com/codingjoe/pytest-llm
Provides-Extra: lint
Provides-Extra: test

# pytest-llm

Fast AI reliability test suite

A pytest plugin that provides a custom marker for testing LLM (Large Language Model) outputs with configurable success rate thresholds.

## Usage

```python
import pytest

@pytest.mark.llm("How many R's are in the word 'Strawberry'?", 0.9)
def test_counting(prompt, llm):
    result = llm(prompt).lower()
    assert ("3" in result) or ("three" in result)
```
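
To make the idea of a success rate threshold concrete, here is a minimal sketch in plain Python. It is not the plugin's implementation. The helpers `success_rate` and `meets_threshold`, the `runs` parameter, and the `fake_llm` stand-in are all hypothetical names introduced for illustration. The sketch assumes a check counts as passed when it raises no `AssertionError`, and that the test as a whole passes when the fraction of passing runs reaches the threshold.

```python
from itertools import cycle
from typing import Callable


def success_rate(check: Callable[[], None], runs: int) -> float:
    """Call `check` `runs` times and return the fraction of calls that passed."""
    passed = 0
    for _ in range(runs):
        try:
            check()
        except AssertionError:
            continue  # a failed assertion counts as an unsuccessful run
        passed += 1
    return passed / runs


def meets_threshold(check: Callable[[], None], threshold: float, runs: int = 10) -> bool:
    """Return True if the observed success rate is at least `threshold`."""
    return success_rate(check, runs) >= threshold


# Hypothetical stand-in for a real model: one answer in every five is wrong.
_answers = cycle(["three", "3", "There are 3.", "two", "three"])


def fake_llm(prompt: str) -> str:
    return next(_answers)


def check_counting() -> None:
    result = fake_llm("How many R's are in the word 'Strawberry'?").lower()
    assert ("3" in result) or ("three" in result)


if __name__ == "__main__":
    # Four out of five answers are correct, so a 0.75 threshold is met
    # while a 0.9 threshold is not.
    print(meets_threshold(check_counting, 0.75))
    print(meets_threshold(check_counting, 0.9))
```

A threshold below 1.0 accepts occasional wrong answers, which is what makes a test of non-deterministic output stable enough to run in a suite.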

## Setup

```bash
python3 -m pip install -e pytest-llm
```

```python
# conftest.py
import os
import typing

import httpx
import ollama
import pytest


def github_models_complete(prompt, model=None, system=None) -> str:
    """Complete a prompt via GitHub Models, falling back to a local Ollama model."""
    messages = [{"role": "system", "content": system}] if system else []
    messages += [{"role": "user", "content": prompt}]
    if token := os.getenv("GITHUB_TOKEN"):
        response = httpx.post(
            "https://models.github.ai/inference/chat/completions",
            headers={
                "Content-Type": "application/json",
                "Accept": "application/vnd.github+json",
                "Authorization": f"Bearer {token}",
                "X-GitHub-Api-Version": "2022-11-28",
            },
            json={
                "model": model or "openai/gpt-5-nano",
                "messages": messages,
            },
            # httpx defaults to a 5 second timeout, which model calls can exceed
            timeout=60.0,
        )
        response.raise_for_status()
        return response.json()["choices"][0]["message"]["content"]
    else:
        # No token available: use a local Ollama model and keep the system prompt
        return ollama.generate(
            model="llama3.2", prompt=prompt, system=system or ""
        ).response


@pytest.fixture
def llm() -> typing.Callable[[str], str]:
    # This fixture is optional, but it lets tests request `llm` directly
    return github_models_complete


def pytest_llm_complete(config):
    # This pytest hook enables the plugin to generate random prompts
    return github_models_complete
```

## Running Tests

```bash
pytest -m llm
# or, to run only the non-LLM tests
pytest -m "not llm"
```

