Metadata-Version: 2.4
Name: themis-eval
Version: 4.0.2
Summary: Evaluation scaffold for LLM research, benchmarking, and reproducible experiment runs.
Author: Pittawat Taveekitworachai
License-Expression: MIT
Project-URL: Homepage, https://github.com/Pittawat2542/themis
Project-URL: Repository, https://github.com/Pittawat2542/themis
Project-URL: Documentation, https://github.com/Pittawat2542/themis/tree/main/docs
Project-URL: Changelog, https://github.com/Pittawat2542/themis/blob/main/CHANGELOG.md
Project-URL: Issues, https://github.com/Pittawat2542/themis/issues
Keywords: llm,evaluation,benchmark,research,ai
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Testing
Classifier: Typing :: Typed
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: cyclopts>=3.24.0
Requires-Dist: omegaconf>=2.3.0
Requires-Dist: pydantic>=2.12.5
Provides-Extra: dev
Requires-Dist: pytest>=9.0.2; extra == "dev"
Requires-Dist: pytest-asyncio>=1.3.0; extra == "dev"
Requires-Dist: pytest-cov>=7.0.0; extra == "dev"
Requires-Dist: ruff>=0.15.8; extra == "dev"
Requires-Dist: mypy>=1.19.1; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.6.1; extra == "docs"
Requires-Dist: mkdocs-material>=9.6.14; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.30.1; extra == "docs"
Requires-Dist: pymdown-extensions>=10.15; extra == "docs"
Provides-Extra: openai
Requires-Dist: openai>=2.24.0; extra == "openai"
Provides-Extra: vllm
Requires-Dist: vllm>=0.18.0; platform_system == "Linux" and extra == "vllm"
Provides-Extra: langgraph
Requires-Dist: langgraph>=1.1.3; extra == "langgraph"
Provides-Extra: mongodb
Requires-Dist: pymongo>=4.10.0; extra == "mongodb"
Provides-Extra: postgres
Requires-Dist: psycopg>=3.2.0; extra == "postgres"
Provides-Extra: datasets
Requires-Dist: datasets>=3.0.0; extra == "datasets"
Requires-Dist: Pillow>=11.0.0; extra == "datasets"
Dynamic: license-file

# Themis

Themis is a Python package for running reproducible LLM evaluations. It gives you a typed scaffold for defining datasets, generators, parsers, metrics, judge workflows, and persistent run artifacts, without tying you to a single provider or benchmark.

The published package name is `themis-eval`. The Python import namespace and CLI command are both `themis`.
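
Because the distribution name and the import name differ, metadata lookups use `themis-eval` while imports use `themis`. The snippet below is a minimal sketch using only the standard library; `installed_version` is a hypothetical helper for illustration, not part of Themis.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(distribution: str = "themis-eval") -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent.

    Hypothetical helper for illustration; it is not part of the Themis API.
    """
    try:
        # Metadata is keyed by the published distribution name, not the import name.
        return version(distribution)
    except PackageNotFoundError:
        return None


# Prints the installed version string, or None when themis-eval is not installed.
print(installed_version())
```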

## Install

```bash
uv add themis-eval
```

Optional extras:

- `uv add "themis-eval[openai]"`
- `uv add "themis-eval[vllm]"` (the `vllm` dependency is installed on Linux only)
- `uv add "themis-eval[langgraph]"`
- `uv add "themis-eval[datasets]"`
- `uv add "themis-eval[mongodb]"`
- `uv add "themis-eval[postgres]"`
- `uv sync --extra docs` for local documentation builds from a repo checkout

## Quick Start

```python
from themis import evaluate
from themis.core.models import Case, Dataset

# Run one evaluation over inline data, using built-in components referenced by name.
result = evaluate(
    model="builtin/demo_generator",
    data=[
        Dataset(
            dataset_id="sample",
            cases=[
                # Each case pairs an input with its expected output.
                Case(
                    case_id="case-1",
                    input={"question": "2+2"},
                    expected_output={"answer": "4"},
                )
            ],
        )
    ],
    metric="builtin/exact_match",
    parser="builtin/json_identity",
)

# Print the run identifier and its status.
print(result.run_id, result.status.value)
```

## Custom Extensions

Themis is designed to be extended. You can plug in custom generators, parsers, reducers, metrics, judge models, and store backends through the Python API or config-driven workflows.

- Start with [`Experiment(...)`](docs/tutorials/first-experiment.md) when you want a reusable compiled evaluation definition.
- Start with [`evaluate(...)`](docs/tutorials/first-evaluate.md) when you want the shortest path from inline data to a completed run.
- See [`docs/how-to/author-custom-components.md`](docs/how-to/author-custom-components.md) for a guide to authoring custom components.

## CLI

After installation, the package exposes the `themis` CLI:

```bash
themis quick-eval inline \
  --model builtin/demo_generator \
  --metric builtin/exact_match \
  --parser builtin/json_identity \
  --input '{"question":"2+2"}' \
  --expected-output '{"answer":"4"}'
```
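
When you drive the CLI from a script, building the argument list in Python avoids shell-quoting mistakes in the JSON payloads. The sketch below only assembles the same command shown above; `quick_eval_argv` is a hypothetical helper, and it uses no flags beyond those in the example.

```python
import json
import shlex


def quick_eval_argv(question: str, answer: str) -> list[str]:
    """Build the argv for the `themis quick-eval inline` example above.

    Hypothetical helper for illustration; it is not part of Themis.
    """
    return [
        "themis", "quick-eval", "inline",
        "--model", "builtin/demo_generator",
        "--metric", "builtin/exact_match",
        "--parser", "builtin/json_identity",
        # json.dumps produces valid JSON regardless of quotes in the values.
        "--input", json.dumps({"question": question}),
        "--expected-output", json.dumps({"answer": answer}),
    ]


argv = quick_eval_argv("2+2", "4")

# Show the shell-safe equivalent of the command.
print(shlex.join(argv))

# To execute it, pass the list directly so no shell quoting is involved:
#   subprocess.run(argv, check=True)
```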

## Documentation

- Start here: [`docs/index.md`](docs/index.md)
- Installation guide: [`docs/start-here/installation.md`](docs/start-here/installation.md)
- API layer chooser: [`docs/start-here/choose-your-api-layer.md`](docs/start-here/choose-your-api-layer.md)
- Python API reference: [`docs/reference/python-api.md`](docs/reference/python-api.md)
- Extension boundaries: [`docs/explanation/extension-boundaries.md`](docs/explanation/extension-boundaries.md)

Build the docs locally with:

```bash
uv sync --extra docs
uv run mkdocs build --strict
```

## Contributing

Contributor setup and release guidance live in [`CONTRIBUTING.md`](CONTRIBUTING.md).
