Metadata-Version: 2.4
Name: llm-annotator
Version: 0.7.0
Summary: An easy-to-extend LLM annotator for robust, resumable data annotation.
Project-URL: Homepage, https://github.com/BramVanroy/llm-annotator
Project-URL: Documentation, https://BramVanroy.github.io/llm-annotator
Project-URL: Repository, https://github.com/BramVanroy/llm-annotator
Project-URL: Issues, https://github.com/BramVanroy/llm-annotator/issues
Author-email: Bram Vanroy <2779410+BramVanroy@users.noreply.github.com>
License-Expression: Apache-2.0
License-File: LICENSE
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: MkDocs
Classifier: Framework :: Pytest
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Software Development :: Build Tools
Classifier: Typing :: Typed
Requires-Python: <3.14,>=3.12
Requires-Dist: colorama<1,>=0.4.6
Requires-Dist: datasets<5,>=4.8.5
Provides-Extra: anthropic
Requires-Dist: anthropic<1,>=0.102.0; extra == 'anthropic'
Provides-Extra: gemini
Requires-Dist: google-genai<3,>=2.3.0; extra == 'gemini'
Provides-Extra: openai
Requires-Dist: openai<3,>=2.36.0; extra == 'openai'
Provides-Extra: vllm
Requires-Dist: mistral-common<2,>=1.11.2; extra == 'vllm'
Requires-Dist: vllm==0.21.0; extra == 'vllm'
Description-Content-Type: text/markdown

# A simple, extensible LLM-based dataset generator and annotator

[![CI](https://github.com/BramVanroy/llm-annotator/actions/workflows/ci.yml/badge.svg)](https://github.com/BramVanroy/llm-annotator/actions/workflows/ci.yml)
[![codecov](https://codecov.io/gh/BramVanroy/llm-annotator/branch/main/graph/badge.svg)](https://codecov.io/gh/BramVanroy/llm-annotator)
[![PyPI version](https://badge.fury.io/py/llm-annotator.svg)](https://badge.fury.io/py/llm-annotator)
[![Python versions](https://img.shields.io/pypi/pyversions/llm-annotator.svg)](https://pypi.org/project/llm-annotator/)
[![License](https://img.shields.io/github/license/BramVanroy/llm-annotator)](LICENSE)
![GitHub tag](https://img.shields.io/github/v/tag/BramVanroy/llm-annotator)


This repository provides a small, resumable framework for annotating datasets with LLMs (via `vllm`).

## Documentation

📚 **[Read the full documentation](https://bramvanroy.github.io/llm-annotator/)** for detailed guides, API reference, and examples.

## Installation

Recommended:

```sh
uv add llm-annotator
```

or

```sh
pip install llm-annotator
```

To install FlashInfer for your CUDA version (e.g., CUDA 12.8):

```sh
uv pip install flashinfer-python flashinfer-cubin
# JIT cache package (replace cu128 with your CUDA version: cu128, cu129, or cu130)
uv pip install flashinfer-jit-cache --index-url https://flashinfer.ai/whl/cu128
```

## Usage

Quick example:

```python
from llm_annotator import Annotator

# Annotate a dataset with sentiment classification
with Annotator(model="meta-llama/Llama-3.2-3B-Instruct", max_model_len=4096) as anno:
    ds = anno.annotate_dataset(
        output_dir="outputs/sentiment",
        full_prompt_template="Classify the sentiment: {text}",
        dataset_name="stanfordnlp/imdb",
        dataset_split="test",
        max_num_samples=100,
    )
```
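
The `{text}` placeholder in `full_prompt_template` matches the `text` column of `stanfordnlp/imdb`. As a rough illustration of that idea, the sketch below fills the same template from plain dictionaries using `str.format`. It assumes placeholders are substituted from dataset columns in this way; the rows and the `build_prompt` helper are hypothetical and not part of `llm_annotator`.

```python
# Hypothetical rows standing in for dataset records with a "text" column.
rows = [
    {"text": "A wonderful film.", "label": 1},
    {"text": "Two hours I will never get back.", "label": 0},
]

template = "Classify the sentiment: {text}"


def build_prompt(template: str, row: dict) -> str:
    """Fill the template's placeholders from the row's columns.

    Assumed behaviour for illustration only, not the library's own code.
    Columns without a matching placeholder (here "label") are ignored.
    """
    return template.format(**row)


prompts = [build_prompt(template, row) for row in rows]
```

A placeholder with no matching column would raise `KeyError` in this sketch, so it is worth checking that the template and the dataset's column names agree before starting a long run.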

See the **[documentation](https://bramvanroy.github.io/llm-annotator/)** for more examples, including:
- Structured output with JSON schemas
- Custom validation and postprocessing
- Large-scale streaming annotation
- Generating datasets from scratch
- Multi-GPU support

Or check out the [examples/](examples/) directory for complete working examples.


## Testing

```sh
make test
```

`make test` runs the fast suite and skips tests marked as `slow`.

Additional test targets:

```sh
# Fast tests (same as `make test`)
make test-fast

# Slow tests only
make test-slow

# Integration tests only
make test-integration

# Entire suite (fast + slow)
make test-all
```

You can also select tests by marker directly with pytest:

```sh
uv run pytest -m "not slow"
uv run pytest -m "slow"
uv run pytest -m "integration"
```

Slow and integration tests may load local models, require more runtime, or depend on optional components.
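
For reference, pytest selects tests with `-m` based on markers applied with `pytest.mark`. The sketch below shows how a test might be tagged with the `slow` or `integration` marker used in the commands above. The test names and bodies are hypothetical, and it assumes both markers are registered in the project's pytest configuration.

```python
import pytest


@pytest.mark.slow
def test_loads_local_model():
    # Hypothetical placeholder body; a real slow test might load a model here.
    assert True


@pytest.mark.integration
def test_end_to_end_annotation():
    # Hypothetical placeholder body for an integration-style test.
    assert True
```

With tests tagged like this, `pytest -m "not slow"` would skip the first test and still run the second.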

## Building documentation

Build the documentation locally:

```sh
make docs
```

Serve the documentation locally (at http://localhost:8000):

```sh
make docs-serve
```

The documentation is automatically built and deployed to GitHub Pages when changes are pushed to the `main` branch. If docstrings or documentation files have changed, the pre-commit hook checks that the documentation builds successfully before allowing a push.