Metadata-Version: 2.4
Name: verbatim-core
Version: 0.2.6
Summary: Lightweight verbatim span extraction -- the RAG-agnostic core of verbatim-rag
Project-URL: Homepage, https://github.com/krlabsorg/verbatim-rag
Project-URL: Bug Tracker, https://github.com/krlabsorg/verbatim-rag/issues
Author-email: Adam Kovacs <kovacs@krlabs.eu>
License: MIT
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Requires-Python: >=3.10
Requires-Dist: jinja2>=3.0.0
Requires-Dist: openai>=1.3.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: rapidfuzz>=3.0.0
Provides-Extra: model
Requires-Dist: scikit-learn==1.6.1; extra == 'model'
Requires-Dist: torch>=2.6.0; extra == 'model'
Requires-Dist: transformers==4.53.3; extra == 'model'
Description-Content-Type: text/markdown

# verbatim-core

Lightweight verbatim span extraction -- the RAG-agnostic core of [verbatim-rag](https://github.com/KRLabsOrg/verbatim-rag).

Given a question, extract the exact, verbatim text spans from your documents that answer it. No vector databases, no embeddings, no heavy ML dependencies: the base install needs only `openai`, `pydantic`, `jinja2`, and `rapidfuzz`.

## Installation

```bash
pip install verbatim-core
```

## Quick Start

```python
from verbatim_core import VerbatimTransform

vt = VerbatimTransform()
response = vt.transform(
    question="What is the main finding?",
    context=[
        {"content": "The study found that X leads to Y.", "title": "Paper A"},
        {"content": "Results show Z is statistically significant.", "title": "Paper B"},
    ],
)

print(response.answer)

# Print each highlight's start, end, and text, document by document
for doc in response.documents:
    for highlight in doc.highlights:
        print(f"  [{highlight.start}:{highlight.end}] {highlight.text}")
```
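
Because each highlight exposes `start`, `end`, and `text`, a caller can check that a span really is verbatim. The sketch below is a standalone illustration, not part of the package: `Span` and `is_verbatim` are hypothetical names, and it assumes `start` and `end` are character offsets into the document's `content` string with an exclusive end, which this README does not confirm.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """Hypothetical stand-in for a highlight: offsets plus the quoted text."""
    start: int
    end: int
    text: str


def is_verbatim(content: str, span: Span) -> bool:
    """Return True if the span's text matches content[start:end] exactly."""
    # Reject offsets that fall outside the document or are reversed.
    if not (0 <= span.start <= span.end <= len(content)):
        return False
    return content[span.start:span.end] == span.text


content = "The study found that X leads to Y."
good = Span(start=4, end=15, text="study found")
bad = Span(start=4, end=15, text="study finds")  # paraphrased, not verbatim

print(is_verbatim(content, good))
print(is_verbatim(content, bad))
```

If the assumption about offsets holds, the same check can be run over `doc.highlights` to flag any span that does not match its source text.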

## What This Package Includes

- **VerbatimTransform** -- question + context -> cited, grounded answer
- **LLMSpanExtractor** -- extract verbatim spans using an LLM
- **LLMClient** -- unified OpenAI API wrapper (sync + async)
- **TemplateManager** -- response formatting with multiple template strategies
- **@verbatim_enhance** -- decorator to enhance existing RAG functions
- **CLI** (`verbatim-enhance`) -- batch processing from the command line

## Model-Based Extraction

For the ModernBERT or Zilliz semantic highlight extractors, install the `model` extra (adds torch, transformers, and scikit-learn):

```bash
pip install "verbatim-core[model]"
```

## Environment

The LLM-based components call the OpenAI API, so set an API key first:

```bash
export OPENAI_API_KEY=your_api_key_here
```
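
A script can fail early with a clear message when the key is missing. This is a minimal sketch using only the standard library; `require_api_key` is a hypothetical helper, not something the package provides.

```python
import os
from collections.abc import Mapping


def require_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the OpenAI API key from env, or raise if it is unset or blank."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("OPENAI_API_KEY is not set; export it before running.")
    return key
```

Calling `require_api_key()` at the top of a script surfaces a missing key before any extraction work starts.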

## Full RAG System

For the complete RAG pipeline with vector indexing, embeddings, and document processing, install the full package:

```bash
pip install verbatim-rag
```
