Metadata-Version: 2.4
Name: ai-citer
Version: 1.0.0
Summary: AI-powered fact extraction and citation mapping for documents (PDF, Word, web, text)
Project-URL: Homepage, https://github.com/czawora/ai-citer
Project-URL: Repository, https://github.com/czawora/ai-citer
Project-URL: Issues, https://github.com/czawora/ai-citer/issues
Author: czawora
License: MIT
Keywords: ai,anthropic,citations,fact-extraction,fastapi,pdf,rag
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: FastAPI
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Text Processing
Requires-Python: >=3.11
Requires-Dist: anthropic>=0.39.0
Requires-Dist: asyncpg>=0.30.0
Requires-Dist: beautifulsoup4>=4.12.0
Requires-Dist: fastapi>=0.115.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: mammoth>=1.8.0
Requires-Dist: mcp>=1.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pymupdf>=1.24.0
Requires-Dist: python-dotenv>=1.0.0
Requires-Dist: python-multipart>=0.0.12
Requires-Dist: uvicorn[standard]>=0.32.0
Provides-Extra: dev
Requires-Dist: build>=1.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest>=8.3.0; extra == 'dev'
Requires-Dist: python-docx>=1.1.2; extra == 'dev'
Requires-Dist: twine>=5.0.0; extra == 'dev'
Description-Content-Type: text/markdown

# ai-citer

AI-powered fact extraction and citation mapping for documents — PDF, Word, web pages, and plain text.

Built on FastAPI + Anthropic Claude. Extracts verbatim-quoted facts from documents, maps each quote back to its exact character offset, and optionally assigns PDF page numbers.

## Install

```bash
pip install ai-citer
```

Requires Python 3.11+ and a PostgreSQL database.

## Quick start

### Run as a standalone server

Set environment variables (or create a `.env` file):

```bash
ANTHROPIC_API_KEY=sk-ant-...
DATABASE_URL=postgresql://user:pass@localhost/ai_citer
```

```bash
ai-citer serve          # starts on :3001
ai-citer serve --port 8080 --reload
```

Or with uvicorn directly:

```bash
uvicorn app.main:app --port 3001
```

### Embed the router in your own FastAPI app

```python
from fastapi import FastAPI
from ai_citer import documents_router

app = FastAPI()
app.include_router(documents_router, prefix="/ai-citer")
```

> **Note:** The router expects `app.state.pool` (an asyncpg pool) and `app.state.anthropic_client`
> to be set on the FastAPI app. Use the lifespan in `app.main` as a reference, or set them up yourself.

### Use the core functions directly

```python
import anthropic
import asyncio
from ai_citer import (
    create_pool, init_db,
    extract_facts, map_citations, assign_page_numbers,
    parse_pdf, parse_word, parse_web, parse_text,
)

async def main():
    pool = await create_pool("postgresql://localhost/mydb")
    await init_db(pool)

    # With no api_key argument, the SDK reads ANTHROPIC_API_KEY from the environment
    client = anthropic.AsyncAnthropic()

    # Parse a PDF
    with open("report.pdf", "rb") as f:
        content = parse_pdf(f.read())

    # Extract facts
    extraction, usage = await extract_facts(client, content.rawText)

    # Map quotes back to character offsets
    facts = map_citations(content.rawText, extraction.facts)
    if facts and facts[0].citations:  # guard against documents that yield no facts
        print(facts[0].citations[0].charOffset)   # exact position in raw text
    print(f"Cost: ${usage.costUsd:.4f}")

asyncio.run(main())
```

## REST API

When running as a server, the following endpoints are available under `/api/documents`:

| Method | Path | Description |
|--------|------|-------------|
| `GET` | `/` | List all documents |
| `POST` | `/` | Upload a file (`multipart/form-data`) or URL (`url` form field) |
| `GET` | `/:id` | Get a document (includes `pdfData` for PDFs) |
| `POST` | `/:id/extract` | Run fact extraction (optional `{ "prompt": "..." }` body) |
| `GET` | `/:id/facts` | Get all accumulated facts for a document |
| `POST` | `/:id/chat` | Chat with a document (`{ "message": "...", "history": [] }`) |

## MCP server

ai-citer ships an [MCP](https://modelcontextprotocol.io) server that exposes extraction tools to AI assistants (Claude Desktop, etc.):

```bash
ai-citer mcp
```

Tools: `upload_document_url`, `extract_facts`, `get_facts`, `list_documents`.
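
To register the server with Claude Desktop, an entry along the following lines can be added to `claude_desktop_config.json`. This is a sketch under assumptions: it presumes the `ai-citer` command is on your `PATH` and that the MCP server reads the same two environment variables as the HTTP server. The helper name `claude_desktop_entry` is hypothetical and the values are placeholders.

```python
import json


def claude_desktop_entry(api_key: str, database_url: str) -> dict:
    """Return an `mcpServers` config entry that launches `ai-citer mcp`."""
    return {
        "mcpServers": {
            "ai-citer": {
                "command": "ai-citer",  # assumed to be on PATH after `pip install ai-citer`
                "args": ["mcp"],
                # Assumed: the MCP server needs the same variables as `ai-citer serve`
                "env": {
                    "ANTHROPIC_API_KEY": api_key,
                    "DATABASE_URL": database_url,
                },
            }
        }
    }


config = claude_desktop_entry("sk-ant-...", "postgresql://user:pass@localhost/ai_citer")
print(json.dumps(config, indent=2))
```

Merge the printed `mcpServers` entry into any existing config rather than overwriting the file.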

## Environment variables

| Variable | Required | Default | Description |
|----------|----------|---------|-------------|
| `ANTHROPIC_API_KEY` | Yes | — | Anthropic API key |
| `DATABASE_URL` | Yes | — | PostgreSQL connection string |
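
Since both variables are required, it can help to check for them before starting the server. The following is a small sketch using only the standard library. `missing_env_vars` is a hypothetical helper, not part of ai-citer, and it does not read `.env` files, so it only sees variables already exported in the process environment.

```python
import os
from collections.abc import Mapping

REQUIRED_VARS = ("ANTHROPIC_API_KEY", "DATABASE_URL")


def missing_env_vars(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variable names that are unset or empty in env."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing required environment variables: " + ", ".join(missing))
```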

## Development

```bash
git clone https://github.com/czawora/ai-citer
cd ai-citer/server
pip install -e ".[dev]"
pytest
```
