Metadata-Version: 2.4
Name: upspawn-ocr-cli
Version: 0.1.0b4
Summary: Modern CLI to extract text from PDFs using Mistral cloud or local Ollama models (glm-ocr, deepseek-ocr, LightOnOCR-2).
Author-email: UpSpawn <opensource@upspawn.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/upspawn/mistral-ocr-cli
Project-URL: Repository, https://github.com/upspawn/mistral-ocr-cli.git
Project-URL: Issues, https://github.com/upspawn/mistral-ocr-cli/issues
Project-URL: Release Notes, https://github.com/upspawn/mistral-ocr-cli/releases
Keywords: ocr,mistral,pdf,cli,text-extraction
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Information Technology
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: mistralai>=2.0.0
Requires-Dist: click>=8.1.0
Requires-Dist: rich>=13.0.0
Requires-Dist: python-dotenv>=1.0.0
Provides-Extra: local
Requires-Dist: ollama>=0.4.0; extra == "local"
Requires-Dist: pdf2image>=1.16.0; extra == "local"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: mypy>=1.8.0; extra == "dev"
Requires-Dist: ruff>=0.4.0; extra == "dev"
Requires-Dist: pre-commit>=3.6.0; extra == "dev"
Requires-Dist: black>=24.0.0; extra == "dev"
Dynamic: license-file

# Mistral OCR CLI

[![CI](https://github.com/upspawn/mistral-ocr-cli/actions/workflows/ci.yml/badge.svg)](https://github.com/upspawn/mistral-ocr-cli/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/mistral-ocr-cli.svg)](https://pypi.org/project/mistral-ocr-cli/)
[![Python](https://img.shields.io/pypi/pyversions/mistral-ocr-cli.svg)](https://pypi.org/project/mistral-ocr-cli/)
[![License](https://img.shields.io/pypi/l/mistral-ocr-cli.svg)](LICENSE)
[![pre-commit](https://img.shields.io/badge/pre--commit-enabled-brightgreen?logo=pre-commit)](https://github.com/pre-commit/pre-commit)
[![Code style: black](https://img.shields.io/badge/code%20style-black-000000.svg)](https://github.com/psf/black)

Modern, polished CLI to extract text from PDFs using the Mistral OCR API, or local Ollama models (glm-ocr, deepseek-ocr, LightOnOCR-2) via the optional `local` extra.

## Features

- Terminal UI with progress bars and formatted output (built on Rich)
- Single file or batch processing
- Output in text, JSON, or Markdown
- Parallel batch processing with `--jobs`
- Config helper and `.env` support

## Quickstart

1) Install

```bash
uv tool install upspawn-ocr-cli            # isolated tool install, like pipx
# or
uv pip install upspawn-ocr-cli             # into the current environment
# optional extra for local Ollama models
uv pip install "upspawn-ocr-cli[local]"
```

2) Configure API key

```bash
export MISTRAL_API_KEY=your_key_here
# or
echo "MISTRAL_API_KEY=your_key_here" >> .env
```

3) Extract text

```bash
ocr extract file.pdf -o out.txt
ocr extract file1.pdf file2.pdf --batch --output-dir outputs --jobs 4
```

## Usage

```bash
ocr extract [OPTIONS] FILES...

Options:
  -o, --output PATH            Output file (single-file mode)
  -f, --format [text|json|markdown]
                               Output format
  -b, --batch                  Enable batch mode
  -O, --output-dir PATH        Directory for batch outputs
  -j, --jobs INTEGER RANGE     Parallel jobs for batch [default: 1]
  -v, --verbose                Verbose logs
  -q, --quiet                  Show errors only
  --version                    Show version
  --help                       Show help
```
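If you drive the CLI from a Python script, one minimal approach is to assemble the argument list from the options above and pass it to `subprocess.run`. This is a sketch, not part of the package: `build_extract_cmd` and `run_extract` are hypothetical helpers, the local validation (format choices, `jobs >= 1`, not mixing `--output` with `--output-dir`) is an assumption, and it assumes the `ocr` command is on your `PATH` and, since the CLI depends on Click, that it treats `--` as the end of options.

```python
from __future__ import annotations

import subprocess
from collections.abc import Sequence
from typing import Any

FORMATS = ("text", "json", "markdown")  # choices listed under --format above


def build_extract_cmd(
    files: Sequence[str],
    *,
    output: str | None = None,
    output_dir: str | None = None,
    fmt: str = "text",
    jobs: int = 1,
) -> list[str]:
    """Assemble an argv list for `ocr extract` from the documented options."""
    if not files:
        raise ValueError("at least one input file is required")
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    if jobs < 1:
        raise ValueError("jobs must be >= 1")  # assumed lower bound for --jobs
    if output is not None and output_dir is not None:
        # Hypothetical design choice: -o for single-file mode, -O for batch mode.
        raise ValueError("use output for one file or output_dir for a batch, not both")

    cmd = ["ocr", "extract", "--format", fmt]
    if output is not None:
        cmd += ["--output", output]
    if output_dir is not None:
        cmd += ["--batch", "--output-dir", output_dir, "--jobs", str(jobs)]
    # "--" ends option parsing (assumed Click behavior), so a file
    # name beginning with "-" is not mistaken for a flag.
    cmd += ["--", *files]
    return cmd


def run_extract(files: Sequence[str], **options: Any) -> subprocess.CompletedProcess[str]:
    """Run the CLI (assumes `ocr` is on PATH); raises CalledProcessError on failure."""
    cmd = build_extract_cmd(files, **options)
    return subprocess.run(cmd, check=True, capture_output=True, text=True)
```

For example, `run_extract(["file1.pdf", "file2.pdf"], output_dir="outputs", jobs=4)` mirrors the batch command from the Quickstart. Keeping command construction separate from execution makes the argument list easy to check without running OCR.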

## Programmatic use

```python
from ocr.pdf2text import pdf_to_text

text = pdf_to_text("/path/file.pdf")
```
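To process several files from Python rather than through `--jobs`, one option is to fan `pdf_to_text` out over a thread pool. This is a sketch, not the CLI's own batch implementation: `extract_many` and `write_outputs` are hypothetical helpers, and the sketch assumes `pdf_to_text` takes a path string, returns a string, and is safe to call from several threads at once. The `extract` parameter lets you inject a different function, such as a stub in tests.

```python
from __future__ import annotations

from collections.abc import Callable, Iterable
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

Extractor = Callable[[str], str]


def extract_many(
    paths: Iterable[str],
    jobs: int = 4,
    extract: Extractor | None = None,
) -> dict[str, str]:
    """Extract text from several PDFs concurrently; returns {path: text}."""
    if extract is None:
        # Default to the entry point shown above, imported lazily so a stub
        # can be injected without the package installed (thread safety assumed).
        from ocr.pdf2text import pdf_to_text

        extract = pdf_to_text
    path_list = list(paths)
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        # map() yields results in input order; an exception raised by a
        # worker is re-raised here when its result is reached.
        texts = list(pool.map(extract, path_list))
    return dict(zip(path_list, texts))


def write_outputs(results: dict[str, str], output_dir: str, suffix: str = ".txt") -> list[Path]:
    """Write each text to <output_dir>/<pdf stem><suffix> (hypothetical naming scheme)."""
    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    written: list[Path] = []
    for src, text in results.items():
        dest = out / (Path(src).stem + suffix)
        dest.write_text(text, encoding="utf-8")
        written.append(dest)
    return written
```

Threads rather than processes are used on the assumption that cloud extraction mostly waits on the network; for CPU-heavy local models a process pool may fit better. Note that PDFs with the same file stem in different directories would overwrite each other in `write_outputs`.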

## Development

```bash
uv pip install -e ".[dev]"
uv run pre-commit install
uv run pytest -q
```

Releasing is handled via standard tags and GitHub Releases.

## Test coverage

```bash
# Terminal report
make coverage

# HTML report in htmlcov/
make coverhtml
```

## License

MIT
