Metadata-Version: 2.4
Name: estravon-backend
Version: 0.1.6
Summary: Self-hosted PDF extraction backend for the Estravon Zotero plugin
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: python-fasthtml<1.0,>=0.12.0
Requires-Dist: replicate<2.0,>=0.34.0
Requires-Dist: httpx<1.0,>=0.27.0
Requires-Dist: python-multipart<1.0,>=0.0.9
Requires-Dist: python-dotenv<2.0,>=1.0.0
Requires-Dist: pypdf[cryptography]>=4.0
Requires-Dist: mistralai<3.0,>=2.0
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Provides-Extra: nlp
Requires-Dist: spacy>=3.7; extra == "nlp"
Dynamic: license-file

# estravon-backend

Self-hosted PDF extraction backend for the
[Estravon Zotero plugin](https://github.com/tiberavonltd/estravon-plugin).

> **Independent project.**
> Estravon is not affiliated with, endorsed by, or in any way connected to the
> [Zotero project](https://www.zotero.org/) or the Corporation for Digital Scholarship.
> Zotero is a registered trademark of the Corporation for Digital Scholarship.

---

Estravon extracts user-selected sections of a book PDF to Markdown and attaches the result
directly to the Zotero item, where it stays synced, versioned, and co-located with the source.

**Just want to run it?** Skip this page — follow the step-by-step guide at
[estravon.com/install](https://estravon.com/install) instead. It covers `pip install`,
virtual environments, and `.env` configuration without requiring a clone.

This README is for people who want to read the source, modify the backend,
or run in editable mode.

---

## Developer setup

```bash
git clone https://github.com/tiberavonltd/estravon-backend.git
cd estravon-backend

python -m venv .venv
source .venv/bin/activate   # Windows: .venv\Scripts\activate

pip install -e ".[dev]"
```

Create a `.env` file in the repo root and add your API key:

```
MISTRAL_API_KEY=your_key_here
```

Get a key at [console.mistral.ai](https://console.mistral.ai/). Mistral OCR costs roughly $0.002 per page.

Start the backend:

```bash
estravon --port 7766
```

Run the test suite:

```bash
pytest
```

---

## Supported extraction backends

| Backend | Pricing | `.env` config |
|---|---|---|
| [Mistral OCR](https://console.mistral.ai/) | ~$0.002/page, pay-as-you-go | `MISTRAL_API_KEY=...` (default backend) |
| [Datalab](https://www.datalab.to/) | $25/month flat | `DATALAB_API_KEY=...` + `_ZM_BACKEND=datalab` |
| [Replicate](https://replicate.com/) | Pay-as-you-go | `REPLICATE_API_TOKEN=...` + `_ZM_BACKEND=replicate` |

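The table above can be read as a small lookup: `_ZM_BACKEND` names the backend, and each backend needs its own credential variable. The sketch below shows one way to validate that combination before starting the server. `select_backend` and `CREDENTIAL_VARS` are hypothetical names for illustration and are not taken from the backend's source; it also assumes that an unset `_ZM_BACKEND` means Mistral, which is what "default" in the table suggests.

```python
import os
from typing import Mapping, Optional

# Backend name -> credential variable, copied from the table above.
CREDENTIAL_VARS = {
    "mistral": "MISTRAL_API_KEY",
    "datalab": "DATALAB_API_KEY",
    "replicate": "REPLICATE_API_TOKEN",
}


def select_backend(env: Optional[Mapping[str, str]] = None) -> str:
    """Return the backend name implied by the environment (hypothetical helper)."""
    env = os.environ if env is None else env
    # Assumption: an unset or empty _ZM_BACKEND falls back to the default, Mistral.
    name = (env.get("_ZM_BACKEND") or "mistral").strip().lower()
    if name not in CREDENTIAL_VARS:
        raise ValueError(f"unknown backend: {name!r}")
    credential = CREDENTIAL_VARS[name]
    if not env.get(credential):
        raise ValueError(f"{credential} must be set to use the {name} backend")
    return name
```
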
---

## Architecture

```
Zotero plugin  →  POST /process  →  run_extraction()
                                          ↓
                              MistralBackend | DatalabBackend | ReplicateBackend
                                          ↓
                                 result .md + images returned
```

The backend is a single-process [FastHTML](https://fastht.ml) server. One job runs
at a time; the plugin polls `GET /jobs/{id}` until the result is ready.
`GET /status` exposes the current server state (`idle` / `running` / `error`).

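The polling behaviour described above could look roughly like the loop below on the client side. This is a sketch under stated assumptions, not the plugin's actual code: the README does not document the shape of the `GET /jobs/{id}` response, so the `state` field and the `done` / `error` values are placeholders. The HTTP call is passed in as `fetch_job`, which in practice might be something like `lambda job_id: httpx.get(f"http://localhost:7766/jobs/{job_id}").json()`.

```python
import time
from typing import Callable

# Assumption: the job JSON carries a "state" field and these values mean the
# job has finished. The real field name and values may differ.
FINISHED_STATES = ("done", "error")


def wait_for_job(
    fetch_job: Callable[[str], dict],
    job_id: str,
    interval_s: float = 2.0,
    timeout_s: float = 600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call fetch_job(job_id) repeatedly until the job reports a finished state."""
    deadline = time.monotonic() + timeout_s
    while True:
        job = fetch_job(job_id)
        if job.get("state") in FINISHED_STATES:
            return job
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout_s}s")
        # Wait before the next poll so the single-process server is not flooded.
        sleep(interval_s)
```
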
---

## Health check

```bash
curl http://localhost:7766/ping
# {"status":"ok","state":"idle","backend":"mistral"}

curl http://localhost:7766/status
# {"state":"idle","state_since_s":4.1,"backend":"mistral","last_job":{}}
```

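The same check can be scripted, for example to wait for the backend before running integration tests. The sketch below uses only the standard library and relies on the `status` field shown in the `/ping` output above; `ping_is_ok` and `backend_is_up` are hypothetical helper names, and the two-second timeout is an arbitrary choice.

```python
import json
from urllib.request import urlopen


def ping_is_ok(body: str) -> bool:
    """Return True if a /ping response body is a JSON object with status "ok"."""
    try:
        data = json.loads(body)
    except ValueError:
        return False
    return isinstance(data, dict) and data.get("status") == "ok"


def backend_is_up(base_url: str = "http://localhost:7766", timeout_s: float = 2.0) -> bool:
    """Probe GET /ping; connection errors and HTTP errors count as not up."""
    try:
        with urlopen(f"{base_url}/ping", timeout=timeout_s) as response:
            return ping_is_ok(response.read().decode("utf-8"))
    except (OSError, ValueError):
        # URLError and HTTPError are OSError subclasses; decode errors are ValueError.
        return False
```
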
---

## Links

- [Plugin repository](https://github.com/tiberavonltd/estravon-plugin)
- [End-user install guide](https://estravon.com/install)
- [estravon.com](https://estravon.com)
- [Report an issue](https://github.com/tiberavonltd/estravon-backend/issues)

---

## License

[AGPL-3.0](https://www.gnu.org/licenses/agpl-3.0.html) — the same license as Zotero itself.
