Metadata-Version: 2.4
Name: zotlib
Version: 0.4.3
Summary: Extract and format bibliographic data from Zotero databases
Requires-Python: >=3.10
Requires-Dist: pillow<13.0.0,>=12.1.0
Requires-Dist: polars>=1.39.2
Requires-Dist: pymupdf<2.0.0,>=1.26.7
Requires-Dist: rich>=13.0.0
Requires-Dist: typer>=0.12.0
Description-Content-Type: text/markdown

# zotlib

Tools for extracting and formatting bibliographic data from Zotero 8 databases.

Reads directly from Zotero's local SQLite database — no API key needed. Export collections as CSV or APA-formatted references, generate PDF cover images with thumbnails, and extract annotated PDFs with baked-in highlights and markdown notes. Includes a CLI for common workflows and a Python API for custom pipelines. Built for Zotero 8; older versions are untested and likely incompatible due to schema differences.

## Project Structure

```
zotlib/
├── zotlib/                      # Python library
│   ├── cli.py                   # CLI commands
│   ├── config.py                # Database path discovery
│   ├── database.py              # SQLite interface
│   ├── extractors.py            # Data extraction functions
│   ├── exporters.py             # Collection export (annotations + PDFs)
│   ├── backup.py                # Zotero directory backup
│   ├── tables.py                # Zotero database table definitions
│   ├── covers.py                # PDF cover generation
│   ├── paths.py                 # Path resolution and filename utilities
│   └── formatters/apa.py        # APA citation formatter
├── scripts/                     # Utility scripts
│   ├── extract-annotations.js   # Annotation extractor (interactive + headless)
│   ├── create-parent-item.js    # Create parents for standalone PDFs
│   └── run-extract.sh           # Shell wrapper for headless extraction
├── tests/                       # Test suite
└── pyproject.toml               # Project configuration
```

## Installation

```bash
uv add zotlib
```

From source:

```bash
git clone https://github.com/gitronald/zotlib.git
cd zotlib
uv sync
```

From a specific branch:

```bash
uv add git+https://github.com/gitronald/zotlib.git@dev
```

## Configuration

Run `zotlib init` to auto-discover Zotero paths and save them to `zotlib.toml`:

```bash
zotlib init
```

```
database: /mnt/c/Users/rer/Zotero/zotero.sqlite
pdfs_dir: /mnt/i/My Drive/zotero-pdfs

Saved to zotlib.toml
```

The config file stores the database and linked PDFs directory:

```toml
[zotlib]
database = "/path/to/zotero.sqlite"
pdfs_dir = "/path/to/linked-pdfs"
```

Path resolution priority (for both database and PDFs dir):

1. **CLI flag**: `--database`, `--pdfs-dir`
2. **Environment variable**: `ZOTERO_DATABASE`
3. **Config file**: `zotlib.toml`
4. **Auto-discovery**: Checks common locations (Linux, WSL, macOS)

## CLI Commands

### Explore

Browse collections and inspect database schema. The `show-tables` command documents Zotero's largely undocumented SQLite table structure, including column descriptions and types.

```bash
# List available collections
zotlib show-collections

# Show database tables
zotlib show-tables
zotlib show-tables items
```

### Export

Export collection data in multiple formats. Supports linked attachments via `--pdfs-dir` for PDFs stored outside Zotero's default storage.

```bash
# Export all tables as CSV
zotlib export-csv

# Export a collection as CSV
zotlib export-csv -c publications

# Format a collection as APA references
zotlib export-apa -c publications

# Generate cover images and thumbnails
zotlib export-covers -c publications

# Export annotated PDFs and markdown notes
zotlib export-annotations -c mycollection
```

### Backup

Archive the entire Zotero data directory as a compressed `.tar.bz2` file with a progress bar. Saves to `data/backups/zotero-YYYY-MM-DD.tar.bz2` by default. Use `-o` to specify a custom output path or `-d` to point to a different database.

```bash
zotlib backup
zotlib backup -o ~/backups/zotero-2026-03-21.tar.bz2
```

### Output structure

```
output/
├── export-csv/                     # Bibliographic metadata
│   └── publications.csv
├── export-apa/                     # APA-formatted references
│   └── publications.md
├── export-covers/                  # PDF cover images
│   └── publications/
│       ├── fullsize/
│       └── thumbnails/
└── export-annotations/             # Annotated PDFs + notes
    └── mycollection/
        └── author-year-title/
            ├── paper.pdf
            └── annotations.md
```

### Export annotations features

- Multi-attachment support: each PDF gets only its own annotations
- Standalone attachment support: PDFs added directly to a collection
- Linked attachment resolution via `--pdfs-dir`
- "REVIEW: " prefix stripping from titles

### Python API

```python
from zotlib import ZoteroDatabase, extract_cv_items, format_cv_as_apa

db = ZoteroDatabase("/path/to/zotero.sqlite")
items = extract_cv_items(db, collection_name="mypapers")
apa_output = format_cv_as_apa(items, output_path="output/apa.md")
```

## Zotero JavaScript Scripts

**WIP** — Utilities for Zotero's JavaScript console (Tools > Developer > Run JavaScript). The Zotero SQLite database should never be modified directly via Python — use these JS scripts (which run through Zotero's API) for any write operations.

### create-parent-item.js

Creates parent document items for standalone PDF attachments in a collection. Useful when PDFs were added directly without metadata — creates a parent item using the filename as the title and re-parents the attachment.

### extract-annotations.js

Extracts annotations from the selected item's PDFs as markdown. Auto-detects its context:

- **Interactive** (Tools > Developer > Run JavaScript): shows a file save dialog
- **Headless** (via HTTP debug API): writes to `~/Desktop/zotero-annotations/`

To run headlessly:

```bash
./scripts/run-extract.sh
```

Requires: Settings > Advanced > "Allow other applications to communicate with Zotero"

The shell script should work on macOS where Zotero and the terminal share the same `localhost`. On WSL, the script calls Zotero's debug HTTP endpoint on `127.0.0.1:23119`, but `localhost` does not bridge to the Windows host by default. You may need to use the Windows host IP or run the curl command from PowerShell instead.

## Annotation Format

### Types

| Type | Extracted Data |
|------|----------------|
| Highlight | Text + comment + color |
| Note | Comment text |
| Underline | Text + comment |
| Image | Comment only (image not exported) |

### Color Labels

| Hex Code | Label |
|----------|-------|
| `#ffd400` | yellow |
| `#ff6666` | red |
| `#5fb236` | green |
| `#2ea8e5` | blue |
| `#a28ae5` | purple |
| `#e56eee` | magenta |
| `#f19837` | orange |
