Metadata-Version: 2.4
Name: bibtex-updater
Version: 0.8.0
Summary: Replace preprint BibTeX entries with published versions and validate bibliography references
Project-URL: Homepage, https://github.com/rpatrik96/bibtexupdater
Project-URL: Documentation, https://github.com/rpatrik96/bibtexupdater#readme
Project-URL: Repository, https://github.com/rpatrik96/bibtexupdater.git
Project-URL: Issues, https://github.com/rpatrik96/bibtexupdater/issues
Project-URL: Changelog, https://github.com/rpatrik96/bibtexupdater/blob/main/CHANGELOG.md
Author: Patrik Reizinger
License: MIT
License-File: LICENSE
Keywords: academic,arxiv,bibliography,bibtex,citation,crossref,latex,preprint,research
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Text Processing :: Markup :: LaTeX
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: bibtexparser>=1.4.0
Requires-Dist: crossref-commons>=0.0.7
Requires-Dist: httpx>=0.24.0
Requires-Dist: rapidfuzz>=3.0.0
Requires-Dist: requests>=2.28.0
Provides-Extra: all
Requires-Dist: pyyaml>=6.0; extra == 'all'
Requires-Dist: pyzotero>=1.5.0; extra == 'all'
Requires-Dist: scholarly>=1.7.0; extra == 'all'
Requires-Dist: sentence-transformers>=2.2.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: black>=24.0.0; extra == 'dev'
Requires-Dist: build>=1.0.0; extra == 'dev'
Requires-Dist: mypy>=1.13.0; extra == 'dev'
Requires-Dist: pre-commit>=3.6.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.7.0; extra == 'dev'
Requires-Dist: twine>=5.0.0; extra == 'dev'
Requires-Dist: types-requests>=2.31.0; extra == 'dev'
Provides-Extra: organizer
Requires-Dist: pyyaml>=6.0; extra == 'organizer'
Requires-Dist: pyzotero>=1.5.0; extra == 'organizer'
Provides-Extra: organizer-claude
Requires-Dist: pyyaml>=6.0; extra == 'organizer-claude'
Requires-Dist: pyzotero>=1.5.0; extra == 'organizer-claude'
Provides-Extra: organizer-embedding
Requires-Dist: pyyaml>=6.0; extra == 'organizer-embedding'
Requires-Dist: pyzotero>=1.5.0; extra == 'organizer-embedding'
Requires-Dist: sentence-transformers>=2.2.0; extra == 'organizer-embedding'
Provides-Extra: organizer-openai
Requires-Dist: pyyaml>=6.0; extra == 'organizer-openai'
Requires-Dist: pyzotero>=1.5.0; extra == 'organizer-openai'
Provides-Extra: scholarly
Requires-Dist: scholarly>=1.7.0; extra == 'scholarly'
Provides-Extra: zotero
Requires-Dist: pyzotero>=1.5.0; extra == 'zotero'
Description-Content-Type: text/markdown

# BibTeX Updater

Tools for managing BibTeX bibliographies: automatically update preprints to published versions, validate references against external databases, and filter to only cited references.

![9-stage resolution pipeline](assets/pipeline.gif)

## Installation

### From PyPI (Recommended)

```bash
pip install bibtex-updater

# With Google Scholar support
pip install bibtex-updater[scholarly]

# With Zotero support
pip install bibtex-updater[zotero]

# All optional dependencies
pip install bibtex-updater[all]
```

### From Source (For Development)

```bash
git clone https://github.com/rpatrik96/bibtexupdater.git
cd bibtexupdater
uv sync --extra dev --extra all
```

### Using uv (No Installation)

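Run the published package directly with [uv](https://docs.astral.sh/uv/), without installing it first. The `./scripts/bibtex-x` wrapper is part of the repository, so it needs a local clone:

```bash
# Run any command directly
uv run --with "bibtex-updater[all]" bibtex-update references.bib -o updated.bib

# Or, from a clone of the repository, use the provided wrapper script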
./scripts/bibtex-x update references.bib -o updated.bib
./scripts/bibtex-x check references.bib
./scripts/bibtex-x filter paper.tex -b references.bib -o filtered.bib
```

## CLI Commands

| Command | Description |
|---------|-------------|
| `bibtex-update` | Replace preprints with published versions |
| `bibtex-check` | Validate that references exist and have correct metadata |
| `bibtex-filter` | Filter to only cited entries |
| `bibtex-zotero` | Update preprints in Zotero library |
| `bibtex-zotero-organize` | Organize Zotero items into collections by research taxonomy |
| `bibtex-obsidian-keywords` | AI-powered keyword generation for Obsidian paper notes |

## Quick Start

### Update Preprints

```bash
# Update preprints to published versions
bibtex-update references.bib -o updated.bib

# Preview changes (dry run)
bibtex-update references.bib --dry-run --verbose
```

### Validate References (Fact-Check)

```bash
# Check if references exist and have correct metadata
bibtex-check references.bib --report report.json

# Strict mode: exit with an error if any entries are hallucinated or not found
bibtex-check references.bib --strict
```

### Filter Bibliography

```bash
# Filter to only cited entries
bibtex-filter paper.tex -b references.bib -o filtered.bib

# Multiple tex files
bibtex-filter *.tex -b references.bib -o filtered.bib
```

### Update Zotero Library

```bash
# Set credentials (get from zotero.org/settings/keys)
export ZOTERO_LIBRARY_ID="your_user_id"
export ZOTERO_API_KEY="your_api_key"

# Preview changes
bibtex-zotero --dry-run

# Apply updates
bibtex-zotero
```

### Sync BibTeX Updates to Zotero

When updating a `.bib` file, you can simultaneously update matching entries in your Zotero library:

```bash
# Set Zotero credentials
export ZOTERO_LIBRARY_ID="your_user_id"
export ZOTERO_API_KEY="your_api_key"

# Update bib file AND sync to Zotero
bibtex-update references.bib -o updated.bib --zotero

# Preview Zotero changes only (bib changes still apply)
bibtex-update references.bib -o updated.bib --zotero --zotero-dry-run

# Limit to a specific Zotero collection
bibtex-update references.bib -o updated.bib --zotero --zotero-collection ABCD1234
```

The sync matches bib entries to Zotero items by:
1. **arXiv ID** - Most reliable for preprints
2. **DOI** - For preprints with DOIs (e.g., bioRxiv)
3. **Title + Author** - Fuzzy matching as fallback
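
The matching order above can be sketched as a simple cascade. This is only an illustration under stated assumptions, not the tool's actual code: the field names (`arxiv_id`, `doi`, `title`, `author`), the `match_item` helper, and the 0.9 threshold are hypothetical, and the standard library's `difflib` stands in for whatever fuzzy matcher the package uses.

```python
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences do not matter."""
    return " ".join((text or "").lower().split())


def title_author_score(a, b):
    """Average similarity of the title and author strings, in [0, 1]."""
    title = SequenceMatcher(None, normalize(a.get("title")), normalize(b.get("title"))).ratio()
    author = SequenceMatcher(None, normalize(a.get("author")), normalize(b.get("author"))).ratio()
    return (title + author) / 2


def match_item(entry, items, threshold=0.9):
    """Return (item, method) for the first strategy that matches, else (None, None)."""
    # 1. arXiv ID: exact match, tried first.
    if entry.get("arxiv_id"):
        for item in items:
            if item.get("arxiv_id") == entry["arxiv_id"]:
                return item, "arxiv"
    # 2. DOI: compared case-insensitively.
    if entry.get("doi"):
        for item in items:
            if (item.get("doi") or "").lower() == entry["doi"].lower():
                return item, "doi"
    # 3. Title + author: fuzzy fallback, accepted only above the threshold.
    best = max(items, key=lambda it: title_author_score(entry, it), default=None)
    if best is not None and title_author_score(entry, best) >= threshold:
        return best, "fuzzy"
    return None, None
```

Trying exact identifiers before fuzzy text keeps false positives low: the fuzzy step only runs when neither identifier produces a match.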

## Standalone Scripts

For environments without pip (e.g., Overleaf), `filter_bibliography.py` can be run directly because it uses only the Python standard library:

```bash
# Copy the script and run directly
python filter_bibliography.py paper.tex -b references.bib -o filtered.bib
```

## Documentation

| Document | Description |
|----------|-------------|
| [docs/BIBTEX_UPDATER.md](docs/BIBTEX_UPDATER.md) | Full BibTeX updater documentation |
| [docs/REFERENCE_FACT_CHECKER.md](docs/REFERENCE_FACT_CHECKER.md) | Full reference fact-checker documentation |
| [docs/ZOTERO_UPDATER.md](docs/ZOTERO_UPDATER.md) | Full Zotero updater documentation |
| [docs/FILTER_BIBLIOGRAPHY.md](docs/FILTER_BIBLIOGRAPHY.md) | Full filter documentation |
| [docs/LANDSCAPE.md](docs/LANDSCAPE.md) | Databases, competing tools, and ecosystem landscape |
| [examples/](examples/) | Example workflows and configuration files |

## Overleaf Integration

The tools integrate with Overleaf via GitHub Actions. `filter_bibliography.py` can also run inside Overleaf itself through latexmkrc.

### GitHub Actions (Recommended)

1. Enable GitHub sync in Overleaf (Menu -> Sync -> GitHub)
2. Copy a workflow from [examples/workflows/](examples/workflows/) to `.github/workflows/`
3. Changes synced from Overleaf automatically trigger updates

### latexmkrc (Direct Overleaf)

For `filter_bibliography.py` only (no dependencies required):

1. Upload `filter_bibliography.py` to your Overleaf project
2. Create `.latexmkrc` based on [examples/latexmkrc](examples/latexmkrc)
3. Recompile - filtered bibliography appears in your file list

## Features

### BibTeX Updater (`bibtex-update`)

![Preprint to published](assets/before-after.gif)

- **Multi-source resolution**: arXiv, OpenAlex, Europe PMC, Crossref, DBLP, ACL Anthology, Semantic Scholar, Google Scholar
- **High accuracy**: Title and author fuzzy matching with confidence thresholds
- **ACL Anthology support**: Zero-overhead resolution for NLP papers (ACL, EMNLP, NAACL, etc.)
- **Batch processing**: Multiple files with concurrent workers (default: 8)
- **Deduplication**: Merge duplicates by DOI or normalized title+authors
- **Smart caching**: On-disk cache + semantic resolution cache with TTL
- **Per-service rate limiting**: Optimized rate limits per API (Crossref, S2, DBLP, ACL Anthology, arXiv, OpenAlex, Europe PMC)
- **Batch API support**: Faster bulk lookups via arXiv/S2/Crossref batch endpoints
- **Resolution tracking**: `--mark-resolved` tags updated entries to skip on re-runs
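
To illustrate the deduplication idea (DOI first, otherwise normalized title and authors), here is a minimal sketch. The `dedup_key` and `deduplicate` helpers, the dict-based entry shape, and the merge policy (first entry wins, later duplicates only fill missing fields) are assumptions for illustration, not the package's implementation.

```python
import re


def _norm(text):
    """Lowercase and reduce every run of non-alphanumerics to a single space."""
    return re.sub(r"[^a-z0-9]+", " ", (text or "").lower()).strip()


def dedup_key(entry):
    """Use the DOI when present, otherwise the normalized title and authors."""
    doi = (entry.get("doi") or "").strip().lower()
    if doi:
        return ("doi", doi)
    return ("title", _norm(entry.get("title")), _norm(entry.get("author")))


def deduplicate(entries):
    """Merge entries sharing a key; the first occurrence wins on conflicts."""
    merged = {}
    for entry in entries:
        key = dedup_key(entry)
        if key in merged:
            # Only fill in fields the kept entry is missing.
            for field, value in entry.items():
                merged[key].setdefault(field, value)
        else:
            merged[key] = dict(entry)
    return list(merged.values())
```

One limitation of this sketch: an entry with a DOI and a copy of the same paper without one get different keys and are not merged.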

### Zotero Updater (`bibtex-zotero`)

![Zotero integration](assets/zotero-sync.gif)

- **Direct Zotero integration**: Fetches and updates items via Zotero API
- **Same resolution pipeline**: Uses the same multi-source resolution
- **Preserves metadata**: Keeps notes, tags, and attachments intact
- **Idempotent**: Already-published papers are automatically skipped
- **Dry-run mode**: Preview changes before applying
- **Tag-based chunking**: Track processing state with `preprint-upgraded`/`preprint-checked`/`preprint-error` tags

### Zotero Organizer (`bibtex-zotero-organize`)

- **AI-powered taxonomy**: Organize items into hierarchical collections automatically
- **Multiple backends**: Claude, OpenAI, or local embeddings for classification
- **Caching**: Classification results cached to reduce API calls
- **Batch processing**: Configurable limits and dry-run mode

### Obsidian Keywords (`bibtex-obsidian-keywords`)

![AI auto-keywording](assets/obsidian-keywords.gif)

- **AI-powered keywords**: Generate `[[wikilinks]]` for Obsidian paper notes
- **Multiple backends**: Claude, OpenAI, or local embeddings
- **Smart skipping**: `--min-keywords` to skip notes that already have enough keywords
- **Topics file**: Provide existing topics for consistent tagging across notes
- **Dry-run mode**: Preview changes before modifying files

### Reference Fact-Checker (`bibtex-check`)

![Reference fact-checker](assets/fact-checker.gif)

- **Multi-source validation**: Crossref, DBLP, Semantic Scholar
- **Detailed mismatch detection**: Title, author, year, venue comparisons
- **Hallucination detection**: Identifies likely fabricated references
- **Structured reports**: JSON and JSONL output formats
- **CI/CD integration**: Strict mode with exit codes for automation
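
The field-by-field comparison can be pictured roughly as follows. This is a hedged sketch, not the checker's real logic: the status labels (`verified`, `mismatch`, `not_found`), the `compare_reference` helper, the 0.9 threshold, and the use of `difflib` are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Case-insensitive string similarity in [0, 1]."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def compare_reference(entry, record, threshold=0.9):
    """Classify a bib entry against a database record (None means nothing was found)."""
    if record is None:
        return {"status": "not_found", "mismatches": []}
    mismatches = []
    # Text fields are compared fuzzily to tolerate casing and minor typos.
    for field in ("title", "author", "venue"):
        if field in entry and field in record:
            if similarity(entry[field], record[field]) < threshold:
                mismatches.append(field)
    # Years must agree exactly.
    if "year" in entry and "year" in record and str(entry["year"]) != str(record["year"]):
        mismatches.append("year")
    status = "mismatch" if mismatches else "verified"
    return {"status": status, "mismatches": mismatches}
```

A strict mode for CI could then fail the run whenever any entry is not `verified`.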

### Filter Bibliography (`bibtex-filter`)

- **Zero dependencies**: Uses only Python standard library
- **Works on Overleaf**: No pip install needed
- **Multiple bib files**: Merge and filter from multiple sources
- **Citation detection**: Supports natbib, biblatex, and standard LaTeX citations
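
As a rough picture of how standard-library-only citation filtering can work, the sketch below extracts keys from `\cite`-style commands with a regular expression and keeps the matching entries. It is an assumption-laden illustration rather than the code in `filter_bibliography.py`: the `cited_keys` and `filter_entries` helpers and the `ID` field are hypothetical, and the pattern ignores LaTeX comments and special forms such as `\nocite{*}`.

```python
import re

# Matches lowercase cite-style commands such as \cite, \citep, \citet*,
# \parencite, and \textcite, with up to two optional [..] arguments
# before the {key1, key2} group.
CITE_RE = re.compile(r"\\[A-Za-z]*cite[A-Za-z]*\*?(?:\[[^\]]*\]){0,2}\{([^}]*)\}")


def cited_keys(tex):
    """Collect every citation key used in a LaTeX source string."""
    keys = set()
    for group in CITE_RE.findall(tex):
        keys.update(k.strip() for k in group.split(",") if k.strip())
    return keys


def filter_entries(entries, keys):
    """Keep only the entries whose ID was cited, preserving order."""
    return [e for e in entries if e.get("ID") in keys]
```

For several `.tex` files, the key sets can be unioned before filtering.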

## Python API

```python
from bibtex_updater import Detector, Resolver, Updater, HttpClient, RateLimiter, DiskCache

# Create HTTP client with rate limiting and caching
rate_limiter = RateLimiter(req_per_min=30)
cache = DiskCache(".cache.json")
http_client = HttpClient(
    timeout=30.0,
    user_agent="bibtex-updater/0.8.0",
    rate_limiter=rate_limiter,
    cache=cache
)

# Detect preprints
# `entry` is one parsed BibTeX entry (assumed here to be a dict, e.g. from bibtexparser)
detector = Detector()
detection = detector.detect(entry)

if detection.is_preprint:
    # Resolve to published version
    resolver = Resolver(http_client)
    candidate = resolver.resolve(detection)

    if candidate and candidate.confidence >= 0.9:
        # Update the entry
        updater = Updater()
        updated_entry = updater.update_entry(entry, candidate.record, detection)
```
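
For intuition about what a per-minute limit such as `req_per_min=30` implies, the sketch below spaces requests evenly (one every two seconds at 30 requests per minute). `MinIntervalLimiter` is a hypothetical stand-in written for this illustration; it is not the package's `RateLimiter`, whose internals may differ.

```python
import time


class MinIntervalLimiter:
    """Hypothetical limiter: space calls at least 60 / req_per_min seconds apart."""

    def __init__(self, req_per_min, clock=time.monotonic, sleep=time.sleep):
        self.interval = 60.0 / req_per_min
        self._clock = clock      # injectable for testing
        self._sleep = sleep      # injectable for testing
        self._next_allowed = 0.0

    def wait(self):
        """Block until the next request may be sent; return the seconds slept."""
        now = self._clock()
        delay = max(0.0, self._next_allowed - now)
        if delay:
            self._sleep(delay)
        # Schedule the following slot relative to when this request goes out.
        self._next_allowed = max(now, self._next_allowed) + self.interval
        return delay
```

Calling `wait()` before each HTTP request keeps a client under the configured rate. A per-service setup would hold one limiter per API.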

## Development

```bash
# Clone and install in development mode
git clone https://github.com/rpatrik96/bibtexupdater.git
cd bibtexupdater
uv sync --extra dev --extra all

# Run tests
uv run pytest tests/ -v

# Run tests with coverage
uv run pytest tests/ -v --cov=bibtex_updater --cov-report=term-missing

# Code quality
pre-commit run --all-files

# Build package
uv build

# Check package
uv run twine check dist/*
```

## License

MIT License - see [LICENSE](LICENSE) for details.
