Metadata-Version: 2.4
Name: bibtex-updater
Version: 0.1.0
Summary: Replace preprint BibTeX entries with published versions and validate bibliography references
Project-URL: Homepage, https://github.com/rpatrik96/bibtexupdater
Project-URL: Documentation, https://github.com/rpatrik96/bibtexupdater#readme
Project-URL: Repository, https://github.com/rpatrik96/bibtexupdater.git
Project-URL: Issues, https://github.com/rpatrik96/bibtexupdater/issues
Project-URL: Changelog, https://github.com/rpatrik96/bibtexupdater/blob/main/CHANGELOG.md
Author: Patrik Reizinger
License: MIT
License-File: LICENSE
Keywords: academic,arxiv,bibliography,bibtex,citation,crossref,latex,preprint,research
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Text Processing :: Markup :: LaTeX
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: bibtexparser>=1.4.0
Requires-Dist: crossref-commons>=0.0.7
Requires-Dist: httpx>=0.24.0
Requires-Dist: rapidfuzz>=3.0.0
Requires-Dist: requests>=2.28.0
Provides-Extra: all
Requires-Dist: pyzotero>=1.5.0; extra == 'all'
Requires-Dist: scholarly>=1.7.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: black>=24.0.0; extra == 'dev'
Requires-Dist: build>=1.0.0; extra == 'dev'
Requires-Dist: mypy>=1.13.0; extra == 'dev'
Requires-Dist: pre-commit>=3.6.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.7.0; extra == 'dev'
Requires-Dist: twine>=5.0.0; extra == 'dev'
Requires-Dist: types-requests>=2.31.0; extra == 'dev'
Provides-Extra: scholarly
Requires-Dist: scholarly>=1.7.0; extra == 'scholarly'
Provides-Extra: zotero
Requires-Dist: pyzotero>=1.5.0; extra == 'zotero'
Description-Content-Type: text/markdown

# BibTeX Updater

Tools for managing BibTeX bibliographies: automatically update preprints to published versions, validate references against external databases, and filter to only cited references.

## Installation

### From PyPI (Recommended)

```bash
pip install bibtex-updater

# With Google Scholar support
pip install bibtex-updater[scholarly]

# With Zotero support
pip install bibtex-updater[zotero]

# All optional dependencies
pip install bibtex-updater[all]
```

### From Source

```bash
git clone https://github.com/rpatrik96/bibtexupdater.git
cd bibtexupdater
pip install -e ".[dev]"
```

## CLI Commands

| Command | Description |
|---------|-------------|
| `bibtex-update` | Replace preprints with published versions |
| `bibtex-check` | Validate references exist with correct metadata |
| `bibtex-filter` | Filter to only cited entries |
| `bibtex-zotero` | Update preprints in Zotero library |

## Quick Start

### Update Preprints

```bash
# Update preprints to published versions
bibtex-update references.bib -o updated.bib

# Preview changes (dry run)
bibtex-update references.bib --dry-run --verbose
```

### Validate References (Fact-Check)

```bash
# Check if references exist and have correct metadata
bibtex-check references.bib --report report.json

# Strict mode: exit with error if hallucinated/not-found entries
bibtex-check references.bib --strict
```

### Filter Bibliography

```bash
# Filter to only cited entries
bibtex-filter paper.tex -b references.bib -o filtered.bib

# Multiple tex files
bibtex-filter *.tex -b references.bib -o filtered.bib
```

### Update Zotero Library

```bash
# Set credentials (get from zotero.org/settings/keys)
export ZOTERO_LIBRARY_ID="your_user_id"
export ZOTERO_API_KEY="your_api_key"

# Preview changes
bibtex-zotero --dry-run

# Apply updates
bibtex-zotero
```

## Standalone Scripts

For environments without pip (e.g., Overleaf), `filter_bibliography.py` can be used directly as it has no dependencies:

```bash
# Copy the script and run directly
python filter_bibliography.py paper.tex -b references.bib -o filtered.bib
```

## Documentation

| Document | Description |
|----------|-------------|
| [docs/BIBTEX_UPDATER.md](docs/BIBTEX_UPDATER.md) | Full BibTeX updater documentation |
| [docs/REFERENCE_FACT_CHECKER.md](docs/REFERENCE_FACT_CHECKER.md) | Full reference fact-checker documentation |
| [docs/ZOTERO_UPDATER.md](docs/ZOTERO_UPDATER.md) | Full Zotero updater documentation |
| [docs/FILTER_BIBLIOGRAPHY.md](docs/FILTER_BIBLIOGRAPHY.md) | Full filter documentation |
| [examples/](examples/) | Example workflows and configuration files |

## Overleaf Integration

Both tools integrate with Overleaf via GitHub Actions or latexmkrc.

### GitHub Actions (Recommended)

1. Enable GitHub sync in Overleaf (Menu -> Sync -> GitHub)
2. Copy a workflow from [examples/workflows/](examples/workflows/) to `.github/workflows/`
3. Changes synced from Overleaf automatically trigger updates

### latexmkrc (Direct Overleaf)

For `filter_bibliography.py` only (no dependencies required):

1. Upload `filter_bibliography.py` to your Overleaf project
2. Create `.latexmkrc` based on [examples/latexmkrc](examples/latexmkrc)
3. Recompile - filtered bibliography appears in your file list

## Features

### BibTeX Updater (`bibtex-update`)

- **Multi-source resolution**: arXiv, Crossref, DBLP, Semantic Scholar, Google Scholar
- **High accuracy**: Title and author fuzzy matching with confidence thresholds
- **Batch processing**: Multiple files with concurrent workers
- **Deduplication**: Merge duplicates by DOI or normalized title+authors
- **Caching**: On-disk cache to avoid repeated API calls

### Zotero Updater (`bibtex-zotero`)

- **Direct Zotero integration**: Fetches and updates items via Zotero API
- **Same resolution pipeline**: Uses the same multi-source resolution
- **Preserves metadata**: Keeps notes, tags, and attachments intact
- **Idempotent**: Already-published papers are automatically skipped
- **Dry-run mode**: Preview changes before applying

### Reference Fact-Checker (`bibtex-check`)

- **Multi-source validation**: Crossref, DBLP, Semantic Scholar
- **Detailed mismatch detection**: Title, author, year, venue comparisons
- **Hallucination detection**: Identifies likely fabricated references
- **Structured reports**: JSON and JSONL output formats
- **CI/CD integration**: Strict mode with exit codes for automation

### Filter Bibliography (`bibtex-filter`)

- **Zero dependencies**: Uses only Python standard library
- **Works on Overleaf**: No pip install needed
- **Multiple bib files**: Merge and filter from multiple sources
- **Citation detection**: Supports natbib, biblatex, and standard LaTeX citations

## Python API

```python
from bibtex_updater import Detector, Resolver, Updater, HttpClient, RateLimiter, DiskCache

# Create HTTP client with rate limiting and caching
rate_limiter = RateLimiter(req_per_min=30)
cache = DiskCache(".cache.json")
http_client = HttpClient(
    timeout=30.0,
    user_agent="bibtex-updater/0.1.0",
    rate_limiter=rate_limiter,
    cache=cache
)

# Detect preprints
detector = Detector()
detection = detector.detect(entry)

if detection.is_preprint:
    # Resolve to published version
    resolver = Resolver(http_client)
    candidate = resolver.resolve(detection)

    if candidate and candidate.confidence >= 0.9:
        # Update the entry
        updater = Updater()
        updated_entry = updater.update_entry(entry, candidate.record, detection)
```

## Development

```bash
# Clone and install in development mode
git clone https://github.com/rpatrik96/bibtexupdater.git
cd bibtexupdater
pip install -e ".[dev,all]"

# Run tests
pytest tests/ -v

# Run tests with coverage
pytest tests/ -v --cov=bibtex_updater --cov-report=term-missing

# Code quality
pre-commit run --all-files

# Build package
python -m build

# Check package
twine check dist/*
```

## License

MIT License - see [LICENSE](LICENSE) for details.
