Metadata-Version: 2.4
Name: axm-bib
Version: 0.3.0
Summary: AXM bibliographic tools — DOI resolution, BibTeX retrieval, paper search & PDF download
Project-URL: Homepage, https://github.com/axm-protocols/axm-bib
Project-URL: Repository, https://github.com/axm-protocols/axm-bib
Project-URL: Issues, https://github.com/axm-protocols/axm-bib/issues
Project-URL: Documentation, https://axm-protocols.github.io/axm-bib/
Author-email: Gabriel Jarry <gabriel@axm-protocols.io>
License: Apache-2.0
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: >=3.12
Requires-Dist: axm
Requires-Dist: bibtexparser<2,>=1.4
Requires-Dist: cyclopts>=4.5.1
Requires-Dist: habanero>=1.2
Requires-Dist: httpx>=0.27
Requires-Dist: pydantic>=2.0
Requires-Dist: pymupdf4llm>=0.0.17
Requires-Dist: pymupdf>=1.25
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://raw.githubusercontent.com/axm-protocols/axm-init/main/assets/logo.png" alt="AXM Logo" width="180" />
</p>

<p align="center">
  <strong>axm-bib — Bibliographic tools: search papers, resolve DOIs, download & extract PDFs</strong>
</p>


<p align="center">
  <a href="https://github.com/axm-protocols/axm-bib/actions/workflows/ci.yml"><img src="https://github.com/axm-protocols/axm-bib/actions/workflows/ci.yml/badge.svg" alt="CI"></a>
  <a href="https://axm-protocols.github.io/axm-init/explanation/check-grades/"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/axm-protocols/axm-bib/gh-pages/badges/axm-init.json" alt="axm-init"></a>
  <a href="https://axm-protocols.github.io/axm-audit/"><img src="https://img.shields.io/endpoint?url=https://raw.githubusercontent.com/axm-protocols/axm-bib/gh-pages/badges/axm-audit.json" alt="axm-audit"></a>
  <a href="https://coveralls.io/github/axm-protocols/axm-bib?branch=main"><img src="https://coveralls.io/repos/github/axm-protocols/axm-bib/badge.svg?branch=main" alt="Coverage"></a>
  <a href="https://pypi.org/project/axm-bib/"><img src="https://img.shields.io/pypi/v/axm-bib" alt="PyPI"></a>
  <img src="https://img.shields.io/badge/python-3.12%2B-blue" alt="Python 3.12+">
  <a href="https://axm-protocols.github.io/axm-bib/"><img src="https://img.shields.io/badge/docs-live-brightgreen" alt="Docs"></a>
</p>

---

## Features

- 🔍 **Search** — Find papers by title/keywords (6-source fan-out: S2 + CrossRef + DBLP + OpenReview + arXiv + OpenAlex, with source visibility)
- 🏛️ **OpenReview** — Search papers and browse venue proceedings (NeurIPS, ICLR, ICML) with synthetic BibTeX
- 📦 **arXiv** — Native BibTeX fetching and Atom XML search with guaranteed open-access PDFs
- 🔗 **OpenAlex** — Cross-referencing 240M+ works: title → DOI, arXiv ID, MAG ID + citation graph
- 🇫🇷 **HAL** — French academic publications (CNRS/CCSD) with native BibTeX
- 🎯 **Universal Resolver** — Auto-detect DOI, arXiv, HAL, DBLP, or OpenReview (`openreview:ID` or URL) → BibTeX (ID-only, titles rejected)
- 🏷️ **Venue Enrichment** — arXiv entries auto-enriched with venue metadata from S2/DBLP (e.g. `@misc` → `@inproceedings{…, booktitle={NeurIPS}}`)
- 🔀 **Multi-Ref Merge** — Resolve multiple identifiers for the same paper into a single BibTeX entry with field merging, entry type promotion, and coherence check
- 📊 **Citation Graph** — S2 citations, references, recommendations + OpenAlex cited-by and referenced-works
- 📥 **PDF Pipeline** — Download, extract, and organize papers in one command
- 📄 **Content Extraction** — PDF → Markdown + figure PNGs (PyMuPDF)
- 🪃 **OA Fallback** — Direct download from arXiv and ACL Anthology when Unpaywall has no URL
- 🪝 **Protocol Hook** — Automatically extract paper metadata for AXM protocol sessions
- 🤖 **MCP Integration** — Auto-discovered tools via `axm-mcp`
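
The Multi-Ref Merge and Venue Enrichment features above can be pictured with a small sketch. It is an illustration only, not the package's implementation: `merge_entries`, the `ENTRY_RANK` ordering, the "first non-empty value wins" rule, and the title-based coherence check are all assumptions made for this example.

```python
# Hypothetical sketch of multi-ref merging; none of these names come from axm-bib.

# Assumed promotion order: a higher rank wins when entry types differ.
ENTRY_RANK = {"misc": 0, "article": 1, "inproceedings": 1}


def _norm_title(title: str) -> str:
    """Lowercase and keep only alphanumerics so formatting differences are ignored."""
    return "".join(ch for ch in title.lower() if ch.isalnum())


def merge_entries(entries: list[dict[str, str]]) -> dict[str, str]:
    """Merge BibTeX-like dicts that are expected to describe the same paper."""
    if not entries:
        raise ValueError("nothing to merge")

    # Coherence check (assumed): all titles must match after normalization.
    titles = {_norm_title(e["title"]) for e in entries if "title" in e}
    if len(titles) > 1:
        raise ValueError("entries do not appear to describe the same paper")

    merged: dict[str, str] = {}
    for entry in entries:
        for field, value in entry.items():
            if field == "ENTRYTYPE":
                continue
            # First non-empty value wins; later entries only fill gaps.
            if value and not merged.get(field):
                merged[field] = value

    # Entry type promotion: keep the highest-ranked type seen.
    merged["ENTRYTYPE"] = max(
        (e.get("ENTRYTYPE", "misc") for e in entries),
        key=lambda t: ENTRY_RANK.get(t, 0),
    )
    return merged


arxiv = {"ENTRYTYPE": "misc", "title": "Attention Is All You Need", "eprint": "1706.03762"}
venue = {"ENTRYTYPE": "inproceedings", "title": "Attention is all you need", "booktitle": "NeurIPS"}
merged = merge_entries([arxiv, venue])
```

Here the arXiv-style `@misc` record is promoted to `@inproceedings` because the second record carries venue information, while fields unique to either record are kept.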

## Installation

```bash
uv add axm-bib
```

## Quick Start

```bash
# Search papers
axm-bib search "attention is all you need"

# Resolve any identifier to BibTeX (DOI, arXiv, HAL, DBLP, OpenReview)
axm-bib resolve "arXiv:1706.03762"
axm-bib resolve "10.1145/363235.363259"

# Merge multiple IDs for the same paper
axm-bib resolve "10.1234/test" "arXiv:2503.18813"

# Download, extract & organize a paper (full pipeline)
axm-bib pdf 10.48550/arXiv.1706.03762
axm-bib pdf arXiv:2503.18813

# Browse citation graph
axm-bib graph 10.1145/363235.363259 --direction citations
```

### Pipeline Output

`axm-bib pdf` creates a complete paper folder:

```
~/axm/papers/vaswani2017attention/
├── vaswani2017attention.pdf   # downloaded PDF
├── paper.bib                  # BibTeX entry
├── content.md                 # extracted Markdown
└── figures/
    ├── fig_001.png
    └── ...
```

The command then prints a summary:

```
Downloaded: ~/axm/papers/vaswani2017attention
  PDF: vaswani2017attention.pdf (1,234,567 bytes)
  BibTeX: paper.bib
  Markdown: content.md (8,432 words, 45,123 chars, 12 pages)
  Figures: 8
```
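
If you want to inspect such a folder from Python, a minimal sketch could look like the following. It relies only on the file names shown in the tree above; `summarize_paper_dir` is a hypothetical helper, not part of axm-bib, and its whitespace-based word count may differ from the figure the CLI reports.

```python
from pathlib import Path


def summarize_paper_dir(paper_dir: Path) -> dict[str, object]:
    """Collect basic facts about a folder laid out like the tree above."""
    pdfs = sorted(p.name for p in paper_dir.glob("*.pdf"))
    content = paper_dir / "content.md"
    figures_dir = paper_dir / "figures"
    return {
        # Name of the first PDF found, or None if the download is missing.
        "pdf": pdfs[0] if pdfs else None,
        "has_bibtex": (paper_dir / "paper.bib").is_file(),
        # Naive word count: split the extracted Markdown on whitespace.
        "words": len(content.read_text(encoding="utf-8").split()) if content.is_file() else 0,
        "figures": len(list(figures_dir.glob("*.png"))) if figures_dir.is_dir() else 0,
    }
```

Calling `summarize_paper_dir(Path.home() / "axm" / "papers" / "vaswani2017attention")` would describe the example folder shown above, assuming the default output directory.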

## CLI Commands

### `axm-bib search`

| Option | Default | Description |
|---|---|---|
| `QUERY` | *required* | Search query (title, keywords) |
| `--limit`, `-n` | 10 | Max results (1–100) |
| `--abstract/--no-abstract` | `True` | Show paper abstracts |
| `--abstract-len` | 0 (full) | Truncate abstracts to N chars |
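
To call the search command from a script, one option is a thin `subprocess` wrapper. The sketch below uses only the flags listed in the table above; the helper names `build_search_cmd` and `run_search` are hypothetical, and the output is returned as plain text because the CLI's output format is not specified here.

```python
import subprocess


def build_search_cmd(
    query: str, limit: int = 10, abstract: bool = True, abstract_len: int = 0
) -> list[str]:
    """Build an argv list for `axm-bib search` from the documented options."""
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    if abstract_len < 0:
        raise ValueError("abstract_len must be non-negative")
    cmd = ["axm-bib", "search", query, "--limit", str(limit)]
    cmd.append("--abstract" if abstract else "--no-abstract")
    # 0 means "full abstract", so the flag is only passed for a positive length.
    if abstract and abstract_len:
        cmd += ["--abstract-len", str(abstract_len)]
    return cmd


def run_search(query: str, **options) -> str:
    """Run the CLI and return its standard output (requires axm-bib on PATH)."""
    result = subprocess.run(
        build_search_cmd(query, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout
```

Passing the query as a single list element avoids shell quoting issues with multi-word titles.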

### `axm-bib resolve`

| Option | Description |
|---|---|
| `ID [ID ...]` | One or more identifiers: DOI, arXiv ID, HAL ID, DBLP key, or OpenReview reference (`openreview:ID` or URL) |

Auto-detects each identifier type and routes it to the appropriate client. Multiple IDs for the same paper are merged into a single BibTeX entry after a coherence check. Titles are rejected; use `search` instead.
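
As a rough picture of what identifier auto-detection involves, the sketch below classifies a string with regular expressions. The patterns and the `detect_identifier` name are assumptions for illustration; the rules axm-bib actually applies may be broader (for example, old-style arXiv IDs and non-`hal-` HAL prefixes are not covered here).

```python
import re

# Assumed patterns, checked in order; the real detection rules may differ.
_PATTERNS: list[tuple[str, re.Pattern[str]]] = [
    ("openreview", re.compile(r"^(openreview:\S+|https?://openreview\.net/\S+)$", re.I)),
    ("doi", re.compile(r"^(https?://(dx\.)?doi\.org/)?10\.\d{4,9}/\S+$", re.I)),
    ("arxiv", re.compile(r"^(arxiv:)?\d{4}\.\d{4,5}(v\d+)?$", re.I)),
    ("hal", re.compile(r"^hal-\d+(v\d+)?$", re.I)),
    ("dblp", re.compile(r"^(dblp:)?(conf|journals)/\S+/\S+$", re.I)),
]


def detect_identifier(ref: str) -> str:
    """Return the identifier kind, or raise ValueError for anything else (e.g. a title)."""
    ref = ref.strip()
    for kind, pattern in _PATTERNS:
        if pattern.match(ref):
            return kind
    raise ValueError(f"not a recognized identifier: {ref!r}")
```

Because every pattern requires a whitespace-free string, a free-text title such as `attention is all you need` matches nothing and is rejected, which mirrors the ID-only behavior described above.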

### `axm-bib graph`

| Option | Default | Description |
|---|---|---|
| `PAPER_ID` | *required* | Paper ID (DOI, arXiv ID, or S2 paper ID) |
| `--direction`, `-d` | `citations` | Graph direction: `citations`, `references`, or `similar` |
| `--limit`, `-n` | 20 | Max results |
| `--abstract/--no-abstract` | `True` | Show paper abstracts |
| `--abstract-len` | 0 (full) | Truncate abstracts to N chars |

### `axm-bib pdf`

| Option | Default | Description |
|---|---|---|
| `REF` | *required* | Any identifier (DOI, arXiv ID, HAL ID, etc.) |
| `--output-dir`, `-o` | `~/axm/papers/` | Output directory |

Downloads the PDF via a multi-source URL chain (S2 OA → arXiv direct → Unpaywall → DOI fallbacks),
extracts Markdown and figures, and writes `paper.bib`, all in one step.

### `axm-bib extract`

| Option | Default | Description |
|---|---|---|
| `PDF_PATH` | *required* | Path to a local PDF file |
| `--output-dir`, `-o` | auto | Output directory |
| `--figures/--no-figures` | `True` | Extract figures as PNG |

Standalone extraction for PDFs you already have.

## MCP Integration

`axm-bib` tools are automatically discovered by `axm-mcp`:

| Tool | Description |
|---|---|
| `bib_search` | Search papers by keywords |
| `bib_resolve` | Resolve one or more identifiers to BibTeX (multi-ref merge) |
| `bib_resolve_batch` | Batch-resolve N papers × M identifiers each |
| `bib_pdf` | Full pipeline: download + extract + BibTeX (any identifier) |
| `bib_graph` | Citation graph traversal (S2 + OpenAlex fallback) |
| `bib_extract` | Extract a local PDF to Markdown + figures |

## Configuration

| Variable | Required | Description |
|---|---|---|
| `UNPAYWALL_EMAIL` | For PDF downloads | Email for Unpaywall API (prompted on first `pdf` use) |
| `S2_API_KEY` | No | [Semantic Scholar API key](https://www.semanticscholar.org/product/api#api-key-form) for higher rate limits |

```bash
# Optional: set S2 API key for higher rate limits
export S2_API_KEY="your-api-key"
```

Config file: `~/.config/axm-bib/config.toml`. See the [Configuration reference](https://axm-protocols.github.io/axm-bib/reference/config/) for details.

## Development

```bash
git clone https://github.com/axm-protocols/axm-bib.git
cd axm-bib
uv sync --all-groups
uv run pytest
uv run ruff check src/  # lint
```

## License

Apache License 2.0
