Metadata-Version: 2.4
Name: opencite
Version: 0.2.2
Summary: Academic literature search, citation management, and PDF retrieval CLI
Project-URL: Repository, https://github.com/neuromechanist/opencite
Project-URL: Issues, https://github.com/neuromechanist/opencite/issues
Author-email: Seyed Yahya Shirazi <shirazi@ieee.org>
License-Expression: MIT
License-File: LICENSE
Keywords: academic,bibtex,citations,literature,openalex,pdf,pubmed,search,semantic-scholar
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Text Processing :: Markup :: Markdown
Requires-Python: >=3.11
Requires-Dist: httpx>=0.27.0
Requires-Dist: markit-mistral>=0.2.2
Requires-Dist: markitdown>=0.1.0
Requires-Dist: pyalex>=0.15
Provides-Extra: convert
Requires-Dist: markit-mistral>=0.2.2; extra == 'convert'
Requires-Dist: markitdown>=0.1.0; extra == 'convert'
Provides-Extra: dev
Requires-Dist: pre-commit>=4.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.24.0; extra == 'dev'
Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.8.0; extra == 'dev'
Description-Content-Type: text/markdown

# OpenCite

Academic literature search, citation management, and PDF retrieval CLI.

Searches Semantic Scholar, OpenAlex, and PubMed in parallel, deduplicates results, and supports BibTeX output, citation graph traversal, PDF retrieval, batch downloads, and PDF-to-markdown conversion.
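As a rough illustration of what deduplicating results across sources can involve, the sketch below merges records that share a DOI. The record shape (dicts with a `doi` key) and the `normalize_doi` and `dedupe_by_doi` helpers are assumptions made for this example, not OpenCite's actual implementation.

```python
def normalize_doi(doi: str) -> str:
    """Lowercase a DOI and strip common URL/scheme prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def dedupe_by_doi(records: list[dict]) -> list[dict]:
    """Keep the first record seen per DOI; records without a DOI pass through."""
    seen: dict[str, dict] = {}
    unmatched: list[dict] = []
    for record in records:
        doi = record.get("doi")
        if not doi:
            unmatched.append(record)
            continue
        key = normalize_doi(doi)
        if key in seen:
            # Add only the fields the first record did not have
            for field, value in record.items():
                seen[key].setdefault(field, value)
        else:
            seen[key] = dict(record)
    return list(seen.values()) + unmatched
```

Matching on DOI alone is the simplest strategy; a real tool would likely also need a fallback (for example, title matching) for records that lack a DOI.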

## Quick Start

Install OpenCite and create a config file:

```bash
uv pip install opencite                # or: pip install opencite
opencite config init                   # creates ~/.opencite/config.toml
```

Add your API keys to `~/.opencite/config.toml` or export them as environment variables:

```bash
export SEMANTIC_SCHOLAR_API_KEY=your_key
export PUBMED_API_KEY=your_key
export OPENALEX_API_KEY=your_key
```

Start searching:

```bash
opencite search "transformer attention mechanism"
opencite lookup 10.1038/nature12345
opencite canonical "deep learning" --min-citations 500
opencite cite 10.1038/nature12345
opencite pdf 10.1038/nature12345 -o paper.pdf --convert
```

> [!NOTE]
> **Claude Code plugin:** Type `/plugin`, select "Add marketplace", enter `neuromechanist/opencite`, and restart. Then use `/opencite` or ask Claude directly.

> [!TIP]
> **PDF conversion:** Install with `uv pip install 'opencite[convert]'` for markitdown and markit-mistral support.

## Commands

### search - Find papers

```bash
opencite search "query" [--max N] [--source all|openalex|s2|pubmed]
    [--year-from YYYY] [--year-to YYYY] [--oa-only]
    [--sort relevance|citations|year] [-f text|json|bibtex|csv] [-o FILE] [-v]
```

### lookup - Look up papers by identifier

```bash
opencite lookup IDENTIFIER [IDENTIFIER ...] [--enrich] [--append-bib FILE]
    [-f text|json|bibtex] [-o FILE] [-v]
```

Accepts DOI, `pmid:X`, `pmc:X`, `arxiv:X`, S2 ID, or OpenAlex ID. Supports multiple IDs.
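The identifier forms listed above can be told apart with a few simple checks. The following is a minimal sketch, not OpenCite's own parser: `classify_identifier` is a hypothetical helper, and the patterns assume that DOIs start with `10.`, that OpenAlex work IDs look like `W` followed by digits, and that Semantic Scholar paper IDs are 40-character hexadecimal strings.

```python
import re

# Assumed patterns for illustration only
DOI_PATTERN = re.compile(r"^10\.\d{4,9}/\S+$")
OPENALEX_PATTERN = re.compile(r"^[Ww]\d+$")
S2_PATTERN = re.compile(r"^[0-9a-f]{40}$")


def classify_identifier(identifier: str) -> str:
    """Return a label for the identifier type, or "unknown"."""
    ident = identifier.strip()
    lowered = ident.lower()
    # Prefixed forms: pmid:X, pmc:X, arxiv:X
    for prefix in ("pmid", "pmc", "arxiv"):
        if lowered.startswith(prefix + ":"):
            return prefix
    if DOI_PATTERN.match(ident):
        return "doi"
    if OPENALEX_PATTERN.match(ident):
        return "openalex"
    if S2_PATTERN.match(lowered):
        return "s2"
    return "unknown"
```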

### cite - Citation graph

```bash
opencite cite IDENTIFIER [--direction citing|references|both] [--max N]
    [--sort citations|year] [--min-citations N] [-f text|json|bibtex] [-o FILE]
```

### canonical - Most-cited papers in a field

```bash
opencite canonical "topic" [--max N] [--year-from YYYY] [--min-citations N]
    [-f text|json|bibtex] [-o FILE]
```

### pdf - Download PDF

```bash
opencite pdf IDENTIFIER [-o PATH] [--filename NAME] [--convert]
    [--converter auto|markitdown|mistral]
```

`-o` accepts a file path (e.g., `paper.pdf`) or a directory. With `--convert`, a markdown file is also generated alongside the PDF.

### convert - PDF to markdown

```bash
opencite convert FILE.pdf [-o FILE] [--converter auto|markitdown|mistral]
    [--extract-images] [--images-dir DIR]
```

Auto mode uses markit-mistral when `MISTRAL_API_KEY` is set (better for math and complex layouts); otherwise it falls back to markitdown (free, local).

### batch-fetch - Batch download PDFs

```bash
opencite batch-fetch FILE [-o DIR] [--convert] [--concurrency N] [--summary FILE]
opencite batch-fetch --from-json FILE [options]
opencite batch-fetch --from-stdin [options]
```

Downloads PDFs for multiple papers with controlled concurrency. Supports text files (one ID per line), JSON files (array of DOIs or opencite search results), and stdin.

Example workflow:

```bash
# Search and save as JSON, then batch download with conversion
opencite search "tDCS motor cortex" --max 30 -f json -o results.json
opencite batch-fetch --from-json results.json --convert --summary report.json -o ./papers
```
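For readers preparing input files by hand, this sketch shows one way the accepted input shapes could be read into a flat list of identifiers. The helper names are hypothetical, and the assumption that search-result objects carry a `doi` key is made for illustration; check the actual JSON output before relying on it.

```python
import json


def parse_text_ids(text: str) -> list[str]:
    """One identifier per line; blank lines are skipped."""
    return [line.strip() for line in text.splitlines() if line.strip()]


def parse_json_ids(text: str) -> list[str]:
    """Accept a JSON array of DOI strings or of result objects with a "doi" key."""
    ids: list[str] = []
    for item in json.loads(text):
        if isinstance(item, str):
            ids.append(item)
        elif isinstance(item, dict) and item.get("doi"):
            # Assumed field name; entries without it are skipped
            ids.append(item["doi"])
    return ids
```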

### ids - Convert between identifiers

```bash
opencite ids IDENTIFIER [IDENTIFIER ...] [-f text|json]
```

Converts between DOI, PMID, and PMCID using the NCBI ID Converter API.

### config - Manage configuration

```bash
opencite config init    # create ~/.opencite/config.toml template
opencite config show    # display resolved config (keys masked)
opencite config path    # show config file location
```

## Output Formats

The `search`, `lookup`, `cite`, and `canonical` commands support `-f`/`--format`:

- `text` (default) - human-readable output
- `json` - structured JSON
- `bibtex` - BibTeX entries for citation managers
- `csv` - comma-separated values (search only)

Use `-o`/`--output FILE` to write to a file instead of stdout.

## Installation

```bash
# uv (recommended)
uv pip install opencite
uv pip install 'opencite[convert]'     # with PDF conversion (markitdown + markit-mistral)

# pip
pip install opencite
pip install 'opencite[convert]'

# uvx (no install needed, runs from cache)
uvx opencite --version
```

For development:

```bash
git clone https://github.com/neuromechanist/opencite.git
cd opencite
uv sync --extra dev
```

## Configuration

OpenCite supports TOML config, `.env` files, and environment variables.

```bash
opencite config init    # creates ~/.opencite/config.toml with template
opencite config show    # display resolved config (keys masked)
opencite config path    # show config file location
```

### Config loading priority

Later sources override earlier ones:

1. `~/.opencite/config.toml`
2. `~/.opencite/.env`
3. `.env` in working directory
4. Environment variables
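The override order above amounts to merging the sources from first to last. A minimal sketch, assuming each source has already been read into a flat dict (the `resolve_config` helper and the sample values are hypothetical):

```python
def resolve_config(*sources: dict[str, str]) -> dict[str, str]:
    """Merge sources in order; later sources override earlier ones."""
    resolved: dict[str, str] = {}
    for source in sources:
        resolved.update(source)
    return resolved


# Sources listed in the same order as the priority list
toml_values = {"SEMANTIC_SCHOLAR_API_KEY": "from-toml", "PUBMED_API_KEY": "from-toml"}
home_env = {"PUBMED_API_KEY": "from-home-env"}
cwd_env: dict[str, str] = {}
environ = {"SEMANTIC_SCHOLAR_API_KEY": "from-environment"}

config = resolve_config(toml_values, home_env, cwd_env, environ)
```

Here `config` ends up with the Semantic Scholar key from the environment and the PubMed key from `~/.opencite/.env`, since each is the last source that defines it.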

### API keys

Required for academic database access:

```bash
export SEMANTIC_SCHOLAR_API_KEY=your_key
export PUBMED_API_KEY=your_key
export OPENALEX_API_KEY=your_key
```

Optional:

```bash
export MISTRAL_API_KEY=your_key        # for PDF-to-markdown via Mistral OCR
```

### Publisher tokens (optional)

For authenticated PDF downloads from paywalled publishers:

```bash
export ELSEVIER_API_KEY=your_key       # Elsevier/ScienceDirect
export WILEY_TDM_TOKEN=your_token      # Wiley TDM
export SPRINGER_API_KEY=your_key       # Springer Nature
```

These can also be set in `~/.opencite/config.toml`:

```toml
[publishers]
elsevier = "your_key"
wiley_tdm = "your_token"
springer = "your_key"
```

## License

MIT
