Metadata-Version: 2.4
Name: ads-bib
Version: 0.1.1
Summary: Pipeline for querying and turning NASA's ADS publications metadata into curated, analysis-ready datasets, topic maps, and citation networks.
Project-URL: Homepage, https://github.com/raphschlatt/ads-bib
Project-URL: Documentation, https://raphschlatt.github.io/ads-bib/
Project-URL: Repository, https://github.com/raphschlatt/ads-bib
Project-URL: Issues, https://github.com/raphschlatt/ads-bib/issues
Author: Raphael Schlattmann
Maintainer: Raphael Schlattmann
License-Expression: MIT
License-File: LICENSE
Keywords: bibliometrics,citation-networks,dataset-curation,digital-humanities,publication-metadata,topic-modeling
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: <3.13,>=3.12
Requires-Dist: accelerate>=1.0
Requires-Dist: ads-and[cpu-onnx,modal]<0.2,>=0.1.3
Requires-Dist: bertopic
Requires-Dist: ctranslate2
Requires-Dist: dask
Requires-Dist: datamapplot<0.7,>=0.6.4
Requires-Dist: fast-hdbscan<0.3,>=0.2.2
Requires-Dist: fasttext-wheel
Requires-Dist: huggingface-hub
Requires-Dist: litellm
Requires-Dist: matplotlib
Requires-Dist: networkx
Requires-Dist: numpy>=1.24
Requires-Dist: openai
Requires-Dist: pacmap
Requires-Dist: pandas>=2.0
Requires-Dist: pyarrow
Requires-Dist: python-dotenv
Requires-Dist: pyyaml
Requires-Dist: requests
Requires-Dist: scikit-learn
Requires-Dist: scipy>=1.10
Requires-Dist: seaborn
Requires-Dist: sentence-transformers>=5.1
Requires-Dist: sentencepiece
Requires-Dist: spacy>=3.7
Requires-Dist: toponymy==0.4.0
Requires-Dist: torch<2.7,>=2.6
Requires-Dist: tqdm
Requires-Dist: transformers<4.57,>=4.56
Provides-Extra: hdbscan
Requires-Dist: hdbscan; extra == 'hdbscan'
Provides-Extra: test
Requires-Dist: pytest>=7; extra == 'test'
Requires-Dist: ruff>=0.6; extra == 'test'
Provides-Extra: umap
Requires-Dist: umap-learn; extra == 'umap'
Description-Content-Type: text/markdown

# ads-bib

![Python 3.12](https://img.shields.io/badge/python-3.12-blue)
![License MIT](https://img.shields.io/badge/license-MIT-green)
[![Docs](https://img.shields.io/badge/docs-online-blue)](https://raphschlatt.github.io/ads-bib/)
[![Open in Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/raphschlatt/ads-bib/blob/main/pipeline.ipynb)

`ads-bib` takes a NASA ADS search query and produces a normalized, curated dataset with optional author name disambiguation (AND, via [ads-and](https://github.com/raphschlatt/ads-and)), topic models (via [BERTopic](https://maartengr.github.io/BERTopic/) or [Toponymy](https://github.com/TutteInstitute/toponymy)), and citation networks ready for tools such as [Gephi](https://gephi.org/), [CiteSpace](http://cluster.cis.drexel.edu/~cchen/citespace/), or [VOSviewer](https://www.vosviewer.com/). The pipeline runs locally or via API.

## Installation

Install with [uv](https://docs.astral.sh/uv/) or pip on Python 3.12 (the package requires `>=3.12,<3.13`):

```bash
uv pip install ads-bib
# or: pip install ads-bib
```

## Quick Start

Create a `.env` file in your project root with the relevant API keys.

```bash
ADS_TOKEN=your-ads-token           # required
OPENROUTER_API_KEY=your-key        # only for the openrouter road
HF_TOKEN=your-key                  # only for the hf_api road
MODAL_TOKEN_ID=your-modal-id       # only for AND with backend=modal
MODAL_TOKEN_SECRET=your-modal-secret
```

Where to get each key: [ADS user token settings](https://ui.adsabs.harvard.edu/user/settings/token) | [OpenRouter Keys](https://openrouter.ai/settings/keys) | [Hugging Face Access Tokens](https://huggingface.co/settings/tokens) | [Modal](https://modal.com/)
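
A missing key usually surfaces only once a run is under way, so a quick pre-flight check can save time. The sketch below is not part of `ads-bib`: `missing_keys` and the lookup tables are hypothetical helpers built from the comments in the `.env` example, assuming the Hugging Face road corresponds to the `hf_api` preset and that the local roads need no key besides `ADS_TOKEN`.

```python
from __future__ import annotations

import os
from collections.abc import Mapping

# Key requirements as described by the comments in the .env example.
# Assumption: the Hugging Face road is the `hf_api` preset.
ALWAYS_REQUIRED = ("ADS_TOKEN",)
ROAD_KEYS = {
    "openrouter": ("OPENROUTER_API_KEY",),
    "hf_api": ("HF_TOKEN",),
    "local_cpu": (),
    "local_gpu": (),
}
MODAL_KEYS = ("MODAL_TOKEN_ID", "MODAL_TOKEN_SECRET")


def missing_keys(
    road: str,
    use_modal: bool = False,
    env: Mapping[str, str] | None = None,
) -> list[str]:
    """Return the names of required keys that are unset or empty."""
    env = os.environ if env is None else env
    needed = ALWAYS_REQUIRED + ROAD_KEYS[road]
    if use_modal:
        # Only relevant for author name disambiguation with backend=modal.
        needed += MODAL_KEYS
    return [name for name in needed if not env.get(name)]
```

By default the helper reads the process environment, so load the `.env` file first (for example with `load_dotenv()` from `python-dotenv`) or pass a mapping explicitly via `env`.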

Then run in your terminal:

```bash
ads-bib run --preset openrouter --set search.query='author:"Hawking, S*"'
```

Author name disambiguation is off by default. Enable the local CPU/GPU path
with `--set author_disambiguation.enabled=true`; use
`--set author_disambiguation.backend=modal` only when your Modal credentials are
configured.
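
Because the query contains double quotes and an asterisk, shell quoting is easy to get wrong when scripting runs. The following sketch builds the command as an argument list instead. `build_command` is a hypothetical helper, not part of `ads-bib`, and it assumes that `--set` may be repeated once per override, which the examples above suggest but do not state.

```python
from __future__ import annotations

import shlex


def build_command(preset: str, overrides: dict[str, str]) -> list[str]:
    """Build an `ads-bib run` argv list with one `--set key=value` per override."""
    argv = ["ads-bib", "run", "--preset", preset]
    for key, value in overrides.items():
        argv += ["--set", f"{key}={value}"]
    return argv


argv = build_command(
    "openrouter",
    {
        "search.query": 'author:"Hawking, S*"',
        "author_disambiguation.enabled": "true",
    },
)

# shlex.join quotes each argument for a POSIX shell, handy for copy-pasting.
print(shlex.join(argv))
```

Passing `argv` to `subprocess.run(argv, check=True)` bypasses the shell entirely, so the query reaches the CLI unchanged.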

Full setup details: [Get Started](https://raphschlatt.github.io/ads-bib/get-started/) | [Runtime Roads](https://raphschlatt.github.io/ads-bib/runtime-roads/)

## Python API

```python
import ads_bib

ads_bib.run(
    preset="openrouter",
    query='author:"Hawking, S*"',
)
```

More examples and the `NotebookSession` interface: [Python API docs](https://raphschlatt.github.io/ads-bib/python-api/)

## Pick a Runtime Road

| Road | Hardware | Network | Cost |
| --- | --- | --- | --- |
| `openrouter` | any | API calls | pay-per-token |
| `hf_api` | any | API calls | HF-plan-dependent |
| `local_cpu` | CPU only | model downloads only | free after setup |
| `local_gpu` | NVIDIA + CUDA | model downloads only | free after setup |

Full provider matrix and first-run behavior: [Runtime Roads](https://raphschlatt.github.io/ads-bib/runtime-roads/)
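
The table can also be read as a simple filter over two constraints. The sketch below encodes only the hardware and network columns. `candidate_roads` is a hypothetical helper, and it assumes that `local_cpu` also works on a machine that has a GPU.

```python
from __future__ import annotations

# One entry per table row: (road, needs NVIDIA + CUDA, makes API calls).
ROADS = [
    ("openrouter", False, True),
    ("hf_api", False, True),
    ("local_cpu", False, False),
    ("local_gpu", True, False),
]


def candidate_roads(has_cuda_gpu: bool, allow_api_calls: bool) -> list[str]:
    """Return the roads from the table that fit the given constraints."""
    return [
        road
        for road, needs_gpu, uses_api in ROADS
        if (has_cuda_gpu or not needs_gpu) and (allow_api_calls or not uses_api)
    ]
```

For a laptop without a GPU where API calls are acceptable, `candidate_roads(False, True)` leaves `openrouter`, `hf_api`, and `local_cpu`; cost then decides between them.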

## Output

Each run produces a self-contained output directory:

- **`publications.parquet`** — cleaned, translated, topic-labeled publications, with disambiguated authors when AND is enabled
- **`references.parquet`** — normalized cited-reference metadata, with disambiguated authors when AND is enabled
- **`topic_info.parquet`** — one row per topic with labels, counts, and representation fields
- **`topic_map.html`** — interactive topic visualization (open in any browser), using [datamapplot](https://github.com/TutteInstitute/datamapplot)
- **`.gexf` citation networks** — direct citation, co-citation, bibliographic coupling, author co-citation
- **`download_wos_export.txt`** — Web of Science format for e.g. [CiteSpace](https://citespace.podia.com/) / [VOSviewer](https://www.vosviewer.com/)
- **`run_summary.yaml`** — full run metadata and stage timings
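
To confirm that a run finished with everything in place, a small inventory of the output directory may help. This is a sketch under stated assumptions: `inventory` is a hypothetical helper, the fixed file names come from the list above, and the search is recursive because the directory layout and the exact `.gexf` file names are not specified here.

```python
from __future__ import annotations

from pathlib import Path

# Fixed file names from the list above. The .gexf networks are matched by
# suffix because their exact names are not given.
EXPECTED_FILES = (
    "publications.parquet",
    "references.parquet",
    "topic_info.parquet",
    "topic_map.html",
    "download_wos_export.txt",
    "run_summary.yaml",
)


def inventory(run_dir: str | Path) -> dict[str, list[str]]:
    """Report which expected outputs are missing and which .gexf files exist."""
    run_dir = Path(run_dir)
    # Search recursively so the check does not depend on the folder layout.
    found = {p.name for p in run_dir.rglob("*") if p.is_file()}
    missing = [name for name in EXPECTED_FILES if name not in found]
    networks = sorted(name for name in found if name.endswith(".gexf"))
    return {"missing": missing, "networks": networks}
```

From there, `pandas.read_parquet` loads the `.parquet` tables and `networkx.read_gexf` loads a citation network. Both libraries are installed as dependencies of `ads-bib`.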

[![Interactive topic map from the Hawking query](docs/assets/topic_map_demo.gif)](https://github.com/TutteInstitute/datamapplot)
*Topic map output from `author:"Hawking, S*"` in [datamapplot](https://github.com/TutteInstitute/datamapplot).*

[![Author co-citation network from the Hawking query](docs/assets/gephi_lite_demo.gif)](https://gephi.org/gephi-lite/)
*Author co-citation output from `author:"Hawking, S*"` in [Gephi Lite](https://gephi.org/gephi-lite/).*
