Metadata-Version: 2.4
Name: pubchem-compounds
Version: 1.1.0
Summary: A Python wrapper for the PubChem PUG REST API. Provides convenient methods to retrieve chemical information using CIDs, SIDs, CAS numbers, SMILES, and InChIKeys, and to convert between chemical identifiers. Built-in rate limiting and retry logic; RDKit integration for SDF/mol retrieval.
Author-email: Luc Miaz <luc@miaz.ch>
License: Creative Commons Attribution-NonCommercial 4.0 International (CC BY-NC 4.0)
        
        Copyright (c) 2024–2026 Luc T. Miaz
        
        This work is licensed under the Creative Commons Attribution-NonCommercial 4.0
        International License. To view a copy of this license, visit
        https://creativecommons.org/licenses/by-nc/4.0/ or send a letter to
        Creative Commons, PO Box 1866, Mountain View, CA 94042, USA.
        
        You are free to:
        
          Share — copy and redistribute the material in any medium or format
          Adapt — remix, transform, and build upon the material
        
        Under the following terms:
        
          Attribution — You must give appropriate credit, provide a link to the
          license, and indicate if changes were made. You may do so in any reasonable
          manner, but not in any way that suggests the licensor endorses you or your use.
        
          NonCommercial — You may not use the material for commercial purposes.
        
          No additional restrictions — You may not apply legal terms or technological
          measures that legally restrict others from doing anything the license permits.
        
        Notices:
        
          You do not have to comply with the license for elements of the material in
          the public domain or where your applicable use is permitted by an exception
          or limitation.
        
          No warranties are given. The license may not give you all of the permissions
          necessary for your intended use. For example, other rights such as publicity,
          privacy, or moral rights may limit how you use the material.
        
Project-URL: Homepage, https://gitlab.com/lucmiaz/pubchem
Project-URL: Documentation, https://pubchem-compounds.readthedocs.io
Project-URL: Repository, https://gitlab.com/lucmiaz/pubchem.git
Project-URL: Changelog, https://pubchem-compounds.readthedocs.io/en/latest/changelog.html
Keywords: PubChem,chemistry,cas,cid,rdkit,cheminformatics,PFAS
Classifier: Programming Language :: Python
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Natural Language :: English
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: Topic :: Scientific/Engineering :: Information Analysis
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests
Requires-Dist: numpy>=1.19.4
Requires-Dist: regex
Requires-Dist: tqdm>=4.65.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: pylint; extra == "dev"
Provides-Extra: docs
Requires-Dist: sphinx>=7.0; extra == "docs"
Requires-Dist: sphinx-rtd-theme; extra == "docs"
Requires-Dist: sphinx-autoapi; extra == "docs"
Dynamic: license-file

# pubchem-compounds

[![PyPI](https://img.shields.io/pypi/v/pubchem-compounds)](https://pypi.org/project/pubchem-compounds/)
[![License: CC BY-NC 4.0](https://img.shields.io/badge/License-CC%20BY--NC%204.0-lightgrey.svg)](https://creativecommons.org/licenses/by-nc/4.0/)
[![Docs](https://readthedocs.org/projects/pubchem-compounds/badge/?version=latest)](https://pubchem-compounds.readthedocs.io)
[![Python 3.9+](https://img.shields.io/badge/python-3.9%2B-blue)](https://www.python.org/downloads/)

A Python wrapper for the [PubChem PUG REST API](https://pubchem.ncbi.nlm.nih.gov/docs/pug-rest),
focused on Compounds and Substances. It provides:

- Identifier conversion: **CAS → CID / SID**, **InChIKey → CID**, **SMILES → CID**, **CID → CAS / EINECS / DTXSID**
- **Batch property fetching** from lists of CIDs
- **SDF / RDKit mol retrieval** for CIDs, SIDs, or CAS numbers
- **SMILES and InChI** extraction for any PubChem synonym
- **PFAS classification tree** node queries
- **Built-in rate limiting** (max 5 req/s, 400 req/min) and automatic retry on HTTP 403
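
The two request limits quoted above can be enforced together on the client side. The following is a minimal sketch of one way to do that with sliding windows. It is not the package's own implementation, and `SlidingWindowLimiter` and `throttle` are hypothetical names introduced only for illustration.

```python
import time
from collections import deque


class SlidingWindowLimiter:
    """Hypothetical helper: allow at most `max_calls` per `period` seconds."""

    def __init__(self, max_calls, period, clock=time.monotonic, sleep=time.sleep):
        self.max_calls = max_calls
        self.period = period
        self.clock = clock    # injectable so the limiter can be tested without waiting
        self.sleep = sleep
        self.calls = deque()  # timestamps of the calls still inside the window

    def wait(self):
        """Block until another call is allowed, then record it."""
        while True:
            now = self.clock()
            # Forget calls that have left the window.
            while self.calls and now - self.calls[0] >= self.period:
                self.calls.popleft()
            if len(self.calls) < self.max_calls:
                self.calls.append(now)
                return
            # Window is full: sleep until the oldest recorded call expires.
            self.sleep(self.period - (now - self.calls[0]))


# One limiter per limit: 5 requests per second and 400 requests per minute.
per_second = SlidingWindowLimiter(max_calls=5, period=1.0)
per_minute = SlidingWindowLimiter(max_calls=400, period=60.0)


def throttle():
    """Call before each request so that both limits are respected."""
    per_minute.wait()
    per_second.wait()
```

Calling `throttle()` before each HTTP request would keep a simple loop within both limits. Retrying failed requests is a separate concern and is not shown here.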

## Installation

### From PyPI

```bash
pip install pubchem-compounds
```

> **Note:** RDKit is required only for functions that parse SDF data into
> molecules or SMILES (`get_mols_from_cids`, `cas_to_mols`, `synonyms_to_smiles`, etc.).
> Install it separately:
>
> ```bash
> pip install rdkit          # rdkit ≥ 2023.03
> # or via conda:
> conda install -c conda-forge rdkit
> ```

### From source

```bash
git clone https://gitlab.com/lucmiaz/pubchem.git
cd pubchem
pip install -e .
```

Install with optional dependencies:

```bash
pip install -e ".[dev]"   # pytest, pytest-cov, pylint
pip install -e ".[docs]"  # sphinx, sphinx-rtd-theme, sphinx-autoapi
```

## Quick start

```python
import pubchem_compounds as pc
```

### CAS → CID

```python
mapping, failed = pc.cas_to_cid("7732-18-5")  # water
print(mapping)   # {'7732-18-5': [962]}
```

### Batch CAS lookup

```python
cas_list = ["7732-18-5", "74-82-8", "71-43-2"]
mapping, failed = pc.cas_to_cid(cas_list)
# {'7732-18-5': [962], '74-82-8': [297], '71-43-2': [241]}
```
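
Before sending a large batch, it can help to drop malformed CAS numbers locally so that they do not cost a request. The sketch below checks the format and the CAS check digit (the weighted sum of the preceding digits, modulo 10). It is independent of the package: `is_valid_cas` is a hypothetical helper, and it uses the standard-library `re` module rather than `regex`.

```python
import re

# Two to seven digits, two digits, then a single check digit.
CAS_PATTERN = re.compile(r"(\d{2,7})-(\d{2})-(\d)")


def is_valid_cas(cas):
    """Return True if `cas` is well formed and its check digit matches."""
    match = CAS_PATTERN.fullmatch(cas.strip())
    if match is None:
        return False
    body = match.group(1) + match.group(2)
    check_digit = int(match.group(3))
    # Weight the digits 1, 2, 3, ... starting from the rightmost body digit.
    total = sum(
        weight * int(digit)
        for weight, digit in enumerate(reversed(body), start=1)
    )
    return total % 10 == check_digit


candidates = ["7732-18-5", "74-82-8", "71-43-2", "7732-18-4", "not-a-cas"]
valid = [cas for cas in candidates if is_valid_cas(cas)]
print(valid)  # ['7732-18-5', '74-82-8', '71-43-2']
```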

### CAS → SMILES

```python
processed, failed = pc.cas_to_smiles(["7732-18-5", "74-82-8"])
print(processed["7732-18-5"])  # 'O'
print(processed["74-82-8"])    # 'C'
```

### CAS → RDKit molecules

```python
mols = pc.cas_to_mols(["7732-18-5", "74-82-8"])
for cas, mol_list in mols.items():
    for mol in mol_list:
        print(cas, mol.GetNumAtoms())
```

### Fetch compound properties for a list of CIDs

```python
data = pc.get_from_cids([962, 297, 241], target="property/MolecularFormula,MolecularWeight")
for prop in data["PropertyTable"]["Properties"]:
    print(prop["CID"], prop["MolecularFormula"], prop["MolecularWeight"])
```
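
PUG REST addresses a request as a path of input, operation, and output segments. Assuming that `get_from_cids` forwards `target` as the operation segment (an assumption about the wrapper's internals that this README does not confirm), the call above would correspond to a URL of the following shape. `build_compound_url` is a hypothetical helper used only to show the layout.

```python
PUG_REST_BASE = "https://pubchem.ncbi.nlm.nih.gov/rest/pug"


def build_compound_url(cids, target, output="JSON"):
    """Assemble a PUG REST compound URL: <input>/<operation>/<output>."""
    # Several CIDs are passed as one comma-separated list.
    cid_part = ",".join(str(int(cid)) for cid in cids)
    return f"{PUG_REST_BASE}/compound/cid/{cid_part}/{target}/{output}"


url = build_compound_url(
    [962, 297, 241],
    "property/MolecularFormula,MolecularWeight",
)
print(url)
# https://pubchem.ncbi.nlm.nih.gov/rest/pug/compound/cid/962,297,241/property/MolecularFormula,MolecularWeight/JSON
```

Opening such a URL in a browser is a quick way to inspect the raw `PropertyTable` response when debugging a property query.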

### CID → CAS (reverse lookup)

```python
cas_list = pc.cid_to_cas(962)
print(cas_list)  # ['7732-18-5']
```

### InChIKey → CID

```python
cids = pc.inchikey_to_pubchem("XLYOFNOQVPJJNP-UHFFFAOYSA-N")
print(cids)  # [962]
```

### DTXSID → SMILES

```python
processed, failed = pc.dtxsid_to_smiles(["DTXSID9020584"])
print(processed["DTXSID9020584"])
```

### PFAS classification tree

```python
# Fetch all CIDs from the OECD PFAS list node (default hnid = 5517102)
cids = pc.pubchem_pfas_tree()
print(f"{len(cids)} PFAS CIDs found")
```

For a full API reference and more examples, see the
[documentation](https://pubchem-compounds.readthedocs.io).

## Dependencies

| Package | Purpose |
|---------|---------|
| `requests` | HTTP requests |
| `numpy` | Rate-limiting random intervals |
| `regex` | CAS / EINECS / DTXSID pattern matching |
| `tqdm` | Progress bars for batch operations |
| `rdkit` *(optional)* | SDF parsing and mol/SMILES generation |

## Licence

Copyright © 2024–2026 Luc T. Miaz.  
Licensed under the [Creative Commons Attribution-NonCommercial 4.0 International](https://creativecommons.org/licenses/by-nc/4.0/) (CC BY-NC 4.0) licence.

## Acknowledgments

Developed at the Department of Environmental Science, Stockholm University,
under the [ZeroPM project](https://zeropm.eu) (WP2), funded by the
European Union's Horizon 2020 research and innovation programme
(grant agreement No 101036756).

[![Powered by RDKit](https://img.shields.io/badge/Powered%20by-RDKit-3838ff.svg)](https://www.rdkit.org/)

