Metadata-Version: 2.4
Name: lcmd-db
Version: 0.3.1
Summary: Python client for the LCMD molecular database
Project-URL: Homepage, https://lcmd-app.epfl.ch
Project-URL: Documentation, https://lcmd-app.epfl.ch/docs
Project-URL: Source, https://github.com/lcmd-epfl/db
Project-URL: Changelog, https://github.com/lcmd-epfl/db/blob/master/CHANGELOG.md
Project-URL: Bug Tracker, https://github.com/lcmd-epfl/db/issues
Author-email: Romain Graux <author@romaingrx.com>
License: MIT
License-File: LICENSE
Keywords: chemistry,lcmd,molecular-database,quantum-chemistry
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: click>=8.0
Requires-Dist: jinja2>=3.1
Requires-Dist: platformdirs>=4.0.0
Requires-Dist: polars>=0.20.0
Requires-Dist: pooch>=1.8.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: pydantic-settings>=2.0
Requires-Dist: tqdm>=4.67.1
Description-Content-Type: text/markdown

# LCMD-DB

Python client for the [LCMD molecular database](https://lcmd-app.epfl.ch).

## Installation

```bash
uv add lcmd-db
# or
pip install lcmd-db
```

## Quick start

```python
from lcmd_db import load_dataset

# Load a subset (molecules included by default)
data = load_dataset("qm9")
data.molecules  # polars DataFrame

# Include multiple entity types
data = load_dataset("qm9", include=["molecules", "reactions", "fragments"])

# Include XYZ structure files
data = load_dataset("qm9", include_structures=True)

# Select specific properties
data = load_dataset("qm9", molecule_properties=["smiles", "energy"])

# Choose output format
data = load_dataset("qm9", data_format="csv")

# Force re-download (bypass cache)
data = load_dataset("qm9", force_download=True)
```

## Dataset API

`load_dataset` returns a `SubsetData` object that holds a polars DataFrame for each entity type. Call `as_dataset` with an entity type to get a typed `Dataset` that supports lazy loading, filtering, and indexing:

```python
import polars as pl
from lcmd_db import load_dataset

data = load_dataset("qm9")

# Get a typed, iterable Dataset
molecules = data.as_dataset("molecules")

# Index and slice
mol = molecules[0]          # Molecule(id=..., properties={...})
batch = molecules[10:20]    # Dataset slice

# Filter with polars expressions
heavy = molecules.filter(pl.col("molecular_weight") > 100)

# Select specific columns
subset = molecules.select("smiles", "energy")

# Train/test split
train, test = molecules.train_test_split(test_size=0.2, seed=42)

# Export
df = molecules.to_polars()
pdf = molecules.to_pandas()
atoms = molecules.to_ase()   # requires ase
```
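
Since `to_ase()` requires `ase`, which is not among the package's listed requirements, it can help to check for it before exporting. A minimal sketch using only the standard library; the `has_module` helper is hypothetical and not part of `lcmd_db`:

```python
from importlib.util import find_spec


def has_module(name):
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(name) is not None


if has_module("ase"):
    print("ase is available; molecules.to_ase() can be used")
else:
    print("ase is missing; install it (e.g. `pip install ase`) before calling to_ase()")
```

This keeps a missing optional dependency from surfacing only at export time.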

## CLI

```bash
# Download a dataset
lcmd-db download qm9
lcmd-db download qm9 --include molecules --include structures
lcmd-db download qm9 -f csv --force

# Browse available schemas
lcmd-db schema list
lcmd-db schema show qm9

# Manage cache
lcmd-db cache clear
lcmd-db cache clear qm9

# Sync type stubs for autocomplete
lcmd-db stubs sync
```
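
To script downloads from Python, the same commands can be assembled as an argument list and passed to `subprocess`. The sketch below uses only the flags shown above; the `download_cmd` and `run_download` helpers are hypothetical, and combining `--include`, `-f`, and `--force` in one invocation is an assumption:

```python
import shlex
import subprocess


def download_cmd(dataset, include=(), fmt=None, force=False):
    """Build the argument list for `lcmd-db download` (flags as shown above)."""
    cmd = ["lcmd-db", "download", dataset]
    for entity in include:
        cmd += ["--include", entity]  # one --include per entity type
    if fmt is not None:
        cmd += ["-f", fmt]
    if force:
        cmd.append("--force")
    return cmd


def run_download(*args, **kwargs):
    """Run the download; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(download_cmd(*args, **kwargs), check=True)


cmd = download_cmd("qm9", include=["molecules", "structures"], fmt="csv", force=True)
print(shlex.join(cmd))  # shows the command without running it
```

Passing a list rather than a shell string avoids quoting problems with dataset names.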

## Configuration

Settings are controlled via environment variables (prefix `LCMD_DB_`) or programmatically:

| Variable | Description | Default |
|---|---|---|
| `LCMD_DB_BASE_URL` | API base URL | `https://lcmd-app.epfl.ch` |
| `LCMD_DB_CACHE_DIR` | Cache directory | OS-dependent |
| `LCMD_DB_TIMEOUT` | Download timeout (seconds) | `300` |
| `LCMD_DB_AUTO_SYNC_STUBS` | Auto-sync type stubs on import | `true` |

```python
from lcmd_db import settings

settings.base_url = "https://custom-instance.example.com"
```
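
The environment variables can also be set from Python. The sketch below assumes that settings are read from the environment when `lcmd_db` is first imported, which would be typical for pydantic-settings but is not confirmed here. The `lcmd_env` helper and the cache path are hypothetical:

```python
import os


def lcmd_env(**overrides):
    """Map keyword overrides to LCMD_DB_-prefixed environment variables."""
    env = {}
    for key, value in overrides.items():
        # Booleans are written as "true"/"false", matching the table's default.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        env[f"LCMD_DB_{key.upper()}"] = text
    return env


env = lcmd_env(cache_dir="/tmp/lcmd-cache", timeout=600, auto_sync_stubs=False)
os.environ.update(env)  # set these before the first `import lcmd_db` (assumption)
```

The same dictionary can be passed as `env=` to `subprocess.run` to configure a single CLI invocation without changing the current process.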
