Metadata-Version: 2.3
Name: norm_toolkit
Version: 1.9.1
Summary: Toolkit to normalize text to UMLS / ontologies
Author: Haydn Jones
Author-email: Haydn Jones <haydnjonest@gmail.com>
Requires-Dist: asyncpg>=0.29.0
Requires-Dist: clickhouse-connect>=1.0.0
Requires-Dist: duckdb>=1.5.0
Requires-Dist: lvg-norm>=1.3.0
Requires-Dist: polars[rt64]>=1.36.1
Requires-Dist: pyarrow>=20.0.0
Requires-Dist: pydantic>=2.12.5
Requires-Dist: python-dotenv>=1.2.2
Requires-Dist: sqlalchemy>=2.0.0
Requires-Dist: tqdm>=4.67.1
Requires-Python: >=3.12
Description-Content-Type: text/markdown

## ClickHouse backend

The DuckDB builder remains the source of truth. Build a DuckDB file with
`build_merged_duckdb`, then upload its canonical tables into ClickHouse:

```bash
uv run python scripts/upload_clickhouse.py data/dbs_final/SmallMolecule.duckdb --database normalization
```

The upload shows a progress bar for each copied table; pass `--no-progress` to
silence it.
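If the upload needs to run from a Python script (for example in CI), the command above can be assembled programmatically. This is a minimal sketch: `build_upload_command` is a hypothetical helper, not part of `norm_toolkit`, and it uses only the script path and flags shown in this section.

```python
import subprocess


def build_upload_command(duckdb_path, database, show_progress=True):
    """Hypothetical helper: assemble the upload command from this README."""
    cmd = [
        "uv", "run", "python", "scripts/upload_clickhouse.py",
        str(duckdb_path),
        "--database", database,
    ]
    if not show_progress:
        # Silence the per-table progress bars, e.g. for CI logs.
        cmd.append("--no-progress")
    return cmd


def run_upload(duckdb_path, database, show_progress=True):
    # check=True raises CalledProcessError if the upload script exits non-zero.
    subprocess.run(
        build_upload_command(duckdb_path, database, show_progress),
        check=True,
    )
```

Calling `run_upload("data/dbs_final/SmallMolecule.duckdb", "normalization", show_progress=False)` would then run the same upload as the shell command, without progress bars.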

Connection settings are read from `.env` via `python-dotenv`, and connections
are made with the official `clickhouse-connect` client. Set `CH_HTTP` to the
server URL, for example `http://host:8123/normalization`. `CH_USER` and
`CH_PASSWORD` may be set separately; when present, they override any
credentials embedded in the URL.
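The credential precedence described above can be illustrated with a short sketch. It is not the toolkit's actual implementation: `resolve_clickhouse_settings` is a hypothetical helper, the fallback port of 8123 is an assumption, and the environment is assumed to have been populated already (for example by `python-dotenv`).

```python
import os
from urllib.parse import urlsplit


def resolve_clickhouse_settings(env=None):
    """Hypothetical helper showing how CH_USER / CH_PASSWORD could
    take precedence over credentials embedded in CH_HTTP."""
    env = os.environ if env is None else env
    url = urlsplit(env["CH_HTTP"])
    return {
        "host": url.hostname,
        # Assumption: fall back to the default ClickHouse HTTP port.
        "port": url.port or 8123,
        # The URL path, if any, names the database.
        "database": url.path.lstrip("/") or None,
        # Separate variables win over credentials in the URL.
        "username": env.get("CH_USER") or url.username,
        "password": env.get("CH_PASSWORD") or url.password,
    }
```

With `CH_HTTP=http://alice:secret@host:8123/normalization` and `CH_USER=bob`, this sketch resolves the user to `bob` while keeping the password from the URL.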

Use the ClickHouse backend from Python:

```python
from norm_toolkit import ClickHouseNormalizer

normalizer = ClickHouseNormalizer(database="normalization")
result = normalizer.normalize(["aspirin"], top_k=5)
```

You can also pass a DSN in code:

```python
normalizer = ClickHouseNormalizer(
    dsn="http://host:8123/normalization",
    database="normalization",
)
```
