Metadata-Version: 2.4
Name: tablassert
Version: 7.3.3
Summary: Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in.
Project-URL: Homepage, https://github.com/SkyeAv/Tablassert
Project-URL: Source, https://github.com/SkyeAv/Tablassert
Project-URL: Documentation, https://skyeav.github.io/Tablassert/
Author-email: Skye Lane Goetz <sgoetz@isbscience.org>
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: bioinformatics,data quality control,declarative pipeline,entity resolution,kgx,knowledge graph,ncats translator,ner,tablassert,table mining,yaml configuration
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Framework :: Pydantic
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.11
Requires-Dist: duckdb>=1.5.0
Requires-Dist: fastexcel>=0.19.0
Requires-Dist: lazy-loader>=0.5
Requires-Dist: loguru>=0.7.3
Requires-Dist: onnxruntime>=1.24.4
Requires-Dist: optimum-onnx>=0.1.0
Requires-Dist: orjson>=3.11.7
Requires-Dist: playwright>=1.58.0
Requires-Dist: polars-hash>=0.5.6
Requires-Dist: polars>=1.39.0
Requires-Dist: pyarrow>=23.0.1
Requires-Dist: pydantic>=2.12.5
Requires-Dist: pyexcel>=0.7.4
Requires-Dist: pyyaml>=6.0.3
Requires-Dist: rapidfuzz>=3.14.3
Requires-Dist: scikit-learn>=1.8.0
Requires-Dist: sentence-transformers>=5.3.0
Requires-Dist: sqlite-utils>=3.39
Requires-Dist: typer>=0.21.2
Requires-Dist: xxhash>=3.6.0
Provides-Extra: rt
Requires-Dist: polars[rtcompat]>=1.39.0; extra == 'rt'
Provides-Extra: rtcompat
Requires-Dist: polars[rtcompat]>=1.39.0; extra == 'rtcompat'
Description-Content-Type: text/markdown

# Tablassert

[![PyPI](https://img.shields.io/pypi/v/tablassert.svg)](https://pypi.org/project/tablassert/)
[![Python](https://img.shields.io/pypi/pyversions/tablassert.svg)](https://pypi.org/project/tablassert/)
[![License](https://img.shields.io/pypi/l/tablassert.svg)](https://github.com/SkyeAv/Tablassert/blob/main/LICENSE)
[![Docs](https://img.shields.io/github/deployments/SkyeAv/Tablassert/github-pages?label=docs)](https://skyeav.github.io/Tablassert/)

Extract knowledge assertions from tabular data into NCATS Translator-compliant KGX NDJSON — declaratively, with entity resolution and quality control built in.

```bash
pip install tablassert
tablassert build-knowledge-graph config.yaml
```

**[Full Documentation](https://skyeav.github.io/Tablassert/)** — installation guides, tutorials, configuration reference, and API docs.

## Installation

```bash
pip install tablassert
```

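All dependencies (ML, web, Excel support) are included in the base install. An optional `rtcompat` extra (also published as `rt`) installs Polars with its `rtcompat` runtime, intended for older CPUs that lack the instruction sets the default Polars build requires:

```bash
pip install "tablassert[rtcompat]"  # Polars compatibility runtime for older CPUs
```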

<details>
<summary><strong>Docker</strong></summary>

```bash
docker pull ghcr.io/skyeav/tablassert:latest

docker run --rm \
  -v /path/to/config:/data \
  -v /path/to/datassert:/datassert \
  ghcr.io/skyeav/tablassert:latest \
  build-knowledge-graph /data/graph-config.yaml
```

</details>

## Quick Demo

```bash
# Build a knowledge graph from a YAML configuration
$ tablassert build-knowledge-graph graph-config.yaml
⠋ Loading table configurations...
⠋ Resolving entities across 12 DuckDB shards...
⠋ Compiling subgraphs...
⠋ Deduplicating nodes and edges...
✓ Done — wrote nodes.ndjson and edges.ndjson to .storassert/
```

Define your entities and relationships in YAML, point Tablassert at your data, and get NCATS Translator-compliant KGX NDJSON out the other side, with no code required.
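Because the output is newline-delimited JSON, it can be inspected with nothing but the Python standard library. The sketch below is illustrative rather than part of Tablassert: `read_ndjson` is a hypothetical helper, the sample records use placeholder identifiers, and the only fields assumed are the `subject`, `predicate`, and `object` keys that KGX edge records carry.

```python
import json
from collections import Counter
from typing import Iterable, Iterator


def read_ndjson(lines: Iterable[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line of newline-delimited JSON."""
    for line in lines:
        line = line.strip()
        if line:  # skip blank lines between records
            yield json.loads(line)


# Placeholder records for illustration only. With real output you would pass
# an open file handle instead, e.g. open(".storassert/edges.ndjson").
sample_edges = [
    '{"subject": "EXAMPLE:001", "predicate": "biolink:related_to", "object": "EXAMPLE:002"}',
    "",
    '{"subject": "EXAMPLE:003", "predicate": "biolink:related_to", "object": "EXAMPLE:002"}',
]

edges = list(read_ndjson(sample_edges))

# Count how often each predicate appears across the edge records.
predicate_counts = Counter(edge["predicate"] for edge in edges)
```

Passing an iterable of lines rather than a path keeps the helper easy to test and lets the same function stream a large file one record at a time.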

## Key Features

- **Declarative Configuration** — YAML-based, no code required
- **Entity Resolution** — Maps text to biological entities (genes, diseases, chemicals)
- **Quality Control** — Three-stage validation (exact → fuzzy → BERT embeddings)
- **KGX Compliance** — NCATS Translator-compatible NDJSON output
- **Performance** — Lazy evaluation pipelines with Polars and DuckDB-accelerated entity resolution
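The exact → fuzzy → embedding order listed under Quality Control can be pictured as a cascade that stops at the first stage that accepts a match. The following is a minimal sketch under stated assumptions, not Tablassert's implementation: `match_stage`, both thresholds, and the `embed_similarity` callback are hypothetical, and the fuzzy stage uses the standard library's `difflib` as a stand-in so the example runs without extra packages.

```python
from difflib import SequenceMatcher
from typing import Callable, Optional


def normalize(text: str) -> str:
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def match_stage(
    source_text: str,
    resolved_name: str,
    fuzzy_threshold: float = 0.85,  # hypothetical cutoff
    embed_similarity: Optional[Callable[[str, str], float]] = None,
    embed_threshold: float = 0.80,  # hypothetical cutoff
) -> Optional[str]:
    """Return the first stage that accepts the pair, or None if all reject it."""
    a, b = normalize(source_text), normalize(resolved_name)

    # Stage 1: exact match after normalization (cheapest check).
    if a == b:
        return "exact"

    # Stage 2: character-level similarity (difflib stand-in for a fuzzy matcher).
    if SequenceMatcher(None, a, b).ratio() >= fuzzy_threshold:
        return "fuzzy"

    # Stage 3: semantic similarity from a caller-supplied function, for
    # example cosine similarity between sentence embeddings.
    if embed_similarity is not None and embed_similarity(a, b) >= embed_threshold:
        return "embedding"

    return None
```

With these defaults, `match_stage("TP53 ", "tp53")` is accepted at the exact stage and `match_stage("breast cancer", "breast cancers")` at the fuzzy stage, while a pair such as "heart attack" and "myocardial infarction" is accepted only if an embedding similarity function is supplied and scores it above the threshold. Ordering the stages from cheapest to most expensive means the embedding model is consulted only for pairs the earlier stages could not settle.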

## Contributing

See **[CONTRIBUTING.md](CONTRIBUTING.md)** for development setup, code style, and pull request guidelines.

## License

[Apache License 2.0](LICENSE)

## Contributors

[Skye Lane Goetz](mailto:sgoetz@isbscience.org) — Institute for Systems Biology, CalPoly SLO

[Gwênlyn Glusman](mailto:gglusman@isbscience.org) — Institute for Systems Biology

Jared C. Roach — Institute for Systems Biology
