Metadata-Version: 2.4
Name: tnmhelper
Version: 0.1.0
Summary: AJCC TNM cancer staging engine: rules-walker + observation derivation
Author: Hong-Kai (Walther) Chen, Po-Yen Tzeng, Kai-Po Chang
License: MIT License
        
        Copyright (c) 2025 Hong-Kai (Walther) Chen, Po-Yen Tzeng, Kai-Po Chang
                           Med NLP Lab, China Medical University
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/kblab2024/tnmhelper
Project-URL: Repository, https://github.com/kblab2024/tnmhelper
Project-URL: Issues, https://github.com/kblab2024/tnmhelper/issues
Keywords: cancer,staging,ajcc,tnm,oncology,pathology
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Healthcare Industry
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Medical Science Apps.
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Provides-Extra: gui
Requires-Dist: nicegui>=1.4; extra == "gui"
Provides-Extra: lint
Requires-Dist: pandas>=2.0; extra == "lint"
Provides-Extra: packaging
Requires-Dist: pyinstaller>=6; extra == "packaging"
Requires-Dist: pywebview>=5; extra == "packaging"
Dynamic: license-file

# tnmhelper — AJCC TNM cancer staging engine

A small, rules-driven engine that maps clinical observations (tumor
size, invasion depth, node count, biomarkers, …) to an AJCC stage
group. Each `(organ, AJCC edition)` pair is described by four
declarative files: a Python model definition, a CSV rules table, a
criteria JSON, and a derivation JSON. The engine walks
specificity-sorted rules at query time, with no precomputed flattening
and no separate cache.

This repository is organised into **four layers and six components**
so the data, logic, public API, and three consumers (GUI, HTTP, CLI)
can evolve independently. Adding a new `(organ, edition)` is still a
drop-in change: four new files, zero edits to the core pipeline.

## Layers and components

| Layer | Component | Lives at | Role |
|---|---|---|---|
| 1 Data | `tnmhelper_data/` | sibling top-level dir | Source of truth: rules, criteria, derivations, and per-organ model `.py` files |
| 2 Backend | `tnmhelper.backend` | `tnmhelper/backend/` | Pure logic: rule walker, derivation engine, lint, shared types |
| 3 Python API | `tnmhelper` (top-level) | `tnmhelper/api.py` + `__init__.py` | Stable importable surface for downstream consumers (e.g. dspy-based extractors) |
| 4 Consumers | `tnmhelper.consumers.gui` | `tnmhelper/consumers/gui.py` | NiceGUI browser app |
| | `tnmhelper.consumers.http` | `tnmhelper/consumers/http.py` | stdlib HTTP server |
| | `tnmhelper.consumers.cli` | `tnmhelper/consumers/cli.py` | argparse CLI |

The three consumers depend only on the Python API; they do not import
backend internals. The backend reads the data layer through a
[`DataProvider`](tnmhelper/data_provider/__init__.py) protocol that
abstracts over the filesystem tree and the optional zip bundle.
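
As a rough illustration of that abstraction, the sketch below defines a one-method protocol with a filesystem implementation and a zip implementation. The method name `read_text`, both class names, and the `load_rules` helper are assumptions made for this example; the real `DataProvider` protocol may expose a different surface.

```python
import tempfile
import zipfile
from pathlib import Path
from typing import Protocol


class DataProvider(Protocol):
    """Hypothetical minimal interface; the real protocol may differ."""

    def read_text(self, relative_path: str) -> str: ...


class FilesystemProvider:
    def __init__(self, root: Path) -> None:
        self.root = Path(root)

    def read_text(self, relative_path: str) -> str:
        return (self.root / relative_path).read_text(encoding="utf-8")


class ZipProvider:
    def __init__(self, archive: Path) -> None:
        self.archive = Path(archive)

    def read_text(self, relative_path: str) -> str:
        with zipfile.ZipFile(self.archive) as zf:
            return zf.read(relative_path).decode("utf-8")


def load_rules(provider: DataProvider, edition_slug: str, organ: str) -> str:
    """Backend-style caller: it sees only the protocol, not the storage."""
    return provider.read_text(f"{edition_slug}/{organ}/rules.csv")


# Demo: the same relative path resolves identically from a tree and a zip.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "tnmhelper_data"
    csv_path = root / "ajcc9" / "lung" / "rules.csv"
    csv_path.parent.mkdir(parents=True)
    csv_path.write_text("T,N,M,Stage", encoding="utf-8")

    archive = Path(tmp) / "bundle.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.write(csv_path, arcname="ajcc9/lung/rules.csv")

    from_fs = load_rules(FilesystemProvider(root), "ajcc9", "lung")
    from_zip = load_rules(ZipProvider(archive), "ajcc9", "lung")
```

Structural typing means neither implementation has to inherit from the protocol; any object with a matching `read_text` method satisfies it.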

## Repository tree

```
tnmhelper/                      Layer 2-4 — Python package (no data)
  __init__.py                   public API re-exports
  api.py                        Layer 3 facade (stage / derive_tnm / …)
  schema.py                     dataclasses: ModelSpec, ObservableSpec, DerivedStage
  backend/                      Layer 2
    engine.py                   rules walker (TNM -> Stage)
    derive.py                   observations -> TNM
    lint.py                     validation (CSV / criteria / derivation)
    _shared.py                  Edition, Biomarker, Classification, YesNo, StagingModel
    _machine/                   contract.json + new_model_template.py
  models/                       empty namespace; __path__ injected at runtime
    __init__.py                 MODEL_REGISTRY, auto-discovery
  data_provider/                runtime data access abstraction
    filesystem.py               reads tnmhelper_data/ directly
    zipbundle.py                reads a .zip bundle via zipimport + ZipFile
  bundle/                       export tool
    __main__.py                 `python -m tnmhelper.bundle ...`
    exporter.py                 pack tnmhelper_data/ into a .zip
  consumers/                    Layer 4 (each importing only tnmhelper)
    cli.py, http.py, gui.py
    requirements-gui.txt

tnmhelper_data/                 Layer 1 — source of truth
  models/ajcc<X>/<organ>.py     model definition (StrEnums + StagingModel)
  ajcc<X>/<organ>/
    rules.csv                   human-edited rules table
    criteria.json               TNM category descriptions
    derivation.json             observation -> TNM rules

tnmhelper_pdfsource/            AJCC source PDFs (reference; excluded from bundle)
tests/
  smoke.py                      end-to-end smoke test (replaces main.py)
  bench.py                      per-query latency microbenchmark
pyproject.toml                  package metadata + console entry points
AGENTS.md                       onboarding for coding agents
```

## Installation

```
pip install -e .                   # editable install (dev)
pip install -e ".[gui]"            # add NiceGUI for the GUI consumer
```

After install, three console scripts are available:

```
tnmhelper       — CLI (list / stage / derive / interactive / …)
tnmhelper-http  — HTTP server on port 8000 by default
tnmhelper-gui   — NiceGUI browser app on port 8080 by default
```

## Quickstart — as a Python library

```python
import tnmhelper

tnmhelper.set_data_source(None)         # autodetect (TNMHELPER_DATA env,
                                        # packaged bundle, or ./tnmhelper_data/)

tnmhelper.organs()
# ['ampulla', 'breast', 'cervix', 'colon', 'esophagus_adeno', ...]

tnmhelper.observable_schema("lung", "AJCC 9")
# {'size_cm': ObservableSpec(name='size_cm', type='number', unit='cm', ...),
#  'histology': ObservableSpec(...), ...}

tnmhelper.stage_from_observations(
    "lung", "AJCC 9",
    {"size_cm": 1.5},
    Classification="c", DescY="No", DescR="No", DescM="No",
)
# DerivedStage(organ='lung', edition='AJCC 9', T='T1b', N='N0', M='M0',
#              stage='IA2', source='ajcc9/lung/rules.csv:row10', ...)
```

The full public surface consists of the functions `set_data_source`,
`organs`, `editions_for`, `model_spec`, `observable_schema`, `stage`,
`derive_tnm`, `stage_from_observations`, and `explain`, plus the
`ModelSpec` / `ObservableSpec` / `DerivedStage` dataclasses. See
[tnmhelper/api.py](tnmhelper/api.py) for full docstrings.
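
The `stage_from_observations` call in the quickstart first derives TNM categories from raw observations and then stages them. The sketch below shows one plausible shape for the derivation half: ordered rules per axis, first match wins. The thresholds, the `T_RULES` table, and `derive_axis` are illustrative assumptions, not the contents of any `derivation.json` or the behaviour of `tnmhelper.derive_tnm`.

```python
from typing import Callable, Mapping, Optional, Sequence, Tuple

Observations = Mapping[str, float]
Rule = Tuple[Callable[[Observations], bool], str]

# Illustrative size thresholds only; the project's derivation.json is
# authoritative for any real (organ, edition).
T_RULES: Sequence[Rule] = (
    (lambda o: o["size_cm"] <= 1.0, "T1a"),
    (lambda o: o["size_cm"] <= 2.0, "T1b"),
    (lambda o: o["size_cm"] <= 3.0, "T1c"),
)


def derive_axis(rules: Sequence[Rule], observations: Observations) -> Optional[str]:
    """Return the category of the first rule whose predicate holds."""
    for predicate, category in rules:
        if predicate(observations):
            return category
    return None  # no rule matched; the caller decides how to report this
```

Under these assumed thresholds, `derive_axis(T_RULES, {"size_cm": 1.5})` yields `"T1b"`, which lines up with the `T='T1b'` shown in the quickstart output.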

## Quickstart — as a consumer

```
tnmhelper list                                          # all (organ, edition) pairs
tnmhelper interactive                                   # prompted form-fill loop
tnmhelper stage breast --edition "AJCC 8" \
    --T T3 --N N0 --M M0 --Classification c \
    --DescY No --DescR No --DescM No --Grade G3 \
    --HER2 Negative --ER Positive --PR Negative

tnmhelper-http --port 8000                              # HTTP API
tnmhelper-gui  --port 8080                              # browser GUI
```

## Validation

After any data or model change:

```
python -m tnmhelper.backend.lint     # header / conflict / enum / edition checks
python -m tests.smoke                # end-to-end: every model.examples + derivation.examples
```

`tests/smoke.py` exits non-zero on any failure. Use it as the
integration check — there is no separate test suite.
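
If you want both checks behind a single entry point (for a pre-commit hook, say), a small wrapper like the following could work. `run_checks` and `CHECKS` are hypothetical names, and the sketch assumes the lint module also reports failure through its exit code, which is stated above only for `tests/smoke.py`.

```python
import subprocess
import sys
from typing import Iterable, Sequence

# The two commands from this section, run with the current interpreter.
CHECKS = (
    (sys.executable, "-m", "tnmhelper.backend.lint"),
    (sys.executable, "-m", "tests.smoke"),
)


def run_checks(commands: Iterable[Sequence[str]]) -> int:
    """Run commands in order; return the first non-zero exit code, else 0."""
    for command in commands:
        result = subprocess.run(list(command))
        if result.returncode != 0:
            return result.returncode  # stop at the first failing check
    return 0
```

Calling `sys.exit(run_checks(CHECKS))` from the repository root would then run lint first and skip the smoke test if lint fails.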

## Adding a new `(organ, edition)` — still drop-in

Four new files, no edits to anything else:

1. `tnmhelper_data/models/<edition_slug>/<name>.py` — defines the
   TNM `StrEnum`s, a `NamedTuple` state class, and a module-level
   `MODEL = StagingModel(...)`. See
   [tnmhelper_data/models/README.md](tnmhelper_data/models/README.md).
2. `tnmhelper_data/<edition_slug>/<name>/rules.csv` — header MUST be
   exactly `list(MODEL.columns) + ["Stage"]`; cells are enum literals
   or `ANY`.
3. `tnmhelper_data/<edition_slug>/<name>/criteria.json` — `edition`
   field MUST match `MODEL.edition.value`.
4. `tnmhelper_data/<edition_slug>/<name>/derivation.json` — observable
   declarations + per-axis ordered rules.

Then run `python -m tnmhelper.backend.lint` followed by
`python -m tests.smoke`.
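
The two MUST constraints in steps 2 and 3 are easy to check by hand before running the linter. The sketch below is not `tnmhelper.backend.lint`; `check_new_model_files` is a hypothetical helper that takes the column tuple and edition string as plain arguments rather than reading them from a real `MODEL`.

```python
import csv
import io
import json
from typing import List, Sequence


def check_new_model_files(
    rules_csv: str,
    criteria_json: str,
    model_columns: Sequence[str],
    model_edition: str,
) -> List[str]:
    """Return human-readable problems; an empty list means both checks pass."""
    problems: List[str] = []

    # Step 2: header must be exactly list(MODEL.columns) + ["Stage"].
    expected_header = list(model_columns) + ["Stage"]
    header = next(csv.reader(io.StringIO(rules_csv)), [])
    if header != expected_header:
        problems.append(f"rules.csv header {header} != {expected_header}")

    # Step 3: criteria.json "edition" must equal MODEL.edition.value.
    edition = json.loads(criteria_json).get("edition")
    if edition != model_edition:
        problems.append(f"criteria.json edition {edition!r} != {model_edition!r}")

    return problems
```

The header comparison is order-sensitive on purpose: a CSV with the right columns in the wrong order is still reported.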

## Data bundling — for installs on other systems

The data layer can be packed into a single `.zip` for shipping. See
[tnmhelper/bundle/README.md](tnmhelper/bundle/README.md) for the full
guide. Quick reference:

```
python -m tnmhelper.bundle export --out tnmhelper-data.zip --verify
```

Then on the target system, after `pip install tnmhelper`:

```python
import tnmhelper
tnmhelper.set_data_source("/path/to/tnmhelper-data.zip")
```

Or set `TNMHELPER_DATA=/path/to/tnmhelper-data.zip` in the environment
and call `tnmhelper.set_data_source(None)` — the autodetect chain picks
up the env var. The bundle includes the per-organ `.py` model files;
Python's stdlib `zipimport` loads them directly from the archive.
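
For readers unfamiliar with importing from archives, this self-contained sketch shows the stdlib mechanism on a throwaway zip. The module name `demo_zip_model` and its contents are invented, and the sketch puts the archive on `sys.path` for simplicity; the package itself is described above as injecting `__path__` on `tnmhelper.models`, which may work differently in detail.

```python
import importlib
import sys
import tempfile
import zipfile
import zipimport
from pathlib import Path

MODULE_NAME = "demo_zip_model"  # hypothetical module name for this sketch
MODULE_SOURCE = 'MODEL = {"organ": "demo", "columns": ("T", "N", "M")}\n'

with tempfile.TemporaryDirectory() as tmp:
    archive = Path(tmp) / "bundle.zip"
    with zipfile.ZipFile(archive, "w") as zf:
        zf.writestr(f"{MODULE_NAME}.py", MODULE_SOURCE)

    # A .zip on sys.path is handled by the stdlib zipimport machinery.
    sys.path.insert(0, str(archive))
    try:
        importlib.invalidate_caches()
        module = importlib.import_module(MODULE_NAME)
    finally:
        sys.path.remove(str(archive))

model = module.MODEL
loaded_from = module.__file__  # a path inside the archive
via_zipimport = isinstance(module.__loader__, zipimport.zipimporter)
```

No extraction step is involved: the source is read from the archive and compiled in memory, so the module remains usable after the temporary directory is gone.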

## Why no flattened cache

An earlier version of this engine precomputed a flattened JSONL lookup
per organ. That gave O(1) queries but ballooned to ~80 MB per organ
once high-cardinality wildcards (breast) expanded. The current
rules-walker design keeps the on-disk footprint to the original
kilobyte-scale CSV and the in-memory footprint to a list of a few
hundred rules per organ. Per-query cost is O(rules) with early exit,
well under a millisecond for any organ modeled here. This is a
favourable trade-off when artifact size and startup time matter
(bundle deployment, GUI cold-start).
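
The blow-up is simple multiplication: each `ANY` cell expands into every value of its column. The sketch below counts the concrete rows a single rule would become under flattening. The cardinalities are made up for illustration and are not taken from the real breast model.

```python
from math import prod
from typing import Dict

ANY = "ANY"

# Hypothetical number of distinct values per column.
CARDINALITY: Dict[str, int] = {
    "T": 10, "N": 8, "M": 2, "Grade": 3, "HER2": 2, "ER": 2, "PR": 2,
}


def flattened_rows(rule: Dict[str, str]) -> int:
    """Concrete rows produced if every ANY cell in the rule is expanded."""
    return prod(CARDINALITY[col] for col, cell in rule.items() if cell == ANY)


biomarker_wildcards = {
    "T": "T1", "N": "N0", "M": "M0",
    "Grade": ANY, "HER2": ANY, "ER": ANY, "PR": ANY,
}
all_wildcards = {col: ANY for col in CARDINALITY}
```

With these numbers, `flattened_rows(biomarker_wildcards)` is 24 and `flattened_rows(all_wildcards)` is 3840, while the rules walker stores each of them as a single row.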
