Metadata-Version: 2.4
Name: pirl-trufflepig
Version: 1.0.0
Summary: RNA tumor analysis driven by pirlygenes gene sets. Migrates the pirlygenes analyze pipeline into composable sub-commands with serializable intermediate state for incremental website-style runs.
Author-email: Alex Rubinsteyn <alex.rubinsteyn@unc.edu>
Classifier: Programming Language :: Python :: Implementation :: CPython
Classifier: Environment :: Console
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pirlygenes>=5.0.0
Requires-Dist: numpy>=2.0.0
Requires-Dist: pandas>=2.0
Requires-Dist: matplotlib>=3.5.0
Requires-Dist: tqdm
Requires-Dist: pyyaml
Provides-Extra: web
Requires-Dist: fastapi>=0.100; extra == "web"
Requires-Dist: uvicorn>=0.20; extra == "web"
Requires-Dist: python-multipart>=0.0.6; extra == "web"
Requires-Dist: markdown>=3.4; extra == "web"
Provides-Extra: test
Requires-Dist: pytest; extra == "test"
Requires-Dist: pytest-cov; extra == "test"
Requires-Dist: pytest-xdist; extra == "test"
Requires-Dist: httpx; extra == "test"
Requires-Dist: fastapi>=0.100; extra == "test"
Requires-Dist: python-multipart>=0.0.6; extra == "test"
Requires-Dist: markdown>=3.4; extra == "test"
Dynamic: license-file

# trufflepig

> RNA tumor analysis driven by [`pirlygenes`](https://github.com/pirl-unc/pirlygenes) gene sets.

## What this is

`trufflepig` is the analysis, plotting, reporting, and CLI layer for RNA
tumor analysis. It loads curated gene sets and reference expression data
from the [`pirlygenes`](https://github.com/pirl-unc/pirlygenes) package,
which is now data-only.

The legacy `pirlygenes analyze` CLI has been **fully migrated** into
this repo as `trufflepig run`, and multi-sample longitudinal comparison
(`pirlygenes compare-analyze`) is now `trufflepig compare`. The next
track is per-stage extraction of the analyze pipeline, so that a web UI
can stream incremental results.

## Install

```
pip install -e .
```

Run this from a source checkout. It pulls in `pirlygenes>=5.0.0`, which
provides the curated gene sets and reference data.

## Usage

### Single-sample analysis

```
trufflepig run \
    --sample path/to/quant.sf \
    --workspace out/patient_X_baseline \
    --cancer-type BLCA
```

Output layout:

```
out/patient_X_baseline/
  meta.json            # trufflepig run metadata (versions + args)
  analyze/             # full analyze output: figures, markdown reports, TSVs
  records/             # (created, currently empty) — reserved for per-stage
                       #   records once Phase 2 extraction lands
  figures/             # (created, currently empty) — reserved for the
                       #   stage-level figure layout
```

Today, every analyze artifact (markdown, figures, TSVs, the bundled
PDF) lives under `analyze/`. The empty sibling directories are the
seam for per-stage extraction (trufflepig#2–#14); once stages start
writing their own records, `analyze/` shrinks.
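
To inspect a finished run programmatically, something like the following sketch may help. It assumes only the layout shown above (`meta.json` next to an `analyze/` directory); the helper name `summarize_workspace` is hypothetical and not part of trufflepig, and the keys inside `meta.json` are not documented here, so the dict is returned untouched.

```python
import json
from pathlib import Path


def summarize_workspace(workspace):
    """Return (metadata dict, sorted relative paths of files under analyze/)."""
    root = Path(workspace)

    # meta.json holds versions + args; its exact keys are an unknown here,
    # so the parsed JSON is passed through as-is.
    meta_path = root / "meta.json"
    meta = json.loads(meta_path.read_text()) if meta_path.exists() else {}

    # Everything the run produced currently lives under analyze/.
    analyze_dir = root / "analyze"
    artifacts = []
    if analyze_dir.is_dir():
        artifacts = sorted(
            p.relative_to(analyze_dir).as_posix()
            for p in analyze_dir.rglob("*")
            if p.is_file()
        )
    return meta, artifacts
```

Calling `summarize_workspace("out/patient_X_baseline")` would return the run metadata together with the report, figure, and TSV paths relative to `analyze/`.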

Common pass-through flags: `--hla-types`, `--fusions`, `--alterations`,
`--alignment-qc`, `--sample-mode`, `--tumor-context`, `--site-hint`,
`--met-site`, `--decomposition-templates`, `--output-image-prefix`,
`--sample-id-col`, `--sample-id-value`, `--gene-id-col`, `--gene-name-col`,
`--label-genes`, `--genes`, `--transcripts`,
`--aggregate-gene-expression`, `--expression-qc-rescue`,
`--therapy-target-top-k`, `--therapy-target-tpm-threshold`, `--force`.
All have the same meaning as in the old `pirlygenes analyze`.

### Multi-sample (longitudinal)

```
trufflepig compare \
    --workspace out/patient_X_longitudinal \
    --inputs out/patient_X_baseline,out/patient_X_relapse \
    --title "Patient X — baseline vs relapse"
```

`--inputs` accepts both trufflepig workspaces (auto-descends to
`analyze/`) and legacy pirlygenes output directories.
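
The auto-descend behavior can be pictured with a small sketch. This is not trufflepig's actual code: the function names are hypothetical, and it assumes a workspace is recognized purely by the presence of an `analyze/` subdirectory.

```python
from pathlib import Path


def resolve_compare_input(path):
    """Pick the directory holding analyze output for one --inputs entry."""
    root = Path(path)
    nested = root / "analyze"
    # Assumption: a trufflepig workspace has an analyze/ subdirectory;
    # anything else is treated as a legacy flat pirlygenes output directory.
    return nested if nested.is_dir() else root


def split_inputs(value):
    """Split a comma-separated --inputs value, ignoring empty entries."""
    return [
        resolve_compare_input(part.strip())
        for part in value.split(",")
        if part.strip()
    ]
```

Under that assumption, mixing new workspaces and legacy directories in one `--inputs` list needs no extra flags.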

### Reference / cohort introspection

```
trufflepig data            # list bundled gene-set CSVs and TCGA cohorts
trufflepig cancers         # browse the cancer-type registry
trufflepig cancers --family sarcoma --details
trufflepig plot-cancer-cohorts --output-prefix /tmp/cohort
```

### Web UI

```
pip install 'pirl-trufflepig[web]'
trufflepig serve --port 8000
# open http://127.0.0.1:8000
```

Upload a TPM file or salmon quant in the browser, watch each pipeline
stage stream back, and read the rendered `summary.md` / `analysis.md` /
`brief.md` inline. Comparison runs work the same way — pick prior runs
by ID. Each run writes a self-contained workspace under
`$TRUFFLEPIG_WEB_ROOT` (default `$HOME/trufflepig-web-runs`).
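
If you need to locate web runs from a script, the root can be resolved the way the paragraph above describes. The helper below is a hypothetical sketch, not a trufflepig API; it assumes an unset or empty `TRUFFLEPIG_WEB_ROOT` falls back to the default.

```python
import os
from pathlib import Path


def web_runs_root(environ=None):
    """Resolve the directory where web-UI runs are written."""
    env = os.environ if environ is None else environ
    override = env.get("TRUFFLEPIG_WEB_ROOT")
    if override:
        # Allow "~" in the override for convenience (an assumption).
        return Path(override).expanduser()
    # Documented default: $HOME/trufflepig-web-runs
    return Path.home() / "trufflepig-web-runs"
```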

### Pipeline DAG

```
trufflepig list-stages
```

`list-stages` prints the stage DAG, which is the post-migration target
for `trufflepig stage <name>`. The top-level `trufflepig run` already
runs the full pipeline; stage-level execution is wired in as each stage
is extracted from the migrated codebase.
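
As an illustration of how a name-to-upstream-dependencies mapping yields an execution order, here is a minimal sketch using the standard-library `graphlib`. The stage names come from the roadmap, but the edges shown are assumptions made for the example; the real DAG lives in `pipeline.py` and may differ.

```python
from graphlib import TopologicalSorter

# Hypothetical edges: each stage maps to the stages it depends on.
STAGES = {
    "load_expression": [],
    "sample_context": ["load_expression"],
    "analyze": ["sample_context"],
    "decompose": ["analyze"],
    "ranges": ["decompose"],
    "confidence": ["ranges"],
    "bundle": ["confidence"],
}


def stages_needed(target, dag=STAGES):
    """Return target plus its transitive upstream stages, in run order."""
    needed = set()
    stack = [target]
    while stack:
        name = stack.pop()
        if name not in needed:
            needed.add(name)
            stack.extend(dag[name])
    # TopologicalSorter takes node -> predecessors and emits predecessors first.
    subgraph = {name: dag[name] for name in needed}
    return list(TopologicalSorter(subgraph).static_order())
```

With these assumed edges, asking for `decompose` would schedule `load_expression`, `sample_context`, and `analyze` before it, and skip everything downstream.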

## Layout

```
trufflepig/
  cli.py            # argparse entry exposed as the `trufflepig` console script
  main.py           # migrated analyze/compare_analyze + report assembly
  workspace.py      # workspace layout (meta.json + records/ + figures/)
  pipeline.py       # stage DAG (name -> upstream dependencies)
  analyze/          # data contracts shared with the migrated pipeline
  decomposition/    # compartment-fit engine + panels + plot helpers
  stages/           # one module per stage (post-extraction)
  load_expression.py, sample_context.py, tumor_purity.py,
  plot*.py, brief.py, confidence.py, ...   # the analysis code
```

## Roadmap

### Phase 1 — Subsume pirlygenes analyze ✅

- [x] Wire `trufflepig run` as a thin bridge to `pirlygenes.cli.analyze`
      (trufflepig#19)
- [x] Wire `trufflepig compare` as a thin bridge to
      `pirlygenes.cli.compare_analyze`
- [x] **Mass-move analysis modules** from pirlygenes to trufflepig
      (trufflepig#1). pirlygenes now ships data only.
- [x] Native `trufflepig run` / `trufflepig compare` dispatch — no bridge

### Phase 2 — Per-stage extraction

Break the migrated `analyze` function into the stage DAG so a web UI
can run and stream single stages:

- [ ] `load_expression` — parse sample TPM TSV/CSV into a canonical
      frame ([#2](https://github.com/pirl-unc/trufflepig/issues/2))
- [ ] `sample_context` — infer library prep, preservation, degradation
      ([#3](https://github.com/pirl-unc/trufflepig/issues/3))
- [ ] `analyze` — cancer-type call + purity
      ([#4](https://github.com/pirl-unc/trufflepig/issues/4))
- [ ] `decompose` — compartment-level decomposition fit
      ([#5](https://github.com/pirl-unc/trufflepig/issues/5))
- [ ] `ranges` — per-target tumor-expression ranges + attribution
      ([#6](https://github.com/pirl-unc/trufflepig/issues/6))
- [ ] `confidence` — purity + per-target confidence tiers
      ([#7](https://github.com/pirl-unc/trufflepig/issues/7))
- [ ] `render_targets`, `render_summary`, `render_analysis`,
      `render_provenance`, `render_brief`
      ([#8](https://github.com/pirl-unc/trufflepig/issues/8)–[#12](https://github.com/pirl-unc/trufflepig/issues/12))
- [ ] `bundle` — figures into PDF + finalize `meta.json`
      ([#13](https://github.com/pirl-unc/trufflepig/issues/13))
- [ ] Per-stage record schema documentation
      ([#14](https://github.com/pirl-unc/trufflepig/issues/14))

### Phase 3 — Multi-sample / longitudinal

`trufflepig compare` runs today. The richer layer still to come:

- [ ] Explicit delta tables — cancer-call shifts, purity drift, target
      gains/losses, MHC/HLA changes, immune / IFN / hypoxia / EMT /
      therapy-response axis movement, assay/library differences that
      limit comparability (extension of
      [pirlygenes#230](https://github.com/pirl-unc/pirlygenes/issues/230))
- [ ] Cohort-level comparisons (browse N samples with the same cancer
      type; surface outlier targets)
- [ ] Patient-level provenance graph linking baseline → progression
      samples

### Phase 4 — Web UI

A single-page web frontend so a user can drop in a TPM or salmon quant,
watch each stage stream back, and download the rendered markdown / PDF.

- [x] FastAPI app + browser UI (`trufflepig serve`) with file upload,
      background analyze, server-sent-events progress stream, inline
      rendered reports, and longitudinal comparison launcher
      ([#16](https://github.com/pirl-unc/trufflepig/issues/16))
- [x] Streaming progress + per-stage output hooks (SSE stream of
      analyze stdout)
      ([#15](https://github.com/pirl-unc/trufflepig/issues/15))
- [ ] Reference-data layout for lazy-load from R2/S3 with browser cache
      ([#18](https://github.com/pirl-unc/trufflepig/issues/18))
- [ ] Pyensembl-free gene resolution (HGNC CSV dict lookup) for fast
      cold-start in serverless / browser contexts
      ([#17](https://github.com/pirl-unc/trufflepig/issues/17))
- [ ] Auth + workspace persistence so a user can return to a prior run
- [ ] Production deploy target (serverless) replacing the local
      subprocess runner with a remote-job submission
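
The progress stream listed above forwards analyze stdout as server-sent events. A minimal sketch of the framing follows, assuming one event per stdout line; the function name and the `progress` / `done` event names are hypothetical and are not taken from trufflepig's implementation.

```python
def sse_frames(lines, event="progress"):
    """Wrap each stdout line in a server-sent-events frame."""
    for line in lines:
        text = line.rstrip("\r\n")
        # An SSE frame is a set of "field: value" lines ended by a blank line.
        yield f"event: {event}\ndata: {text}\n\n"
    # Hypothetical terminal event so the browser knows the run finished.
    yield "event: done\ndata: \n\n"
```

A generator like this can be handed to a streaming HTTP response served with the `text/event-stream` content type.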

## Non-goals

- No JSON mirror of the markdown reports — the rendered markdown has
  named human audiences; a JSON mirror would have no real consumer.
- No change to the gene-set data in `pirlygenes`.

## Local-report regeneration

Researcher workflow: replay a private manifest of analyses on local
samples and write outputs **outside the repo**:

```
python scripts/regenerate_local_reports.py \
    --source /path/to/pirlygenes/local_reports/<run>/manifest.json \
    --root ~/trufflepig-local-reports/<stamp>
```

The script refuses to write inside the repo. The default `--root` is
`$HOME/trufflepig-local-reports/<timestamp>/`.
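
The refusal to write inside the repo amounts to a path-containment check. The sketch below shows one way to express it; `ensure_outside_repo` is a hypothetical name, and the actual script may implement the guard differently.

```python
from pathlib import Path


def ensure_outside_repo(root, repo_dir):
    """Return the resolved output root, or raise if it lies inside the repo."""
    out = Path(root).expanduser().resolve()
    repo = Path(repo_dir).resolve()
    # Resolving first means symlinks and ".." segments cannot sneak the
    # output directory back into the repository tree.
    if out == repo or repo in out.parents:
        raise ValueError(f"refusing to write inside the repo: {out}")
    return out
```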

## License

Apache 2.0 — see [LICENSE](LICENSE).
