Metadata-Version: 2.4
Name: kreview
Version: 0.0.8
Summary: Evaluate cfDNA fragmentomics features for ctDNA detection
Author-email: Ronak Shah <shahr2@mskcc.org>
License: AGPL-3.0
Project-URL: Repository, https://github.com/msk-access/kreview
Project-URL: Documentation, https://msk-access.github.io/kreview
Keywords: nbdev,jupyter,notebook,python
Classifier: Natural Language :: English
Classifier: Intended Audience :: Developers
Classifier: Development Status :: 3 - Alpha
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=2.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: scipy>=1.10.0
Requires-Dist: scikit-learn>=1.2.0
Requires-Dist: duckdb>=0.10.0
Requires-Dist: pyarrow>=14.0.0
Requires-Dist: structlog>=23.1.0
Requires-Dist: typer>=0.9.0
Requires-Dist: plotly>=5.14.0
Requires-Dist: kaleido>=0.2.1
Requires-Dist: itables>=1.7.0
Requires-Dist: great_tables>=0.15.0
Requires-Dist: shap>=0.40.0
Requires-Dist: joblib>=1.3.0
Requires-Dist: statsmodels>=0.14.0
Requires-Dist: xgboost>=2.0.0
Provides-Extra: jupyter
Requires-Dist: papermill>=2.5.0; extra == "jupyter"
Requires-Dist: jupyter>=1.0.0; extra == "jupyter"
Requires-Dist: matplotlib>=3.7.0; extra == "jupyter"
Provides-Extra: test
Requires-Dist: pytest>=7.0.0; extra == "test"
Requires-Dist: pytest-cov>=4.0.0; extra == "test"
Provides-Extra: dev
Requires-Dist: black[jupyter]>=23.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5.0; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == "docs"
Requires-Dist: mkdocs-typer>=0.0.3; extra == "docs"
Requires-Dist: mike>=2.0.0; extra == "docs"
Requires-Dist: mkdocs-mermaid2-plugin>=1.2.3; extra == "docs"
Requires-Dist: mkdocs-glightbox>=0.5.2; extra == "docs"
Requires-Dist: mkdocs-git-revision-date-localized-plugin>=1.5.1; extra == "docs"
Requires-Dist: mkdocs-panzoom-plugin>=0.5.2; extra == "docs"
Requires-Dist: mkdocs-pdf>=0.1.2; extra == "docs"
Provides-Extra: all
Requires-Dist: papermill>=2.5.0; extra == "all"
Requires-Dist: jupyter>=1.0.0; extra == "all"
Requires-Dist: matplotlib>=3.7.0; extra == "all"
Requires-Dist: pytest>=7.0.0; extra == "all"
Requires-Dist: pytest-cov>=4.0.0; extra == "all"
Requires-Dist: black[jupyter]>=23.0.0; extra == "all"
Requires-Dist: ruff>=0.1.0; extra == "all"
Requires-Dist: mypy>=1.0.0; extra == "all"
Requires-Dist: mkdocs-material>=9.5.0; extra == "all"
Requires-Dist: mkdocstrings[python]>=0.24.0; extra == "all"
Requires-Dist: mkdocs-typer>=0.0.3; extra == "all"
Requires-Dist: mike>=2.0.0; extra == "all"
Requires-Dist: mkdocs-mermaid2-plugin>=1.2.3; extra == "all"
Requires-Dist: mkdocs-glightbox>=0.5.2; extra == "all"
Requires-Dist: mkdocs-git-revision-date-localized-plugin>=1.5.1; extra == "all"
Requires-Dist: mkdocs-panzoom-plugin>=0.5.2; extra == "all"
Requires-Dist: mkdocs-pdf>=0.1.2; extra == "all"
Dynamic: license-file

<div align="center">
  <img src="https://img.shields.io/github/v/tag/msk-access/kreview?label=Release&color=FF9B42" alt="Release Badge">
  <img src="https://img.shields.io/badge/nbdev-Enabled-blue.svg" alt="nbdev Badge">
  <img src="https://img.shields.io/badge/Powered_by-DuckDB-yellow.svg" alt="DuckDB Badge">
  <img src="https://img.shields.io/badge/Reports-Quarto-blueviolet.svg" alt="Quarto Badge">
  <a href="https://deepwiki.com/msk-access/kreview"><img src="https://deepwiki.com/badge.svg" alt="Ask DeepWiki"></a>
  
  <h1>kreview</h1>
  <p><b>Advanced cfDNA Fragmentomics Core Evaluation Engine</b></p>
</div>

---

## 🧬 Overview

`kreview` is a notebook-first (`nbdev`) evaluation engine for high-throughput analysis of cfDNA fragmentomics features in cancer liquid biopsy. Developed at Memorial Sloan Kettering Cancer Center (MSKCC), it processes cohorts of tens of thousands of samples using an embedded DuckDB query engine with chunked I/O and automatic retry logic.

📖 **[Full Documentation](https://msk-access.github.io/kreview/)**

## 🚀 Features

- **5-Tier ctDNA Taxonomy**: Uses MSK-IMPACT paired inference to label each sample as `True ctDNA+`, `Possible ctDNA+`, `Possible ctDNA−`, `Healthy Normal`, or `Insufficient Data`.
- **DuckDB Dynamic Data Lake**: In-memory `read_parquet` bindings with chunked I/O and exponential backoff retry. Builds a merged SQL-queryable `kreview_lake.duckdb` on demand.
- **Multi-Model Evaluation**: Random Forest, XGBoost, and Logistic Regression with Stratified K-Fold CV, SHAP explainability, and subgroup analysis.
- **Interactive Dashboards**: Plotly-native HTML reports with ROC curves, violin plots, SHAP beeswarm/waterfall, and per-cancer-type sensitivity tables.
- **26 Built-In Evaluators**: Modular extractors covering fragment sizes (FSC, FSD, FSR), nucleosome protection (WPS, TFBS), cleavage motifs (EndMotif, BreakPointMotif), chromatin accessibility (ATAC), motif divergence (MDS), and orientation (OCF).
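
The chunked I/O and exponential backoff retry mentioned above can be illustrated with a minimal, standard-library-only sketch. This is not `kreview`'s actual implementation: the names `chunked` and `retry_with_backoff`, the default delays, and the choice of `OSError` as the retryable error are all assumptions made for illustration.

```python
import time
from typing import Callable, Iterator, Sequence, TypeVar

T = TypeVar("T")


def chunked(items: Sequence[T], size: int) -> Iterator[Sequence[T]]:
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]


def retry_with_backoff(
    operation: Callable[[], T],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `operation`, doubling the wait after each failed attempt.

    Only OSError is retried here (hypothetical choice); other exceptions
    propagate immediately.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return operation()
        except OSError:
            # Give up after the final attempt and surface the original error.
            if attempt == max_attempts - 1:
                raise
            # Wait base_delay, 2 * base_delay, 4 * base_delay, ... seconds.
            sleep(base_delay * (2 ** attempt))
    raise AssertionError("unreachable")
```

In a pipeline of this kind, each chunk of Parquet paths would typically be read inside `retry_with_backoff`, so that a transient filesystem error delays one chunk instead of failing the whole run.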

## ⚙️ Quick Start

### Installation

> [!IMPORTANT]
> **Quarto is required** for programmatic dashboard generation. Because `quarto-cli` wrapper packages are unreliable across Python environments, `kreview` expects the Quarto executable to be installed separately on your OS or in your container.

#### Option 1: Docker (Recommended "Batteries-Included" Method)
The easiest way to run `kreview` without managing external dependencies is the pre-built Docker container hosted on GHCR. It ships with Python 3.12, all ML libraries, and a preconfigured Quarto Linux binary:
```bash
docker pull ghcr.io/msk-access/kreview:latest
docker run -v /your/data:/data ghcr.io/msk-access/kreview:latest \
  kreview run --cancer-samplesheet /data/cancer.csv ...
```

#### Option 2: Local Install (Pip)
If you install with pip, you **must install Quarto separately**, for example through your OS package manager:
1. **Install Quarto:** Follow the [official Quarto Installation Guide](https://quarto.org/docs/get-started/) (e.g. `brew install quarto` on macOS).
2. **Install kreview:**
```bash
git clone https://github.com/msk-access/kreview.git
cd kreview
pip install -e .
```
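
Because a local install depends on a separately installed Quarto, it can help to confirm that the executable is on your `PATH` before starting a long run. The following is a minimal sketch using only the standard library; the helper name `find_quarto` is hypothetical and is not part of `kreview`.

```python
import shutil


def find_quarto(executable: str = "quarto") -> str:
    """Return the full path to the Quarto executable, or raise if it is missing."""
    path = shutil.which(executable)
    if path is None:
        raise RuntimeError(
            f"'{executable}' was not found on PATH. Install Quarto first: "
            "https://quarto.org/docs/get-started/"
        )
    return path
```

Calling `find_quarto()` in a setup script fails early with a clear message instead of failing later during dashboard generation.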

### Running the Pipeline

```bash
PYTHONUNBUFFERED=1 kreview run \
  --cancer-samplesheet "/path/to/cancer/samplesheet.csv" \
  --healthy-xs1-samplesheet "/path/to/healthy/xs1/samplesheet.csv" \
  --healthy-xs2-samplesheet "/path/to/healthy/xs2/samplesheet.csv" \
  --cbioportal-dir "/path/to/cBioPortal_MAF_CNA_SV/" \
  --krewlyzer-dir "/path/to/unified_krewlyzer_results" \
  --output output/ \
  --workers 4 \
  --export-duckdb
```
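
If you launch runs from a Python script, for example to loop over several cohorts, the same command can be assembled as an argument list. This is a sketch under stated assumptions: the function `build_run_command` is hypothetical, and it uses only the flags shown in the command above.

```python
def build_run_command(
    cancer_samplesheet: str,
    healthy_xs1_samplesheet: str,
    healthy_xs2_samplesheet: str,
    cbioportal_dir: str,
    krewlyzer_dir: str,
    output: str = "output/",
    workers: int = 4,
    export_duckdb: bool = True,
) -> list[str]:
    """Assemble the `kreview run` invocation as a list of arguments."""
    command = [
        "kreview", "run",
        "--cancer-samplesheet", cancer_samplesheet,
        "--healthy-xs1-samplesheet", healthy_xs1_samplesheet,
        "--healthy-xs2-samplesheet", healthy_xs2_samplesheet,
        "--cbioportal-dir", cbioportal_dir,
        "--krewlyzer-dir", krewlyzer_dir,
        "--output", output,
        "--workers", str(workers),
    ]
    # --export-duckdb is a boolean switch, so it is appended without a value.
    if export_duckdb:
        command.append("--export-duckdb")
    return command
```

The resulting list can be passed to `subprocess.run(command, check=True)`. Passing a list instead of a single shell string avoids quoting problems with paths that contain spaces.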

### Dashboard Access

Once the run finishes, open the generated HTML reports, for example:
```bash
# `open` is the macOS launcher; on Linux use `xdg-open` instead
open output/reports/ATAC_dashboard.html
```

## 🏗️ nbdev Architecture

This project is an `nbdev` repository. Do **not** edit the generated `.py` files in `kreview/` by hand. Make changes in the Jupyter notebooks under `nbs/`, then export them:
```bash
nbdev-export
```

## 📚 Resources

- **[Documentation](https://msk-access.github.io/kreview/)** — Full user and developer guide
- **[Contributing](CONTRIBUTING.md)** — How to contribute
- **[Changelog](https://msk-access.github.io/kreview/changelog/)** — Version history
