Metadata-Version: 2.4
Name: purrPore
Version: 0.1.0
Summary: Merge ONT nanopore data from POD5 and FASTQ for per-pore QC
License: MIT
Keywords: nanopore,ont,pod5,fastq,qc,sequencing
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: biopython>=1.79
Requires-Dist: matplotlib>=3.5
Requires-Dist: pandas>=1.3
Requires-Dist: pod5>=0.2.0
Requires-Dist: pyarrow>=10.0.0
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: pytest-cov>=4; extra == "dev"

# purrPore

Merge Oxford Nanopore (ONT) sequencing data from **POD5** and **FASTQ** for **per-pore QC**.

purrPore loads per-read metadata from POD5 files (channel, well, duration, sample rate) and merges it with FASTQ-derived metrics (length, mean Q, GC). It then aggregates by **(channel, well)** for true per-pore QC, since each channel has up to 4 wells (nanopores). It also computes **N50 per pore**, detects **pore dropout**, and draws **flow cell heatmaps** (physical MinION layout).
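
The merge-then-aggregate step described above can be sketched with plain pandas. This is an illustration only, not purrPore's actual implementation: the column names are taken from the Data section, the values are made up, and joining on `read_id` with an inner join is an assumption. `median_speed` and `active_time_s` are left out because this README does not define how they are computed.

```python
import pandas as pd

# Toy FASTQ-derived metrics (made-up values; column names follow the Data section).
fastq = pd.DataFrame({
    "read_id": ["r1", "r2", "r3", "r4"],
    "read_length": [1000, 3000, 500, 2000],
    "mean_q": [12.0, 14.0, 10.0, 16.0],
    "gc": [0.40, 0.50, 0.45, 0.55],
})

# Toy POD5-derived metadata (made-up values).
pod5_meta = pd.DataFrame({
    "read_id": ["r1", "r2", "r3", "r4"],
    "channel": [1, 1, 1, 2],
    "well": [1, 1, 2, 1],
    "duration_s": [2.5, 7.5, 1.0, 5.0],
})

# Assumption: reads are matched on read_id and unmatched reads are dropped.
merged = fastq.merge(pod5_meta, on="read_id", how="inner")

# One row per pore, where a pore is a (channel, well) pair.
per_pore = (
    merged.groupby(["channel", "well"])
    .agg(
        reads=("read_id", "count"),
        yield_bp=("read_length", "sum"),
        mean_len=("read_length", "mean"),
        mean_q=("mean_q", "mean"),
        mean_gc=("gc", "mean"),
    )
    .reset_index()
)
print(per_pore)
```

Grouping on both `channel` and `well` is what keeps reads from different wells of the same channel apart.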

## Install

```bash
pip install -e .
```

Requires Python ≥3.8.

## CLI

```bash
purrPore run reads.fastq pod5_dir/ -o qc_out/
purrPore run reads.fastq pod5_dir/ -o qc_out/ --heatmap   # also generate heatmaps
```

## Python API

```python
import purrPore

results = purrPore.run_qc("reads.fastq", "pod5_dir/")

# Merged read-level table (FASTQ + POD5 metadata)
print(results.merged)

# Per-pore: reads, yield_bp, mean_len, mean_q, mean_gc, median_speed, active_time_s, n50
print(results.per_pore)

# Flowcell: total_reads, total_yield_bp, mean_q, active_pores, dropout_pores
print(results.flowcell)

# Pores (channel, well) that produced 0 reads (dropout)
print(results.dropout_pores[:20])

# Write outputs (CSV, Parquet, dropout list)
purrPore.write_qc_results(results, "qc_out/")

# Flow cell heatmap (128×16 pore grid: 4 wells per channel)
purrPore.plot_flowcell_heatmap(results.per_pore, value_col="yield_bp", out_path="heatmap.png")
```

## Features

- **Flow cell heatmap**: Physical MinION layout (128×16 pore grid: 4 wells per channel)
- **Read N50 per pore**: Read length L such that reads of length L or longer account for at least 50% of the pore's yield
- **Pore dropout detection**: Pores (channel, well) with 0 reads (written to `dropout_pores.txt`)
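
As a rough sketch of the N50 and dropout definitions above (not purrPore's own code): `n50` and `find_dropout_pores` are hypothetical helper names. The default of 512 channels follows from the 128×16 grid with 4 wells per channel; the 1-based channel and well numbering is an assumption.

```python
from itertools import product


def n50(lengths):
    """Largest length L such that reads >= L hold at least half of the total bases."""
    total = sum(lengths)
    if total == 0:
        return 0
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def find_dropout_pores(active, n_channels=512, wells_per_channel=4):
    """Return every expected (channel, well) pair that produced no reads.

    Assumes channels are numbered 1..n_channels and wells 1..wells_per_channel.
    """
    active = set(active)
    expected = product(range(1, n_channels + 1), range(1, wells_per_channel + 1))
    return [pore for pore in expected if pore not in active]


print(n50([1000, 3000, 500, 2000]))
print(find_dropout_pores({(1, 1), (1, 2)}, n_channels=1))
```

With the toy read lengths, the 3000 bp read alone covers less than half of the 6500 bp total, so the cutoff falls on the 2000 bp read.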

## Data

- **FASTQ**: read_id, read_length, mean_q, gc
- **POD5**: read_id, channel, well, duration_s, ...
- **Per-pore**: channel, well, reads, yield_bp, mean_len, mean_q, mean_gc, median_speed, active_time_s, **n50**
- **Flowcell**: total_reads, total_yield_bp, mean_q, active_pores, **dropout_pores**
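
For orientation, the FASTQ-derived `gc` and `mean_q` columns could be computed along these lines. This is a sketch under assumptions, not purrPore's implementation: the helper names are hypothetical, GC is expressed as a fraction rather than a percentage, qualities are Phred+33 encoded, and mean Q is taken by averaging per-base error probabilities (a plain arithmetic mean of Phred scores is another common convention and gives different values).

```python
import math


def gc_fraction(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty sequence)."""
    if not seq:
        return 0.0
    seq = seq.upper()
    return (seq.count("G") + seq.count("C")) / len(seq)


def mean_q(quality, offset=33):
    """Mean read quality from a FASTQ quality string.

    Each character is decoded to a Phred score, converted to an error
    probability, averaged, and converted back to the Phred scale.
    """
    if not quality:
        return 0.0
    probs = [10 ** (-(ord(ch) - offset) / 10) for ch in quality]
    return -10 * math.log10(sum(probs) / len(probs))


print(gc_fraction("ACGT"))
print(round(mean_q("IIII"), 2))
```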

## License

MIT.
