Metadata-Version: 2.4
Name: maggic-wand
Version: 0.1.3
Summary: Publication-quality plots from MAGGIC pipeline results
Author-email: Kranti Konganti <Kranti.Konganti@fda.hhs.gov>
License: MIT
Requires-Python: <4.0,>=3.12
Requires-Dist: beartype>=0.18
Requires-Dist: matplotlib>=3.8
Requires-Dist: pandas>=2.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: rich>=13.0
Requires-Dist: scipy>=1.11
Requires-Dist: seaborn>=0.13
Requires-Dist: typer>=0.9
Requires-Dist: wordcloud>=1.9
Provides-Extra: dev
Requires-Dist: black; extra == 'dev'
Requires-Dist: flynt; extra == 'dev'
Requires-Dist: isort; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

# maggic-wand

`maggic-wand` generates publication-quality static plots from [`MAGGIC`](https://github.com/CFSAN-Biostatistics/MAGGIC) (**M**etagenome **A**ssembled **G**enome with **I**terative **C**lassification) pipeline output. It reads the `maggic-results.tsv` table and, when available, `maggic-globalabundance.tsv` (the `CoverM` TMM matrix), and writes PNG charts suitable for downstream `MultiQC` integration.

\
&nbsp;

## Problems It Solves

- **Manual plotting**: avoids per-run manual invocation of `ggplot`/`matplotlib` scripts
- **Inconsistent taxonomy parsing**: extracts genus, species, and phylum from `GTDB-Tk` style `__`-delimited strings in one place
- **Abundance aggregation**: sums `CoverM` TMM per-bin values to genus and species level, using the taxonomy from the results table
- **Reproducible output**: fixed color palettes (tab20-based), deterministic figure sizes, 300 dpi PNG
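
To illustrate the taxonomy parsing point, here is a minimal sketch of splitting a `GTDB-Tk` style lineage into named ranks. It is not the package's own code: the function name `parse_gtdb_lineage` is hypothetical, and the sketch assumes the lineage is a single semicolon-separated string with `d__`, `p__`, ..., `s__` rank prefixes.

```python
# Hypothetical helper, not part of the maggic-wand API.
# Assumes a semicolon-separated GTDB-style lineage with rank prefixes.
RANK_PREFIXES = {
    "d": "domain",
    "p": "phylum",
    "c": "class",
    "o": "order",
    "f": "family",
    "g": "genus",
    "s": "species",
}


def parse_gtdb_lineage(lineage: str) -> dict[str, str]:
    """Return a rank -> name mapping, skipping empty or unknown ranks."""
    ranks: dict[str, str] = {}
    for field in lineage.split(";"):
        prefix, sep, name = field.strip().partition("__")
        # Keep only well-formed fields such as "g__Escherichia".
        if sep and prefix in RANK_PREFIXES and name:
            ranks[RANK_PREFIXES[prefix]] = name
    return ranks


example = (
    "d__Bacteria;p__Pseudomonadota;c__Gammaproteobacteria;"
    "o__Enterobacterales;f__Enterobacteriaceae;"
    "g__Escherichia;s__Escherichia coli"
)
taxa = parse_gtdb_lineage(example)
print(taxa["genus"], "|", taxa["species"], "|", taxa["phylum"])
```

Ranks with an empty name (for example a bare `s__`) are dropped rather than stored as empty strings, so callers can decide how to label unclassified bins.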

\
&nbsp;

## How It Works

1. The user points `maggic-wand` at a directory containing `maggic-results.tsv`
2. The tool loads the results and, if `maggic-globalabundance.tsv` is present, joins it in for abundance-aware plots
3. Taxonomy (genus, species, phylum) is extracted and attached to abundance rows
4. The requested plot types are generated as 300 dpi PNGs
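
Steps 2 and 3 can be sketched roughly as follows. This is an illustration under stated assumptions, not the tool's implementation: `aggregate_by_genus`, the bin IDs, the sample names, and the `"Unclassified"` fallback label are all hypothetical, and plain dictionaries stand in for the tables the tool reads.

```python
from collections import defaultdict

# Hypothetical inputs: bin -> genus (from the results table) and
# bin -> {sample: TMM value} (from the abundance matrix).
bin_genus = {"bin1": "Escherichia", "bin2": "Escherichia", "bin3": "Salmonella"}
abundance = {
    "bin1": {"sampleA": 1.5, "sampleB": 0.5},
    "bin2": {"sampleA": 2.5, "sampleB": 1.5},
    "bin3": {"sampleA": 4.0, "sampleB": 8.0},
    "bin4": {"sampleA": 1.0, "sampleB": 0.0},  # no taxonomy available
}


def aggregate_by_genus(
    bin_genus: dict[str, str],
    abundance: dict[str, dict[str, float]],
) -> dict[str, dict[str, float]]:
    """Sum per-bin abundance values into genus-by-sample totals."""
    totals: dict[str, dict[str, float]] = defaultdict(lambda: defaultdict(float))
    for bin_id, per_sample in abundance.items():
        # Assumption: bins without a genus are grouped under one label.
        genus = bin_genus.get(bin_id, "Unclassified")
        for sample, value in per_sample.items():
            totals[genus][sample] += value
    return {genus: dict(samples) for genus, samples in totals.items()}


genus_totals = aggregate_by_genus(bin_genus, abundance)
print(genus_totals["Escherichia"])
```

The same pattern applies at species level by swapping the bin-to-genus mapping for a bin-to-species one.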

\
&nbsp;

<!-- TOC -->

- [Minimum Requirements](#minimum-requirements)
- [Installation](#installation)
- [Input Files](#input-files)
- [Usage](#usage)
  - [Generate All Plots](#generate-all-plots)
  - [Single Plot Type](#single-plot-type)
  - [Individual Plot Commands](#individual-plot-commands)
- [Plot Types](#plot-types)
- [CLI Help](#cli-help)

<!-- /TOC -->

\
&nbsp;

## Minimum Requirements

1. **Python 3.12+**
2. **Dependencies**: `pandas`, `matplotlib`, `seaborn`, `scipy`, `wordcloud`, `typer`, `pydantic`, `beartype`, `rich`

\
&nbsp;

## Installation

### Installation from PyPI

```bash
pip install maggic-wand
```

Or with `uv`:

```bash
uv pip install maggic-wand
```

The package installs the `maggic-wand` CLI entry point.

### Installation via uv (Editable)

```bash
git clone repo-url
cd maggic-wand
uv sync
```

### Installation from Source

```bash
pip install .
```

For development with formatting, linting, and test tools (`black`, `isort`, `flynt`, `ruff`, `pytest`):

```bash
pip install -e ".[dev]"
```

\
&nbsp;

## Input Files

`maggic-wand` expects these files in the results directory:

| File | Required | Source |
|------|----------|--------|
| `maggic-results.tsv` | Yes | `MAGGIC` pipeline output (30-column TSV with bin metadata, taxonomy, AMR, MGE fields) |
| `maggic-globalabundance.tsv` | No | `CoverM` TMM-normalized abundance matrix (bins x samples) |
| `ALL_REFINED_BINS_QUALITY_REPORT.tsv` | No | `Binette` post-deduplication quality report (12 columns) |

Plots that depend on abundance data (diversity, heatmap) are skipped with a warning if `maggic-globalabundance.tsv` is absent. The quality-ecdf plot is skipped if the `Binette` quality report is absent.
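
The skip rules above can be sketched as a small lookup from optional file to dependent plots. The function name `plots_to_skip` is hypothetical, and raising `FileNotFoundError` when `maggic-results.tsv` is missing is an assumption about reasonable behavior, not a description of what the CLI does.

```python
import tempfile
from pathlib import Path

REQUIRED_FILE = "maggic-results.tsv"
# Optional companion files and the plot types that need them (from the table above).
OPTIONAL_FILES = {
    "maggic-globalabundance.tsv": ("diversity", "heatmap"),
    "ALL_REFINED_BINS_QUALITY_REPORT.tsv": ("quality-ecdf",),
}


def plots_to_skip(results_dir: Path) -> list[str]:
    """List plot types that cannot be drawn from the files present."""
    if not (results_dir / REQUIRED_FILE).is_file():
        # Assumption: a missing results table is treated as a hard error.
        raise FileNotFoundError(f"{REQUIRED_FILE} not found in {results_dir}")
    skipped: list[str] = []
    for filename, plot_types in OPTIONAL_FILES.items():
        if not (results_dir / filename).is_file():
            skipped.extend(plot_types)
    return skipped


# Demo: a directory that holds only the required results table.
with tempfile.TemporaryDirectory() as tmp:
    demo_dir = Path(tmp)
    (demo_dir / REQUIRED_FILE).write_text("placeholder\n")
    demo_skipped = plots_to_skip(demo_dir)
    print(demo_skipped)
```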

\
&nbsp;

## Usage

### Generate All Plots

```bash
maggic-wand plot --maggic-results-dir /path/to/maggic/results/
```

This discovers `maggic-results.tsv` and optional companion files, then writes all PNG plots to `<results_dir>/plots/`.

### Single Plot Type

```bash
# Only the diversity and heatmap plots (each at genus + species level); pass one name for a single type
maggic-wand plot --maggic-results-dir results/ --plot-type diversity,heatmap
```

### Individual Plot Commands

For targeted use (e.g., scripting a single chart):

```bash
# Diversity stacked bars (genus + species)
maggic-wand plot-diversity results/maggic-results.tsv results/maggic-globalabundance.tsv -o diversity.png -n 10

# Quality scatter
maggic-wand plot-quality results/maggic-results.tsv -o quality.png

# Heatmaps (genus + species)
maggic-wand plot-heatmap results/maggic-results.tsv results/maggic-globalabundance.tsv -o heatmap.png -n 30
```

\
&nbsp;

## Plot Types

| Plot | Flag | Input Needed | Description |
|------|------|-------------|-------------|
| Diversity | `diversity` | results + abundance | Stacked bars at genus and species level, top-N taxa, TMM-summed per sample, normalized to 100% |
| Heatmap | `heatmap` | results + abundance | Hierarchical clustering heatmaps at genus and species level, phylum-colored side bar, column dendrogram |
| Quality scatter | `quality` | results | Completeness vs Contamination, colored by bin type |
| AMR wordcloud | `amr-wordcloud` | results | Gene frequency from the `AMR_Genes` column |
| AMR donut | `amr-donut` | results | AMR class distribution, top-15 + Other |
| MGE radar | `mge-radar` | results | Dual radar (plasmid + virus profile), top-N bins |
| Quality ECDF | `quality-ecdf` | results + Binette report | Cumulative completeness/contamination/score by binning tool (`VAMB`, `MetaBat2`, `SemiBin2`) |
| Contig scatter | `contigscatter` | results | log10(contig count) vs Completeness, marker size = genome size |
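
As an illustration of the "top-N taxa, normalized to 100%" step in the Diversity row, the sketch below converts one sample's summed TMM values into percentages. The function name `top_n_percent` is hypothetical, and folding the remaining taxa into an `"Other"` category is an assumption borrowed from the AMR donut description; the diversity plot may handle the remainder differently.

```python
def top_n_percent(sample_totals: dict[str, float], n: int) -> dict[str, float]:
    """Return percentage shares for the n largest taxa in one sample."""
    ranked = sorted(sample_totals.items(), key=lambda item: item[1], reverse=True)
    total = sum(value for _, value in ranked)
    if total == 0:
        return {}  # nothing to normalize
    shares = {name: 100.0 * value / total for name, value in ranked[:n]}
    # Assumption: taxa outside the top n are pooled into "Other".
    remainder = sum(value for _, value in ranked[n:])
    if remainder > 0:
        shares["Other"] = 100.0 * remainder / total
    return shares


sample = {"Escherichia": 50.0, "Salmonella": 30.0, "Listeria": 15.0, "Vibrio": 5.0}
shares = top_n_percent(sample, n=2)
print(shares)
```

Because the remainder is pooled rather than dropped, each bar still sums to 100% regardless of the chosen `n`.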

\
&nbsp;

## CLI Help

```bash
maggic-wand --help
maggic-wand plot --help
maggic-wand plot-diversity --help
maggic-wand plot-heatmap --help
```
