Metadata-Version: 2.4
Name: vflank
Version: 0.2.0
Summary: Variant-aware flanking-sequence extraction and masking for ddPCR assay design
Project-URL: Homepage, https://github.com/rhshah/vFlank
Project-URL: Documentation, https://rhshah.github.io/vFlank/
Project-URL: Repository, https://github.com/rhshah/vFlank
Project-URL: Issues, https://github.com/rhshah/vFlank/issues
Author: Ronak Shah
License-Expression: Apache-2.0
License-File: LICENSE
Keywords: MAF,bioinformatics,ddPCR,flanking-sequence,fusion,gnomAD,primer-design
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Requires-Dist: pandas>=2.0
Requires-Dist: pysam>=0.22
Requires-Dist: rich>=13
Requires-Dist: typer>=0.12
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=8; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mike>=2.1; extra == 'docs'
Requires-Dist: mkdocs-git-revision-date-localized-plugin>=1.2; extra == 'docs'
Requires-Dist: mkdocs-glightbox>=0.4; extra == 'docs'
Requires-Dist: mkdocs-material>=9.5; extra == 'docs'
Requires-Dist: mkdocs-mermaid2-plugin>=1.1; extra == 'docs'
Requires-Dist: mkdocs-panzoom-plugin>=0.5; extra == 'docs'
Requires-Dist: mkdocs-section-index>=0.3; extra == 'docs'
Requires-Dist: mkdocs-typer2>=0.1; extra == 'docs'
Requires-Dist: mkdocstrings[python]>=0.25; extra == 'docs'
Description-Content-Type: text/markdown

# vflank

[![CI](https://github.com/rhshah/vFlank/actions/workflows/ci.yml/badge.svg)](https://github.com/rhshah/vFlank/actions/workflows/ci.yml)
[![Docs](https://img.shields.io/badge/docs-mkdocs--material-blue)](https://rhshah.github.io/vFlank/)
[![License: Apache 2.0](https://img.shields.io/badge/License-Apache_2.0-green.svg)](LICENSE)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue)](pyproject.toml)

**Variant-aware flanking-sequence extraction and masking for ddPCR assay design.**

`vflank` is the *front-end* of a ddPCR assay-design pipeline. It takes genomic
variants — small variants (SNPs/indels) and structural variants (fusions) — and
emits the sequence an assay is designed around: the masked flanks of each variant
or the chimeric junction of a fusion. Primer/probe design itself is delegated
downstream to established tools.

📖 **Documentation: <https://rhshah.github.io/vFlank/>**

## Features

- **Small variants** (`vflank small`) — ±N bp flanks from a MAF, raw + masked
  FASTA, deduplicated per unique variant (`CHR_POS_REF_ALT`).
- **Fusions / SVs** (`vflank fusion`) — reverse-complement-aware junction
  sequences from an iCallSV / iAnnotateSV breakpoint table (columns by name).
- **SNP masking, two backends** — local gnomAD VCFs *or* the gnomAD GraphQL API
  (no download), each with `--pop-data {genome,exome,both}`.
- **Patient consensus from a BAM** (`--bam`/`--bam-map`) — build the flank/junction
  from the patient's own reads (homozygous-ALT sites corrected, heterozygous and
  low-coverage sites handled) so primers match the real template. Available for
  both small variants and fusions.
- **No silent failures** — genome-build guard, flank-truncation detection, and a
  categorised skip summary + optional TSV report.

Planned: VCF input (small + BND SV) and downstream emit formats.
See [`docs/ARCHITECTURE.md`](docs/ARCHITECTURE.md).
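
The per-variant deduplication on `CHR_POS_REF_ALT` mentioned above can be pictured with a small sketch. This is not vflank's own code: the helper names `variant_key` and `dedupe_variants` are hypothetical, the column names are the standard MAF ones, and keeping the first row seen for each key is an assumption made for illustration.

```python
# Hypothetical helpers; only the CHR_POS_REF_ALT key shape comes from the README.
MAF_KEY_COLUMNS = (
    "Chromosome",
    "Start_Position",
    "Reference_Allele",
    "Tumor_Seq_Allele2",  # the ALT allele in a standard MAF
)


def variant_key(row: dict) -> str:
    """Build a CHR_POS_REF_ALT key from one MAF row (a plain dict here)."""
    return "_".join(str(row[col]) for col in MAF_KEY_COLUMNS)


def dedupe_variants(rows: list[dict]) -> list[dict]:
    """Keep the first row for each unique key, preserving input order (assumed policy)."""
    seen: dict[str, dict] = {}
    for row in rows:
        seen.setdefault(variant_key(row), row)
    return list(seen.values())


# Illustrative rows: the same variant reported in two samples, plus one other variant.
rows = [
    {"Chromosome": "7", "Start_Position": 140453136, "Reference_Allele": "A",
     "Tumor_Seq_Allele2": "T", "Tumor_Sample_Barcode": "S1"},
    {"Chromosome": "7", "Start_Position": 140453136, "Reference_Allele": "A",
     "Tumor_Seq_Allele2": "T", "Tumor_Sample_Barcode": "S2"},
    {"Chromosome": "12", "Start_Position": 25398284, "Reference_Allele": "C",
     "Tumor_Seq_Allele2": "T", "Tumor_Sample_Barcode": "S1"},
]
unique = dedupe_variants(rows)
print([variant_key(r) for r in unique])
```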

## Install

```bash
pip install vflank                                   # from PyPI (released versions)
pip install git+https://github.com/rhshah/vFlank.git # latest from GitHub
# development:
git clone https://github.com/rhshah/vFlank.git && cd vFlank
pip install -e ".[dev]"
```

Requires Python ≥ 3.10 (Linux/macOS) and `pysam`, `pandas`, `typer`, `rich`.

### Docker

Images are published to GHCR on each release:

```bash
docker run --rm -v "$PWD:/data" ghcr.io/rhshah/vflank \
    small run /data/variants.maf -r /data/GRCh37.fasta -g hg19 -o /data/out.fasta
```

## Quick start

```bash
vflank small run variants.maf \
    --ref-genome /path/to/GRCh37.fasta \
    --pop-vcf-dir /path/to/gnomad_v2.1.1/ \
    --genome-build hg19 \
    --flank 200 \
    --output flanking_sequences.fasta
```

`--genome-build` defaults to **hg19** (GRCh37 / gnomAD v2.1.1); pass `-g hg38`
for GRCh38 / gnomAD v4. gnomAD v4 has no GRCh37 build.

### Masking sources

Common-SNP masking can come from local gnomAD VCFs or the gnomAD API:

- `--pop-source vcf` (default) — local per-chromosome gnomAD VCFs in
  `--pop-vcf-dir`. Reproducible, offline, and not rate-limited.
- `--pop-source api` — the public [gnomAD GraphQL API](https://gnomad.broadinstitute.org/api),
  **no download**. Best for small cohorts (rate-limited to ~10 requests/min).

```bash
# No-download masking via the API (small cohorts):
vflank small run variants.maf -r GRCh37.fasta -g hg19 --pop-source api
```
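
To give a feel for what a limit of roughly 10 requests per minute means for a client, here is a minimal throttling sketch. It is not vflank's implementation: the `Throttle` class is hypothetical, and spacing calls evenly (one every 6 seconds) is just one simple way to stay under such a limit. The clock and sleep functions are injectable so the example runs instantly.

```python
import time


class Throttle:
    """Space calls at least 60 / max_per_minute seconds apart (hypothetical helper)."""

    def __init__(self, max_per_minute=10, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = 60.0 / max_per_minute
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first one

    def wait(self):
        """Block until the next request is allowed, then record the call time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now


# Demo with a fake clock so nothing actually sleeps.
fake_now = [0.0]
slept = []


def fake_sleep(seconds):
    slept.append(seconds)
    fake_now[0] += seconds


throttle = Throttle(max_per_minute=10, clock=lambda: fake_now[0], sleep=fake_sleep)
for _ in range(3):
    throttle.wait()  # a real client would issue one API request after each wait
print(slept)
```

At this pace a few hundred variants already take tens of minutes, which is why the API backend suits small cohorts.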

Either source honours `--pop-data {genome,exome,both}` (default `genome`).
`both` masks a position if it is a common SNP in *either* the genome or exome
cohort. Flanks often fall in non-coding regions where only genomes have data,
so `genome` is the default.
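
The `--pop-data` semantics can be sketched as a set operation followed by a per-base substitution. This is an illustration under stated assumptions, not vflank's code: the function names are hypothetical, `N` as the mask character is an assumption, and how a position qualifies as a "common SNP" is left outside the sketch.

```python
def positions_to_mask(genome_hits, exome_hits, pop_data="genome"):
    """Choose which common-SNP positions apply for a given --pop-data value."""
    if pop_data == "genome":
        return set(genome_hits)
    if pop_data == "exome":
        return set(exome_hits)
    if pop_data == "both":
        # Mask if the position is common in either cohort.
        return set(genome_hits) | set(exome_hits)
    raise ValueError(f"unknown pop_data: {pop_data!r}")


def mask_flank(seq, start, common_positions, mask_char="N"):
    """Mask bases whose genomic coordinate is in common_positions.

    start is the 1-based coordinate of seq[0]; mask_char is an assumed convention.
    """
    return "".join(
        mask_char if (start + offset) in common_positions else base
        for offset, base in enumerate(seq)
    )


# Illustrative flank covering positions 100..107.
flank = "ACGTACGT"
genome_hits = {102}  # common SNP seen only in the genome cohort
exome_hits = {105}   # common SNP seen only in the exome cohort

masked_genome = mask_flank(flank, 100, positions_to_mask(genome_hits, exome_hits, "genome"))
masked_both = mask_flank(flank, 100, positions_to_mask(genome_hits, exome_hits, "both"))
print(masked_genome)  # ACNTACGT
print(masked_both)    # ACNTANGT
```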

Each variant yields two FASTA records:

```
>{SAMPLE}__{GENE}__{HGVSp}__{HGVSc}
{left_flank}[REF/ALT]{right_flank}
>Masked__{SAMPLE}__{GENE}__{HGVSp}__{HGVSc}
{left_flank_masked}[REF/ALT]{right_flank_masked}
```
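
As a concrete reading of the template above, the following sketch fills in the two records for one variant. The function name `fasta_records` and the example values are hypothetical, and `N` as the mask character is an assumption. Only the header and sequence layout come from the template.

```python
def fasta_records(sample, gene, hgvsp, hgvsc, ref, alt,
                  left, right, left_masked, right_masked):
    """Render the raw and masked FASTA records for one variant (hypothetical helper)."""
    header = "__".join([sample, gene, hgvsp, hgvsc])
    allele = f"[{ref}/{alt}]"
    return (
        f">{header}\n{left}{allele}{right}\n"
        f">Masked__{header}\n{left_masked}{allele}{right_masked}\n"
    )


# Illustrative values with very short flanks for readability.
text = fasta_records(
    sample="S1", gene="BRAF", hgvsp="p.V600E", hgvsc="c.1799T>A",
    ref="A", alt="T",
    left="ACGTAC", right="GTTCAG",
    left_masked="ACNTAC", right_masked="GTTCAG",
)
print(text, end="")
```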

Chromosome notation (`chr1` vs `1`) is auto-detected from the FASTA and VCFs.
The genome build is sanity-checked against the FASTA's chr1 length.
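
Both checks can be approximated in a few lines. This is a sketch, not vflank's implementation: the function names are hypothetical, the chr1 lengths are those of the GRCh37 and GRCh38 primary assemblies, and naming differences such as `chrM` versus `MT` are ignored.

```python
# chr1 length per build, used as a cheap fingerprint of the assembly.
CHR1_LENGTHS = {"hg19": 249_250_621, "hg38": 248_956_422}


def uses_chr_prefix(contigs):
    """Return True if the contig names use UCSC-style 'chr' prefixes."""
    return any(name.startswith("chr") for name in contigs)


def normalise_chrom(chrom, want_prefix):
    """Convert a chromosome name to the prefixed or bare style (ignores chrM/MT)."""
    bare = chrom[3:] if chrom.startswith("chr") else chrom
    return f"chr{bare}" if want_prefix else bare


def check_build(contig_lengths, build):
    """Raise ValueError if chr1's length does not match the declared build."""
    name = "chr1" if "chr1" in contig_lengths else "1"
    observed = contig_lengths.get(name)
    expected = CHR1_LENGTHS[build]
    if observed != expected:
        raise ValueError(
            f"{name} length {observed} does not match {build} (expected {expected})"
        )


# Illustrative contig table for a GRCh37-style FASTA without 'chr' prefixes.
contig_lengths = {"1": 249_250_621, "2": 243_199_373}
prefixed = uses_chr_prefix(contig_lengths)
print(prefixed)                           # False
print(normalise_chrom("chr7", prefixed))  # 7
check_build(contig_lengths, "hg19")       # passes silently
```

With `pysam`, the contig table could be built as `dict(zip(fa.references, fa.lengths))` from an open `pysam.FastaFile`.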

## Project layout

```
src/vflank/
├── core/   chrom · variant · flanks · popfreq   (pure, testable domain logic)
├── io/     maf · reference · fasta              (file access)
└── cli/    app · small                          (Typer commands)
```

## Documentation

- [docs/DEVELOPER.md](docs/DEVELOPER.md) — setup, running, testing, using vflank
  as a library, and extending it (new flank sources, CLI commands).
- [docs/ARCHITECTURE.md](docs/ARCHITECTURE.md) — design, scope boundary, and the
  milestone roadmap.
- `CLAUDE.md` — repository conventions and the quality gate.
