Metadata-Version: 2.4
Name: genomatch
Version: 0.4.2
Summary: A Python toolkit for auditable harmonization, liftover, intersection, and projection of genetic variant tables and payloads.
Author: Precimed
License-Expression: MIT
Requires-Python: >=3.12
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: pandas
Requires-Dist: pgenlib
Requires-Dist: pysam
Requires-Dist: PyYAML
Requires-Dist: scipy
Dynamic: license-file

# Variant Table Toolkit

This toolkit harmonizes genetic variant data across common research formats and reference assemblies. It supports the `GRCh37`, `GRCh38`, and `T2T-CHM13v2.0` assemblies; chromosomes `1-22`, `X`, `Y`, and `MT`; the `ncbi`, `ucsc`, and `plink` contig naming modes; and biallelic variants, including SNPs and supported indels.
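To make the contig naming modes concrete, here is a minimal sketch of converting chromosome labels between the three styles. It is not the toolkit's implementation or API: the function names `to_canonical` and `format_contig` are hypothetical, and the label conventions are assumptions (`chr`-prefixed names with `chrM` for `ucsc`, numeric codes `23`, `24`, and `26` for `X`, `Y`, and `MT` in `plink`). The authoritative rules are in the specs.

```python
# Illustrative sketch only; names and conventions here are assumptions,
# not the toolkit's actual API or exact naming rules.

# Canonical labels used inside this sketch: 1-22, X, Y, MT.
CANONICAL = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]

# Aliases accepted on input: UCSC writes "M" for the mitochondrial contig,
# and PLINK's numeric coding is assumed to be 23=X, 24=Y, 26=MT.
ALIASES = {"M": "MT", "23": "X", "24": "Y", "26": "MT"}


def to_canonical(label: str) -> str:
    """Map a contig label in any of the three styles to a canonical label."""
    s = label.strip()
    if s.lower().startswith("chr"):
        s = s[3:]  # drop the UCSC-style prefix
    s = s.upper()
    s = ALIASES.get(s, s)
    if s not in CANONICAL:
        raise ValueError(f"unsupported contig: {label!r}")
    return s


def format_contig(canonical: str, mode: str) -> str:
    """Render a canonical label in the requested naming mode."""
    if canonical not in CANONICAL:
        raise ValueError(f"not a canonical contig: {canonical!r}")
    if mode == "ncbi":
        return canonical
    if mode == "ucsc":
        return "chr" + ("M" if canonical == "MT" else canonical)
    if mode == "plink":
        return {"X": "23", "Y": "24", "MT": "26"}.get(canonical, canonical)
    raise ValueError(f"unknown naming mode: {mode!r}")
```

Routing every conversion through one canonical form keeps the number of mappings linear in the number of naming modes rather than quadratic.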

The common workflow rewrites the `chr` / `bp` / `a1` / `a2` / `snp` fields of BFILE, PFILE, VCF, or summary-statistics inputs into a standardized variant key and adjusts the attached data, such as genotypes or summary-statistic columns, to match. Users request the target build, contig naming, and optional filtering or normalization flags; the pipeline guesses the build of the source data, lifts over between builds if needed, and handles allele swaps, reference-anchored allele ordering, sorting, and duplicate removal. The workflow is split into preparation and projection phases, so users can save and reuse a prepared variant set or project only a user-defined subset of source variants.
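As a rough illustration of what reference-anchored allele ordering, allele swaps, sorting, and duplicate removal involve, the sketch below harmonizes a few SNP rows against a reference-base lookup. Everything in it is an assumption made for illustration: the `Row` type, the `harmonize` function, the `(chrom, bp, ref, alt)` key layout, treating `a1` as the effect allele, keeping the first duplicate, and using a plain dict in place of a FASTA lookup. The toolkit's actual behavior is defined in the specs.

```python
# Illustrative sketch only; not the toolkit's implementation.
from typing import NamedTuple

CHROM_ORDER = [str(i) for i in range(1, 23)] + ["X", "Y", "MT"]


class Row(NamedTuple):
    chrom: str
    bp: int
    a1: str      # assumed to be the effect allele
    a2: str      # assumed to be the other allele
    beta: float  # effect size reported for a1


def harmonize(rows, ref_base):
    """Return [(key, beta)] with key = (chrom, bp, ref, alt).

    ref_base is a dict mapping (chrom, bp) to the reference base and
    stands in for a reference FASTA lookup. Output betas refer to alt.
    """
    kept = {}
    for r in rows:
        ref = ref_base.get((r.chrom, r.bp))
        if ref == r.a2:
            # a1 is already the non-reference allele: keep the effect as is.
            alt, beta = r.a1, r.beta
        elif ref == r.a1:
            # Effect was reported for the reference allele: swap and flip sign.
            alt, beta = r.a2, -r.beta
        else:
            # Neither allele matches the reference (or no reference base): drop.
            continue
        key = (r.chrom, r.bp, ref, alt)
        kept.setdefault(key, beta)  # duplicate keys: first occurrence wins
    return sorted(
        kept.items(),
        key=lambda kv: (CHROM_ORDER.index(kv[0][0]), kv[0][1], kv[0][2], kv[0][3]),
    )


rows = [
    Row("2", 100, "A", "G", 0.5),
    Row("1", 50, "C", "T", 0.2),    # a1 is the reference base: swapped
    Row("2", 100, "G", "A", -0.5),  # same variant as the first row after swap
    Row("1", 60, "A", "C", 0.1),    # matches neither allele: dropped
]
ref_base = {("2", 100): "G", ("1", 50): "C", ("1", 60): "T"}
result = harmonize(rows, ref_base)
```

The sketch covers SNPs only and does not handle strand flips, liftover, or indels.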

## Start here

1. Install the runtime using one of the supported paths in [docs/install.md](docs/install.md).
2. Download the reference FASTA/chain assets and configure `config.yaml` as described in [docs/downloads.md](docs/downloads.md).
3. Run through the worked example in [docs/tutorial-1.md](docs/tutorial-1.md).

## Documentation

- [Workflow](docs/workflow.md): the common prepare, combine, restrict, and project workflow.
- [Summary statistics](docs/sumstats.md): metadata, SNP-only imports with `--id-lookup`, projection, and clean projection.
- [Primitive tools and object model reference](docs/primitives.md): lower-level tools plus `.vmap`, `.vtable`, payloads, source-row mapping, object metadata, and allele ordering.

## Specifications

For exact schema and edge-case rules, see [SPEC.md](SPEC.md) and the detailed specs in [spec/](spec/). Wrapper behavior for `prepare_variants.py`, `prepare_variants_sharded.py`, and `project_payload.py` is defined in [spec/workflow.md](spec/workflow.md). Payload-application semantics are defined in [spec/payload-application.md](spec/payload-application.md).
