Metadata-Version: 2.4
Name: pg_gpu
Version: 0.1.0
Summary: GPU-accelerated population genetics statistics
Project-URL: Homepage, https://github.com/kr-colab/pg_gpu
Project-URL: Documentation, https://pg-gpu.readthedocs.io
Project-URL: Repository, https://github.com/kr-colab/pg_gpu
Project-URL: Issues, https://github.com/kr-colab/pg_gpu/issues
Author-email: Andrew Kern <adkern@uoregon.edu>
Maintainer-email: Andrew Kern <adkern@uoregon.edu>
License-Expression: MIT
License-File: LICENSE
Keywords: CUDA,CuPy,GPU,bioinformatics,genomics,popgen,population genetics
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: GPU :: NVIDIA CUDA :: 12
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.12
Requires-Dist: bio2zarr[vcf]>=0.1
Requires-Dist: cupy-cuda12x[ctk]>=13.0
Requires-Dist: h5py>=3.0
Requires-Dist: kvikio-cu12>=25.0
Requires-Dist: matplotlib>=3.7
Requires-Dist: msprime>=1.0
Requires-Dist: numpy>=2.0
Requires-Dist: nvidia-nvcomp-cu12>=4.0
Requires-Dist: pandas>=2.0
Requires-Dist: scikit-allel>=1.3
Requires-Dist: scipy>=1.12
Requires-Dist: seaborn>=0.12
Requires-Dist: tqdm>=4.0
Requires-Dist: tskit>=0.5
Requires-Dist: zarr>=2.16
Provides-Extra: dev
Requires-Dist: ipykernel>=6.0; extra == 'dev'
Requires-Dist: ipython>=8.0; extra == 'dev'
Requires-Dist: pytest-xdist>=3.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Provides-Extra: docs
Requires-Dist: nbsphinx>=0.9.8; extra == 'docs'
Requires-Dist: sphinx-rtd-theme>=1.0; extra == 'docs'
Requires-Dist: sphinx>=4.0; extra == 'docs'
Provides-Extra: moments
Requires-Dist: demes; extra == 'moments'
Requires-Dist: demesdraw; extra == 'moments'
Requires-Dist: moments-popgen; extra == 'moments'
Description-Content-Type: text/markdown

# pg_gpu

GPU-accelerated population genetics statistics using CuPy.

[![Documentation Status](https://readthedocs.org/projects/pg-gpu/badge/?version=latest)](https://pg-gpu.readthedocs.io/en/latest/?badge=latest)

## Installation

pg_gpu requires a Linux x86_64 machine with an NVIDIA GPU and a CUDA 12 driver.
No separate CUDA toolkit installation is needed: the full GPU runtime, including
the CUDA toolkit headers that CuPy uses to JIT-compile its kernels, is pulled
from PyPI via the `cupy-cuda12x[ctk]` dependency.
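
Before you run any statistics, you may want to confirm that CuPy can see the driver and a GPU. The helper below is a minimal sketch that uses only public CuPy calls (`cupy.cuda.runtime.getDeviceCount`, `runtimeGetVersion`, `driverGetVersion`). The function name `check_gpu` is hypothetical and is not part of pg_gpu.

```python
def check_gpu() -> dict:
    """Report whether CuPy can use a CUDA device (hypothetical helper, not pg_gpu API)."""
    try:
        import cupy as cp
    except Exception as exc:  # ImportError, or a CUDA library load failure
        return {"available": False, "reason": f"CuPy not importable: {exc}"}
    try:
        n_devices = cp.cuda.runtime.getDeviceCount()
        runtime = cp.cuda.runtime.runtimeGetVersion()
        driver = cp.cuda.runtime.driverGetVersion()
    except Exception as exc:  # e.g. CUDARuntimeError when no driver/GPU is present
        return {"available": False, "reason": f"CUDA not usable: {exc}"}
    if n_devices == 0:
        return {"available": False, "reason": "no CUDA devices found"}
    return {
        "available": True,
        "devices": n_devices,
        "runtime_version": runtime,  # encoded as 1000*major + 10*minor, e.g. 12040 = CUDA 12.4
        "driver_version": driver,
    }


if __name__ == "__main__":
    print(check_gpu())
```

If `available` is `False`, the `reason` field indicates whether the problem is the Python environment (CuPy missing) or the system (driver or GPU not visible).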

### With pixi (recommended)

The pinned, reproducible environment is managed with [pixi](https://pixi.sh)
and is the recommended way to install pg_gpu:

```bash
pixi install
pixi shell
```

### Into an existing conda / venv environment

To use pg_gpu from your own workflow (Snakemake, Jupyter, an existing conda
env), install it with pip:

```bash
pip install "git+https://github.com/kr-colab/pg_gpu"
```

This pulls the full runtime stack (cupy-cuda12x with toolkit headers, bio2zarr,
kvikio, nvcomp) as declared in `pyproject.toml`. For development against a local
checkout, use an editable install:

```bash
pip install -e ".[dev]"
```

## Quick Start

```python
from pg_gpu import HaplotypeMatrix, diversity, divergence, selection, sfs

# Load from VCF
hm = HaplotypeMatrix.from_vcf("data.vcf.gz", region="chr1:1-1000000")
hm.load_pop_file("populations.txt")

# Diversity
diversity.pi(hm, population="pop1")
diversity.tajimas_d(hm, population="pop1")

# Divergence
divergence.fst_hudson(hm, "pop1", "pop2")
divergence.dxy(hm, "pop1", "pop2")

# Selection scans
selection.ihs(hm)
selection.nsl(hm)

# Windowed statistics (fused CUDA kernels)
from pg_gpu import windowed_analysis
results = windowed_analysis(hm, statistics=["pi", "theta_w", "tajimas_d"],
                            window_size=50000)
```
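
To sanity-check windowed output on a small region, you can compute the textbook estimators behind `pi`, `theta_w`, and `tajimas_d` on the CPU. The sketch below is plain NumPy and is not how pg_gpu's fused kernels work. It assumes a `(n_variants, n_haplotypes)` array of 0/1 alleles with no missing data, 1-based positions, and half-open non-overlapping windows. It also returns per-window sums rather than per-base values. pg_gpu may normalize by window length or accessible sites, so compare shapes and trends rather than exact numbers. The name `windowed_reference` is hypothetical.

```python
import numpy as np


def windowed_reference(haps, positions, window_size):
    """Per-window pi, Watterson's theta and Tajima's D (hypothetical CPU reference).

    haps: (n_variants, n_haplotypes) array of 0/1 alleles, biallelic, no missing data.
    positions: 1-based variant positions, one per row of haps.
    Returns (window_starts, pi, theta_w, tajimas_d); pi and theta_w are per-window
    sums, not divided by window length.
    """
    haps = np.asarray(haps)
    positions = np.asarray(positions, dtype=np.int64)
    n = haps.shape[1]  # number of haplotypes, must be >= 2

    k = haps.sum(axis=1)  # derived (alt) allele count at each site
    site_pi = k * (n - k) / (n * (n - 1) / 2)  # mean pairwise differences per site
    seg = ((k > 0) & (k < n)).astype(float)  # 1.0 for segregating sites

    # Half-open, non-overlapping windows [start, start + window_size)
    win = (positions - 1) // window_size
    n_win = int(win.max()) + 1
    pi = np.bincount(win, weights=site_pi, minlength=n_win)
    S = np.bincount(win, weights=seg, minlength=n_win)

    # Standard Watterson / Tajima constants for sample size n
    i = np.arange(1, n)
    a1 = np.sum(1.0 / i)
    a2 = np.sum(1.0 / i**2)
    b1 = (n + 1) / (3 * (n - 1))
    b2 = 2 * (n**2 + n + 3) / (9 * n * (n - 1))
    c1 = b1 - 1 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1**2
    e1, e2 = c1 / a1, c2 / (a1**2 + a2)

    theta_w = S / a1
    denom = np.sqrt(e1 * S + e2 * S * (S - 1))
    with np.errstate(divide="ignore", invalid="ignore"):
        tajimas_d = np.where(denom > 0, (pi - theta_w) / denom, np.nan)

    starts = np.arange(n_win) * window_size + 1
    return starts, pi, theta_w, tajimas_d
```

Windows with no segregating sites have a zero variance term, so this sketch reports NaN for Tajima's D there instead of dividing by zero.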

## Documentation

Full documentation is available at [https://pg-gpu.readthedocs.io/](https://pg-gpu.readthedocs.io/).

An interactive walkthrough is available in [examples/pg_gpu_tour.ipynb](examples/pg_gpu_tour.ipynb).

## Statistics

| Category | Functions |
|----------|-----------|
| Diversity | `pi`, `theta_w`, `theta_h`, `theta_l`, `tajimas_d`, `fay_wus_h`, `normalized_fay_wus_h`, `zeng_e`, `zeng_dh`, `segregating_sites`, `singleton_count`, `haplotype_diversity`, `haplotype_count`, `heterozygosity_expected`, `heterozygosity_observed`, `inbreeding_coefficient`, `allele_frequency_spectrum`, `max_daf`, `daf_histogram`, `diplotype_frequency_spectrum`, `diversity_stats` |
| Divergence | `fst_hudson`, `fst_weir_cockerham`, `fst_nei`, `dxy`, `da`, `pbs`, `pairwise_fst` |
| Distance-based two-pop | `snn`, `dxy_min`, `gmin`, `dd`, `dd_rank`, `zx` |
| Distance moments | `pairwise_diffs`, `dist_var`, `dist_skew`, `dist_kurt`, `dist_moments` |
| Selection scans | `ihs`, `nsl`, `xpehh`, `xpnsl`, `garud_h`, `moving_garud_h`, `ehh_decay` |
| LD | `r`, `r_squared`, `dd` (LD), `dz`, `pi2`, `zns`, `omega`, `mu_ld` |
| SFS | `sfs`, `sfs_folded`, `sfs_scaled`, `sfs_folded_scaled`, `joint_sfs`, `joint_sfs_folded`, `joint_sfs_scaled`, `joint_sfs_folded_scaled`, `project_joint_sfs`, `fold_sfs`, `fold_joint_sfs` |
| Admixture / F-stats | `patterson_f2`, `patterson_f3`, `patterson_d`, `moving_patterson_f3`, `moving_patterson_d`, `average_patterson_f3`, `average_patterson_d` |
| Resampling | `block_jackknife`, `block_bootstrap` |
| Decomposition | `pca`, `randomized_pca`, `pairwise_distance`, `pcoa`, `local_pca`, `local_pca_jackknife`, `pc_dist`, `corners` |
| Relatedness | `grm`, `ibs` |
| Windowed pipeline | `windowed_analysis`: fused GPU windowing for any of the above |
| Biobank-scale streaming | `HaplotypeMatrix.from_zarr(streaming='always')` reads VCF Zarr (VCZ) stores chunk by chunk; per-window, SFS, LD, and pairwise relatedness statistics dispatch to the streaming path transparently. See [tutorials/biobank_streaming](https://pg-gpu.readthedocs.io/en/latest/tutorials/biobank_streaming.html). |
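
Chunk-by-chunk streaming is feasible partly because many of these statistics are sums over sites, so you can combine per-chunk partial results exactly. The NumPy sketch below illustrates this for the unfolded SFS. It walks a 0/1 haplotype array in fixed-size row chunks, standing in for reads from a chunked VCZ store, and accumulates counts. The helper names `sfs_counts` and `streaming_sfs` are hypothetical. This is not pg_gpu's streaming implementation.

```python
import numpy as np


def sfs_counts(derived_counts, n_haplotypes):
    """Unfolded SFS: number of sites with each derived-allele count 0..n."""
    return np.bincount(derived_counts, minlength=n_haplotypes + 1)


def streaming_sfs(haps, chunk_size):
    """Accumulate the SFS over row chunks of a (n_variants, n_haplotypes) 0/1 array.

    Only one chunk is materialized at a time; with a real zarr/VCZ store each
    slice would be a separate read from disk (hypothetical stand-in here).
    """
    n_variants, n_haplotypes = haps.shape
    total = np.zeros(n_haplotypes + 1, dtype=np.int64)
    for start in range(0, n_variants, chunk_size):
        chunk = np.asarray(haps[start:start + chunk_size])
        total += sfs_counts(chunk.sum(axis=1), n_haplotypes)
    return total
```

Ratio statistics such as Hudson's Fst or Tajima's D are not plain site sums. You can still stream them by accumulating their numerator and denominator components per chunk and combining them at the end.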

## Development

```bash
pixi run pytest tests/
pixi run -e lint ruff check pg_gpu/
```
