Metadata-Version: 2.4
Name: epistasis-v2
Version: 1.1.1
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Rust
Classifier: License :: Public Domain
Classifier: Operating System :: OS Independent
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Dist: numpy>=1.23
Requires-Dist: pandas>=2.0
Requires-Dist: scipy>=1.11
Requires-Dist: scikit-learn>=1.3
Requires-Dist: lmfit>=1.2
Requires-Dist: emcee>=3.1
Requires-Dist: matplotlib>=3.7
Requires-Dist: gpmap-v2>=1.0.0
Summary: High-performance Python library for fitting high-order epistatic interactions in genotype-phenotype maps.
Keywords: epistasis,genotype-phenotype,genetics,genomics,bioinformatics
Author-email: Luis Perez <lperezmo@users.noreply.github.com>
License: Unlicense
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Homepage, https://github.com/lperezmo/epistasis-v2
Project-URL: Issues, https://github.com/lperezmo/epistasis-v2/issues
Project-URL: Repository, https://github.com/lperezmo/epistasis-v2

# epistasis-v2

[![CI](https://github.com/lperezmo/epistasis-v2/actions/workflows/ci.yml/badge.svg)](https://github.com/lperezmo/epistasis-v2/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/epistasis-v2.svg)](https://pypi.org/project/epistasis-v2/)
[![Python](https://img.shields.io/pypi/pyversions/epistasis-v2.svg)](https://pypi.org/project/epistasis-v2/)
[![License](https://img.shields.io/badge/license-Unlicense-blue.svg)](UNLICENSE)
[![Open in Streamlit](https://static.streamlit.io/badges/streamlit_badge_black_white.svg)](https://epistasis-v2.streamlit.app/)

High-performance Python library for fitting high-order epistatic interactions in genotype-phenotype maps. A clean-break rewrite of [harmslab/epistasis](https://github.com/harmslab/epistasis).

**Status: alpha.** Phase 1 port, Phase 2 Rust kernels, and the Phase 3 Walsh-Hadamard OLS fast path are all in. Sparse design matrices for high-order Lasso and remaining polish items are still to come.

A multi-page Streamlit showcase lives under [`examples/`](./examples/) and is published at [epistasis-v2.streamlit.app](https://epistasis-v2.streamlit.app/).

## What changed from v1

- Rust hot-path kernels via PyO3 (`epistasis._core`) instead of a shipped Cython `.c` blob.
- `uv` + `maturin` build. `pyproject.toml` only; no `setup.py`.
- Python 3.10 through 3.13. Older interpreters dropped.
- Type hints on the public API; `mypy --strict` in CI.
- Composition over `@use_sklearn` MRO injection. Concrete models hold an sklearn estimator as an attribute and forward calls to it explicitly. This keeps them working with modern sklearn (>= 1.2), whose removal of the `normalize=` parameter broke the v1 trick.
- Walsh-Hadamard fast path for Hadamard-encoded OLS fits: an `O(N log N)` closed-form solve over the N genotypes, with no dense design matrix. Engaged automatically in `EpistasisLinearRegression.fit` when the model is full-order and the attached GPM is a complete biallelic library under global encoding; all other fits fall back to the sklearn path.
- Sparse design matrix path for Lasso / ElasticNet at high order (pending; dense matrices become a memory concern at `L >= 20`).
- Coordinated rewrite of the [gpmap](https://github.com/harmslab/gpmap) dependency as [gpmap-v2](https://github.com/lperezmo/gpmap-v2). Consumes `binary_packed` (uint8 2D) and `encoding_table` with `site_index` instead of the deprecated `genotype_index`.
- No backward compatibility with v1. Pin the v1 package if you need that behavior.
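
Why a closed form exists: under global (Hadamard) encoding, the full-order design matrix of a complete biallelic library is a Sylvester Hadamard matrix up to row and column ordering, so `X.T @ X = N * I` and OLS reduces to `beta = X.T @ y / N`, which a single FWHT computes in `O(N log N)`. The NumPy sketch below illustrates that identity and checks it against a dense `lstsq` solve. `fwht` and `fwht_ols_sketch` are hypothetical names, not the signatures of `epistasis._core.fwht` or `epistasis.fast.fwht_ols_coefficients`, and the genotype/column ordering and the wildtype = +1, mutant = -1 sign convention are assumptions (flipping the convention only flips the sign of odd-order coefficients).

```python
import numpy as np


def fwht(y: np.ndarray) -> np.ndarray:
    """Unnormalized fast Walsh-Hadamard transform (iterative butterfly), O(N log N)."""
    a = np.array(y, dtype=np.float64)  # copy so the caller's array is untouched
    n = a.shape[0]
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a positive power of two")
    h = 1
    while h < n:
        # Group into blocks of 2h; each block's first and second halves form butterfly pairs.
        blocks = a.reshape(-1, 2, h)
        a = np.stack((blocks[:, 0] + blocks[:, 1], blocks[:, 0] - blocks[:, 1]), axis=1).reshape(n)
        h *= 2
    return a


def fwht_ols_sketch(y: np.ndarray) -> np.ndarray:
    """Full-order OLS coefficients for a complete, Hadamard-encoded biallelic library.

    Assumed ordering: row g is the genotype whose bit i is 1 when site i is mutant,
    and column s is the interaction among the sites set in bitmask s, so that
    X[g, s] = (-1) ** popcount(g & s). Then X.T @ X = N * I and beta = X.T @ y / N.
    """
    return fwht(y) / len(y)


# Check against a dense lstsq solve on a small library (L = 4, N = 16).
L = 4
N = 2 ** L
rng = np.random.default_rng(0)
y = rng.normal(size=N)
X = np.array([[(-1.0) ** bin(g & s).count("1") for s in range(N)] for g in range(N)])
beta_dense, *_ = np.linalg.lstsq(X, y, rcond=None)
print(np.allclose(fwht_ols_sketch(y), beta_dense))  # True
```

The dense path must materialize an `N x N` matrix and factor it, while the transform only touches each of the `N` values `log2(N)` times.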

## Repository layout

```
epistasis-v2/
├── pyproject.toml          uv + maturin build, ruff + mypy + pytest config
├── Cargo.toml              Rust workspace
├── python/epistasis/       Python source (installed as `epistasis`)
├── crates/epistasis-core/  Rust crate, exposed as `epistasis._core`
├── tests/                  pytest suite
├── benches/                pytest-benchmark suites (matrix kernels + FWHT)
├── docs/                   Sphinx docs (Phase 5)
├── .github/workflows/      CI (lint, test, matrix) + release (semantic-release, maturin wheels, PyPI OIDC)
├── CHANGELOG.md            generated by python-semantic-release
└── CONTRIBUTING.md         commit conventions, dev workflow
```

## Installation (dev)

Requires Python >= 3.10 and a Rust toolchain. `gpmap-v2` is pulled from PyPI.

```bash
uv sync
uv run maturin develop --release
uv run pytest
```

For lint and type-check:

```bash
uv run ruff check .
uv run ruff format --check .
uv run mypy python/epistasis
```

## Current progress

Phase 0 (scaffold), Phase 1 (port), Phase 2 (Rust kernels), and most of Phase 3 (FWHT fast path) are complete.

Ported modules:

- `epistasis.mapping` (sites, coefficients, `EpistasisMap`)
- `epistasis.matrix` (encoded vectors and design matrix; Rust-backed)
- `epistasis.exceptions` (`EpistasisError`, `XMatrixError`, `FittingError`)
- `epistasis.utils` (`genotypes_to_X`)
- `epistasis.models.base` (`AbstractEpistasisModel`, `EpistasisBaseModel`)
- `epistasis.models.linear` (`EpistasisLinearRegression` with analytic coefficient standard errors and a Walsh-Hadamard fast path for full-order biallelic fits, `EpistasisRidge`, `EpistasisLasso`, `EpistasisElasticNet`)
- `epistasis.models.nonlinear` (`EpistasisNonlinearRegression`, `FunctionMinimizer`; `power` and `spline` variants deferred)
- `epistasis.models.classifiers` (`EpistasisLogisticRegression`; LDA, QDA, Gaussian Process, and GMM deferred)
- `epistasis.simulate` (`simulate_linear_gpm`, `simulate_random_linear_gpm`)
- `epistasis.stats` (Pearson, R^2, RMSD, SS residuals, AIC, `split_gpm`)
- `epistasis.validate` (`k_fold`, `holdout`)
- `epistasis.sampling.bayesian` (`BayesianSampler` via emcee 3)
- `epistasis.fast` (`fwht_ols_coefficients`: closed-form OLS via FWHT)
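
`EpistasisLinearRegression` is listed with analytic coefficient standard errors. For reference, the textbook OLS formula is `SE_j = sqrt(sigma2 * [(X^T X)^-1]_jj)` with `sigma2 = RSS / (n - p)`. The sketch below applies it to a simulated first-order, Hadamard-encoded library; `ols_with_standard_errors` is a hypothetical helper, and whether epistasis-v2 computes its errors exactly this way is an assumption. Under this formula a full-order fit on a complete library (`n == p`) leaves no residual degrees of freedom, which is why the example uses a first-order model.

```python
import numpy as np


def ols_with_standard_errors(X: np.ndarray, y: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Textbook OLS estimates and standard errors: SE_j = sqrt(sigma2 * [(X^T X)^-1]_jj)."""
    n, p = X.shape
    if n <= p:
        raise ValueError("need more observations than coefficients to estimate the noise")
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    residuals = y - X @ beta
    sigma2 = residuals @ residuals / (n - p)  # unbiased residual variance
    cov = sigma2 * np.linalg.inv(X.T @ X)
    return beta, np.sqrt(np.diag(cov))


# Simulated first-order library: 6 sites, all 64 genotypes, wildtype = +1, mutant = -1 (assumed).
L = 6
genotypes = (np.arange(2 ** L)[:, None] >> np.arange(L)) & 1  # bit i set = site i mutant
X = np.hstack([np.ones((2 ** L, 1)), 1 - 2 * genotypes])  # intercept + encoded sites
rng = np.random.default_rng(1)
true_beta = rng.normal(size=L + 1)
y = X @ true_beta + rng.normal(scale=0.1, size=2 ** L)

beta_hat, se = ols_with_standard_errors(X, y)
for b, s in zip(beta_hat, se):
    print(f"{b:+.3f} +/- {s:.3f}")
```

Because the encoded columns of a complete library are orthogonal, `X.T @ X` is `64 * I` here and every coefficient gets the same standard error.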

Rust hot-path kernels in `epistasis._core`:

- `encode_vectors` (uint8 binary_packed to int8 Hadamard/local encoding)
- `build_model_matrix` (parallel site-product over genotype rows; flat ragged sites layout)
- `fwht` (iterative butterfly Fast Walsh-Hadamard Transform)
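
As orientation for what the two matrix kernels compute, here is a plain-NumPy reference. The conventions are assumptions, not taken from the crate: 0 = wildtype and 1 = mutant in `binary_packed`, global (Hadamard) encoding maps wildtype to +1 and mutant to -1, local encoding keeps 0/1, sites are 0-based tuples with `()` as the intercept, and the `*_ref` functions are hypothetical. The flat ragged sites layout used by the Rust kernel is not reproduced here.

```python
import itertools

import numpy as np


def encode_vectors_ref(binary_packed: np.ndarray, encoding: str = "global") -> np.ndarray:
    """uint8 0/1 matrix (genotypes x sites) -> int8 encoded matrix."""
    b = binary_packed.astype(np.int8)
    if encoding == "global":
        return (1 - 2 * b).astype(np.int8)  # wildtype -> +1, mutant -> -1
    if encoding == "local":
        return b  # wildtype -> 0, mutant -> 1
    raise ValueError(f"unknown encoding: {encoding!r}")


def build_model_matrix_ref(encoded: np.ndarray, sites: list[tuple[int, ...]]) -> np.ndarray:
    """One column per site tuple: row-wise product of the encoded values at those sites."""
    X = np.ones((encoded.shape[0], len(sites)), dtype=np.int8)  # () stays all ones
    for j, combo in enumerate(sites):
        for i in combo:
            X[:, j] *= encoded[:, i]
    return X


# Complete 3-site library, model up to order 2: intercept, 3 singles, 3 pairs.
L, order = 3, 2
binary_packed = ((np.arange(2 ** L)[:, None] >> np.arange(L)) & 1).astype(np.uint8)
sites = [c for k in range(order + 1) for c in itertools.combinations(range(L), k)]
X = build_model_matrix_ref(encode_vectors_ref(binary_packed), sites)
print(sites)
print(X)
```

Each genotype row is independent of the others, which is what makes the row-parallel site product a natural fit for a multi-threaded kernel.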

## Benchmarks vs v1

Measured on Windows 11 against `epistasis==0.7.5` + `gpmap==0.7.0`. Full biallelic
space (`AT` alphabet), `timeit` best-of-5. See [`benchmarks/vs_v1.py`](benchmarks/vs_v1.py)
for reproducible scripts and setup instructions.

> **Note on v1 times:** the Cython extension in `epistasis 0.7.5` requires MSVC to compile
> and has no pre-built Windows wheel, so the v1 times below come from the pure-Python fallback,
> which is slower than a compiled v1+Cython build. Even so, the FWHT fast path in v2 is orders
> of magnitude faster at full order.
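
"Best-of-5" reports the minimum over five timed runs, which filters out scheduler and cache noise instead of averaging it in. A minimal harness in that spirit using the standard-library `timeit` module is sketched below; `best_of` is a hypothetical helper, one call per run is an assumption, and `benchmarks/vs_v1.py` may be structured differently.

```python
import timeit
from typing import Callable

import numpy as np


def best_of(func: Callable[[], object], repeat: int = 5, number: int = 1) -> float:
    """Best (minimum) time in milliseconds per call over `repeat` runs of `number` calls."""
    times = timeit.repeat(func, repeat=repeat, number=number)
    return min(times) / number * 1e3


# Example: time a dense least-squares solve, the kind of work the v1 full-order path does.
rng = np.random.default_rng(0)
X = rng.normal(size=(1024, 64))
y = rng.normal(size=1024)
ms = best_of(lambda: np.linalg.lstsq(X, y, rcond=None))
print(f"lstsq best-of-5: {ms:.2f} ms")
```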

### fit() order=1 (sklearn lstsq path in both versions)

| L | genotypes | v1 (ms) | v2 (ms) | speedup |
|---|-----------|---------|---------|---------|
| 8 | 256 | 12.98 | 1.81 | 7x |
| 10 | 1,024 | 44.07 | 2.02 | 22x |
| 12 | 4,096 | 183.13 | 2.61 | 70x |
| 14 | 16,384 | 807.37 | 5.08 | 159x |
| 16 | 65,536 | 3,771.14 | 19.35 | 195x |

### fit() full order (v1: dense lstsq, v2: FWHT O(N log N))

| L | genotypes | v1 (ms) | v2 (ms) | speedup |
|---|-----------|---------|---------|---------|
| 8 | 256 | 195.16 | 1.75 | 111x |
| 10 | 1,024 | 3,004.81 | 3.10 | 969x |
| 12 | 4,096 | 59,344.00 | 8.97 | >6,000x |
| 14 | 16,384 | (hours) | 35.50 | |
| 16 | 65,536 | (hours) | 154.15 | |

### Rust kernel vs NumPy reference (internal; release build, 16 threads; see `benches/`)

| kernel | input | Rust | NumPy reference | speedup |
|--------|-------|------|-----------------|---------|
| `build_model_matrix` | L=12, order=3 | 1.7 ms | 10.1 ms | ~6x |
| `build_model_matrix` | L=16, order=3 | 50 ms | 283 ms | ~5.7x |
| `encode_vectors` | L=16 (65k genotypes) | 1.06 ms | 3.24 ms | ~3x |
| `EpistasisLinearRegression.fit` | full-order L=10 | 0.78 ms | 292 ms (lstsq) | ~375x |
| `EpistasisLinearRegression.fit` | full-order L=12 | 3.4 ms | 15.4 s (lstsq) | ~4,500x |

## Pending

- Sparse design matrix path for Lasso / ElasticNet (memory at L >= 20)
- `power.py` and `spline.py` nonlinear variants
- Remaining classifier implementations if demand surfaces
- ReadTheDocs build
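
To see why dense design matrices become a memory concern around `L = 20`, count their size: a model of order `k` on `L` sites has `sum(comb(L, j) for j in 0..k)` columns, and a complete biallelic library has `2**L` rows. The back-of-the-envelope sketch below assumes a dense float64 matrix over the full library; the dtype epistasis-v2 would actually use on this path is an assumption, and `n_terms` / `dense_bytes` are hypothetical helpers.

```python
from math import comb


def n_terms(L: int, order: int) -> int:
    """Intercept plus every site subset of size 1..order."""
    return sum(comb(L, k) for k in range(order + 1))


def dense_bytes(L: int, order: int, itemsize: int = 8) -> int:
    """Bytes for a dense (2**L genotypes) x n_terms design matrix; itemsize 8 = float64."""
    return (2 ** L) * n_terms(L, order) * itemsize


for L in (12, 16, 20):
    gb = dense_bytes(L, order=3) / 1e9
    print(f"L={L:2d}  order=3  terms={n_terms(L, 3):5,d}  dense float64 ~ {gb:7.2f} GB")
```

At `L = 20` and order 3 that is `2**20 x 1,351 x 8` bytes, about 11.3 GB for the matrix alone.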

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md). Commits follow [Conventional Commits](https://www.conventionalcommits.org/); releases and the changelog are automated by `python-semantic-release`.

## License

Unlicense (public domain). See [UNLICENSE](UNLICENSE).

