Metadata-Version: 2.4
Name: FedGWAS
Version: 0.3.2
Summary: Federated genome-wide association study pipeline built with Flower and PLINK
Project-URL: Homepage, https://github.com/sitaomin1994/FedGWAS_pipeline
Project-URL: Repository, https://github.com/sitaomin1994/FedGWAS_pipeline
Project-URL: Issues, https://github.com/sitaomin1994/FedGWAS_pipeline/issues
Project-URL: Documentation, https://github.com/sitaomin1994/FedGWAS_pipeline#readme
Author: idsla
License: MIT
License-File: LICENSE
Keywords: bioinformatics,federated-learning,flower,gwas,plink
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.11
Requires-Dist: flwr[simulation]<1.20,>=1.19.0
Requires-Dist: matplotlib>=3.10.9
Requires-Dist: mkdocs>=1.6.1
Requires-Dist: numpy>=1.21.0
Requires-Dist: pandas-plink>=2.3.1
Requires-Dist: pandas>=2.2.2
Requires-Dist: phe>=1.5.0
Requires-Dist: pycryptodomex>=3.19.0
Requires-Dist: pyplink>=1.3.7
Requires-Dist: pysnptools>=0.5.13
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: quarto-cli>=1.9.38
Requires-Dist: rich>=13.0.0
Requires-Dist: scipy>=1.9.0
Requires-Dist: typer>=0.12.0
Provides-Extra: dev
Requires-Dist: flake8-docstrings>=1.7.0; extra == 'dev'
Requires-Dist: flake8>=7.1.0; extra == 'dev'
Requires-Dist: isort>=5.13.2; extra == 'dev'
Requires-Dist: mypy>=1.10.1; extra == 'dev'
Requires-Dist: pre-commit>=3.7.1; extra == 'dev'
Requires-Dist: pylint>=3.2.5; extra == 'dev'
Requires-Dist: pytest>=8.2.2; extra == 'dev'
Requires-Dist: tox>=4.16.0; extra == 'dev'
Description-Content-Type: text/markdown

<p align="center">
  <img src="https://raw.githubusercontent.com/sitaomin1994/FedGWAS_pipeline/master/docs/images/logo-readme.png" alt="FedGWAS logo" width="640">
</p>

# FedGWAS: Federated Genome-Wide Association Study

[![PyPI](https://img.shields.io/pypi/v/FedGWAS.svg)](https://pypi.org/project/FedGWAS/)
[![Documentation](https://img.shields.io/badge/docs-GitHub%20Pages-2f8f83)](https://sitaomin1994.github.io/FedGWAS_pipeline/)
[![Deploy documentation](https://github.com/sitaomin1994/FedGWAS_pipeline/actions/workflows/deploy-docs.yml/badge.svg)](https://github.com/sitaomin1994/FedGWAS_pipeline/actions/workflows/deploy-docs.yml)
[![License](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

## Overview

FedGWAS is a federated pipeline for Genome-Wide Association Studies (GWAS). It uses Flower for federated execution, PLINK for genetics tooling, and privacy-preserving relay protocols for multi-client GWAS workflows. Use FedGWAS when multiple sites need to run GWAS stages jointly while each site's genotype data stays on its own client.

<p align="center">
  <img src="https://raw.githubusercontent.com/sitaomin1994/FedGWAS_pipeline/master/docs/images/illustration_readme.png" alt="FedGWAS workflow illustration" width="100%">
</p>

To get started with FedGWAS, see the following resources:

- Documentation site: [Documentation](https://sitaomin1994.github.io/FedGWAS_pipeline/)
- Examples gallery: [Examples](https://sitaomin1994.github.io/FedGWAS_pipeline/examples/overview)
- API reference: [API Reference](https://sitaomin1994.github.io/FedGWAS_pipeline/api-reference/api)
- Technical details: [Technical Details](https://sitaomin1994.github.io/FedGWAS_pipeline/techniques/overview)

## Prerequisites

FedGWAS requires Python 3.11 or later, Flower 1.19.x (installed automatically as a package dependency), and PLINK 1.9+.

Install [PLINK 1.9+](https://www.cog-genomics.org/plink/1.9/) and make sure `plink` is available on `PATH`.

```bash
plink --version
```

For repository-based runs, you can also set the PLINK path in each client config if your environment does not expose `plink` globally.
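Before running anything, you can bundle these checks into a small preflight script. This is a minimal sketch, not a FedGWAS command: `check_python`, `check_plink`, and `preflight` are hypothetical names. It only verifies that the interpreter is Python 3.11 or later and that a `plink` executable is on `PATH`; it does not check the PLINK version.

```shell
# Hypothetical preflight helpers for FedGWAS prerequisites (not shipped with FedGWAS).
# Source this file, then call `preflight`.

# Succeed if the given interpreter (default: python3) is Python 3.11 or later.
check_python() {
  local py="${1:-python3}"
  "$py" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 11) else 1)' 2>/dev/null
}

# Succeed if an executable named "plink" can be found on PATH.
check_plink() {
  command -v plink >/dev/null 2>&1
}

# Print one status line per prerequisite; return non-zero if any check fails.
preflight() {
  local py="${1:-python3}" status=0
  if check_python "$py"; then
    echo "python: OK ($py is 3.11 or later)"
  else
    echo "python: $py is missing or older than 3.11"
    status=1
  fi
  if check_plink; then
    echo "plink: found at $(command -v plink)"
  else
    echo "plink: not found on PATH"
    status=1
  fi
  return "$status"
}
```

For example, `preflight python3 && plink --version` prints one status line per prerequisite and, if both checks pass, prints the PLINK version so you can confirm that it is 1.9 or later.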

## FedGWAS Local Simulation Guide

Local simulation mode runs the full FedGWAS workflow on one machine by launching multiple simulated research centers and a federated server through Flower. Use it to validate an installation, create simulation experiments from preset settings and generated data, prototype center configs, and compare federated outputs against a centralized baseline without setting up a real federated deployment.

You can start local simulation in either of two ways:

1. Recommended: install from PyPI and use the `fedgwas-sim` command-line interface (CLI)
2. Repository/local workflow: clone this repository and run the older scripts directly

Both workflows require:

- Python 3.11 or later
- PLINK 1.9+ available on `PATH` or configured locally
- Flower, installed as a FedGWAS dependency or in your local environment

### Recommended: PyPI CLI Workflow

Install the package:

```bash
python -m pip install FedGWAS
```

Verify that the simulation CLI is available:

```bash
fedgwas-sim --help
```

Create a standalone study directory and run the tiny two-client simulation:

```bash
mkdir my_study
cd my_study

# Initialize the study project directory
fedgwas-sim init
# Set up data and configurations
fedgwas-sim setup-experiment syn-tiny --seed 42
# Validate the setup and run the simulation
fedgwas-sim check
fedgwas-sim run --rounds 100
# Generate the baseline, evaluate the run, and collect results
fedgwas-sim baseline generate --output data/centralized_baseline
fedgwas-sim evaluate results --baseline data/centralized_baseline --king
fedgwas-sim results collect --label tiny_run
```
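If you script the tiny run, chaining the steps with `&&` stops at the first failing command instead of continuing with missing inputs. The sketch below uses only the `fedgwas-sim` subcommands shown above; `run_step`, `run_tiny_study`, and the `FEDGWAS_SIM` override variable are hypothetical names, not part of FedGWAS. It uses `mkdir -p`, so rerunning it reuses an existing study directory; how `fedgwas-sim init` behaves in a non-empty directory is not covered here.

```shell
# Hypothetical wrapper around the fedgwas-sim steps shown above (not shipped with FedGWAS).
# FEDGWAS_SIM can point at a different executable; it defaults to fedgwas-sim on PATH.
FEDGWAS_SIM="${FEDGWAS_SIM:-fedgwas-sim}"

# Echo a step, then run it with the configured executable.
run_step() {
  echo ">>> $FEDGWAS_SIM $*"
  "$FEDGWAS_SIM" "$@"
}

# Run every step in a subshell inside the study directory; `&&` stops at the first failure.
run_tiny_study() {
  local study_dir="$1"
  mkdir -p "$study_dir" || return 1
  (
    cd "$study_dir" &&
      run_step init &&
      run_step setup-experiment syn-tiny --seed 42 &&
      run_step check &&
      run_step run --rounds 100 &&
      run_step baseline generate --output data/centralized_baseline &&
      run_step evaluate results --baseline data/centralized_baseline --king &&
      run_step results collect --label tiny_run
  )
}
```

For example, `run_tiny_study my_study` runs the same sequence as the commands above and returns the exit status of the first step that fails, or 0 if every step succeeds.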

Full CLI usage is documented on the [documentation site](https://sitaomin1994.github.io/FedGWAS_pipeline/user-guide/cli-simulation).

### Repository/Local Script Workflow

Clone the repository if you want the older direct-script workflow, the bundled experiment files, cluster deployment scripts, documentation sources, or developer tooling:

```bash
git clone https://github.com/sitaomin1994/FedGWAS_pipeline.git
cd FedGWAS_pipeline
python -m pip install -e .
```

Alternatively, create the local environment with `uv`:

```bash
git clone https://github.com/sitaomin1994/FedGWAS_pipeline.git
cd FedGWAS_pipeline
uv sync --python 3.11
```
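Whichever install route you use, you can print the installed FedGWAS and Flower versions and compare Flower against the pinned range (`flwr>=1.19.0,<1.20`) before running anything. This sketch relies only on Python's standard `importlib.metadata`; the `check_dists` function and the `PYTHON` variable are hypothetical names. `uv sync` creates the environment in `.venv` at the project root, so on Linux or macOS you can point `PYTHON` at `.venv/bin/python`.

```shell
# Hypothetical helper: print the installed versions of the named distributions.
# Returns non-zero if any of them is missing from the interpreter's environment.
check_dists() {
  local py="${PYTHON:-python3}"
  # Pass the distribution names as arguments to a Python script read from stdin.
  "$py" - "$@" <<'EOF'
import sys
from importlib.metadata import PackageNotFoundError, version

missing = []
for name in sys.argv[1:]:
    try:
        print(f"{name} {version(name)}")
    except PackageNotFoundError:
        print(f"{name} NOT INSTALLED")
        missing.append(name)
sys.exit(1 if missing else 0)
EOF
}
```

For example, run `PYTHON=.venv/bin/python check_dists FedGWAS flwr` after `uv sync`, or `PYTHON=python check_dists FedGWAS flwr` in the environment where you ran `pip install`.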

Generate synthetic data:

```bash
python pipeline/simulation/simulated_data/generate_synthetic_data.py \
  --scale tiny \
  --partition-strategy even \
  --seed 42 \
  --output-dir experiments/correctness/tiny_even/data
```

Generate the centralized baseline:

```bash
python experiments/tools/generate_baseline.py \
  experiments/correctness/tiny_even/config.yaml
```

Run the federated simulation:

```bash
flwr run . local-simulation --stream
```

Or run with explicit settings (the ones used for release smoke tests):

```bash
flwr run . local-simulation --stream --run-config \
  'simulation=true num-server-rounds=100 config_path="experiments/correctness/tiny_even/configs"'
```
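The `--run-config` value is a single quoted string of space-separated `key=value` pairs, and string values such as `config_path` need their own double quotes. When you switch between experiment directories, a small helper can build that string so the quoting stays consistent. This is a hypothetical sketch that reuses only the three keys shown above (`simulation`, `num-server-rounds`, `config_path`); `make_run_config` is not part of FedGWAS or Flower.

```shell
# Hypothetical helper: build a Flower --run-config string from an experiment's
# config directory and a round count (defaults to 100, as in the command above).
make_run_config() {
  local config_path="$1" rounds="${2:-100}"
  printf 'simulation=true num-server-rounds=%s config_path="%s"' "$rounds" "$config_path"
}
```

For example, `flwr run . local-simulation --stream --run-config "$(make_run_config experiments/correctness/tiny_even/configs 100)"` passes the same settings as the command above.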

Evaluate the run:

```bash
python experiments/tools/evaluation/evaluate_all.py \
  experiments/correctness/tiny_even/results_2 \
  --baseline experiments/correctness/tiny_even/data/tiny/centralized_baseline \
  --king
```

If you changed the output paths in the active config files, pass the results directory configured there instead.
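The `results_2` directory in the evaluation command belongs to one particular run. If your runs write numbered `results_<N>` directories inside the experiment directory, as that path suggests, a helper can select the highest-numbered one. This is a sketch under that naming assumption; `latest_results_dir` is a hypothetical name, and if your config files set a different output path, use that path instead.

```shell
# Hypothetical helper: print the highest-numbered results_<N> directory inside an
# experiment directory. Assumes the results_<N> naming seen in the command above.
latest_results_dir() {
  local exp_dir="$1" best="" best_n=-1 d n
  for d in "$exp_dir"/results_*/; do
    [ -d "$d" ] || continue          # no match: the glob stays literal, so skip it
    n="${d%/}"                       # strip the trailing slash
    n="${n##*/results_}"             # keep only the suffix after results_
    case "$n" in
      '' | *[!0-9]*) continue ;;     # skip non-numeric suffixes
    esac
    if [ "$n" -gt "$best_n" ]; then
      best_n="$n"
      best="${d%/}"
    fi
  done
  [ -n "$best" ] || return 1
  printf '%s\n' "$best"
}
```

For example: `python experiments/tools/evaluation/evaluate_all.py "$(latest_results_dir experiments/correctness/tiny_even)" --baseline experiments/correctness/tiny_even/data/tiny/centralized_baseline --king`.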

### Example Simulation Experiments

We provide a few preset experiments with generated data and configs in the repository for testing and demonstration. You can find details of these example experiments in the documentation and in each experiment directory:

- Tiny experiment details: [experiments/correctness/tiny_even/README.md](experiments/correctness/tiny_even/README.md)
- Small experiment details: [experiments/performance/small_even/README.md](experiments/performance/small_even/README.md)
- Real-world data experiment details: [experiments/real_world/real_world/README.md](experiments/real_world/real_world/README.md)

## FedGWAS Cluster Deployment Guide

Instead of running the pipeline in local simulation mode, you can deploy the federated server and clients on separate machines or containers. The cluster deployment guide walks through the steps to set up a real federated deployment with the current FedGWAS implementation. It also includes tips for debugging and troubleshooting common issues.

## Repository Guides

- Cluster deployment guide: [cluster_deployment/docs/CLUSTER_USER_GUIDE.md](cluster_deployment/docs/CLUSTER_USER_GUIDE.md)

## Federated Protocol Summary

FedGWAS runs a stage-based federated workflow:

1. Key exchange
2. Encrypted seed synchronization
3. Local and global QC
4. Iterative KING kinship analysis
5. Local logistic regression filtering
6. Iterative logistic regression
7. Result retention and evaluation

The server relays encrypted client-to-client payloads for selected stages and does not decrypt those payloads. See [CURRENT_VERSION.md](CURRENT_VERSION.md) for the current privacy model, stage contracts, and limitations.

## Troubleshooting (Common Issues)

- `plink` not found: install PLINK 1.9+ and make sure it is on `PATH`.
- Flower uses the wrong config: pass `--run-config 'config_path="..."'`.
- Empty or missing results: generate the tiny synthetic data and baseline before running.
- TestPyPI or PyPI install fails for a new release: check that the version in `pyproject.toml` has been published and that dependency resolution can reach the main PyPI index (for TestPyPI installs, add `--extra-index-url https://pypi.org/simple/`).

## License

FedGWAS is distributed under the MIT License. See [LICENSE](LICENSE).

## Contributors and Creator

Developed by [Rutgers Institute in Data Science, Learning, and Application](https://sites.rutgers.edu/idsla/).

**Contributors**:

- Dr. Xinyue Wang
- Dr. Sitao Min
