Metadata-Version: 2.4
Name: pyrsx
Version: 0.1.0
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: GNU General Public License v3 or later (GPLv3+)
Classifier: Programming Language :: Rust
Classifier: Programming Language :: Python :: 3
Requires-Dist: click>=8
Summary: Python bindings for rsx: high-performance RAD-seq sex determination toolkit
Author-email: Rohit Goswami <rgoswami@ieee.org>
License: GPL-3.0-or-later
Requires-Python: >=3.9
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Documentation, https://github.com/HaoZeke/rsx-rs/tree/main/docs
Project-URL: Homepage, https://github.com/HaoZeke/rsx-rs
Project-URL: Repository, https://github.com/HaoZeke/rsx-rs

# pyrsx

Python bindings for [rsx](https://github.com/HaoZeke/rsx-rs): a high-performance streaming toolkit for RAD-seq sex determination.

## Installation

```bash
pip install pyrsx
```

## Usage

```python
import pyrsx

# Process FASTQ files into a marker depth table
pyrsx.process("reads/", "markers.tsv", threads=4, min_depth=5)

# Compute the marker distribution between sexes (Fisher's exact test, FDR-corrected)
pyrsx.distrib("markers.tsv", "popmap.tsv", "distrib.tsv",
              test="fisher", correction="fdr")

# Extract significant markers with Bayesian output
pyrsx.signif("markers.tsv", "popmap.tsv", "signif.tsv",
             test="fisher", correction="fdr", bayes=True)

# Streaming PCA
pyrsx.pca("markers.tsv", "pca_results/", n_components=10)

# Merge tables (bounded memory, handles 75M+ sequences)
pyrsx.merge(["table1.tsv", "table2.tsv"], "merged.tsv")
```
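
For readers unfamiliar with the `test="fisher"` option, the sketch below shows what a two-sided Fisher's exact test computes for a single marker, given a 2x2 table of marker presence and absence by sex. It is a pure-Python illustration only: the function name `fisher_exact_two_sided` and the example counts are hypothetical, and this is not necessarily how rsx implements the test internally.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout for one marker:
        a = males with marker,   b = males without marker
        c = females with marker, d = females without marker
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    total = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed.
        return comb(row1, x) * comb(row2, col1 - x) / total

    observed = prob(a)
    lo = max(0, col1 - row2)
    hi = min(row1, col1)
    # Sum the probabilities of every table no more likely than the observed one.
    # The small tolerance guards against floating-point ties.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= observed * (1 + 1e-9))
    return min(1.0, p)


# Illustrative counts: marker present in 8 of 10 males and 1 of 10 females.
p_value = fisher_exact_two_sided(8, 2, 1, 9)
```

A small p-value here indicates that marker presence is associated with sex in the sampled individuals.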

## Features

- All rsx commands accessible from Python
- 2-5x faster than the original C++ RADSex
- Bounded-memory streaming for arbitrarily large datasets
- Multiple statistical tests: chi-squared, Fisher's exact, G-test
- Multiple-testing corrections: Bonferroni, Benjamini-Hochberg FDR
- Bayesian sex-linkage classification (Bayes Factor + posterior)
- Streaming PCA via Tucker mode-2 decomposition
- K-mer based marker deduplication
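
As a rough illustration of the Benjamini-Hochberg FDR correction listed above, the following pure-Python sketch adjusts a list of per-marker p-values. The function name `benjamini_hochberg` and the sample p-values are hypothetical; this shows the standard step-up procedure and is not taken from the rsx source.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvalues)
    # Indices of the p-values sorted from smallest to largest.
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for four markers.
adjusted = benjamini_hochberg([0.01, 0.04, 0.03, 0.005])
```

Markers whose adjusted value falls below the chosen threshold (for example 0.05) would be reported as significant under FDR control.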

