Metadata-Version: 2.4
Name: simphyni
Version: 0.1.3
Summary: SimPhyni: a tool for phylogenetic trait simulation and inference.
Author-email: Ishaq Balogun <ishaqobalogun@gmail.com>
License: MIT
Requires-Python: <3.13,>=3.11
Description-Content-Type: text/markdown
Requires-Dist: ete3>=3.1.3
Requires-Dist: joblib>=1.4.2
Requires-Dist: matplotlib>=3.10.1
Requires-Dist: numpy>=2.2.3
Requires-Dist: pandas>=2.2.3
Requires-Dist: plotly>=6.0.0
Requires-Dist: scikit-learn>=1.6.1
Requires-Dist: seaborn<0.14,>=0.13.0
Requires-Dist: setuptools>=75.8.0
Requires-Dist: statsmodels>=0.14.4
Requires-Dist: wheel>=0.45.1
Requires-Dist: annotated-types>=0.7.0
Requires-Dist: biopython>=1.85
Requires-Dist: certifi>=2025.1.31
Requires-Dist: charset-normalizer>=3.4.1
Requires-Dist: idna>=3.10
Requires-Dist: itolapi>=4.1.5
Requires-Dist: jinja2>=3.1.6
Requires-Dist: markupsafe>=3.0.2
Requires-Dist: pastml>=1.9.50
Requires-Dist: pydantic>=2.10.6
Requires-Dist: pydantic-core>=2.27.2
Requires-Dist: requests>=2.32.3
Requires-Dist: scipy>=1.14.0
Requires-Dist: urllib3>=2.4.0
Requires-Dist: snakemake>=9.10

# SimPhyNI

## Overview

**SimPhyNI** (Simulation-based Phylogenetic iNteraction Inference) is a phylogenetically-aware framework for detecting evolutionary associations between binary traits (e.g., gene presence/absence, major/minor alleles, binary phenotypes) on microbial phylogenetic trees. The tool uses phylogenetic information to correct for spurious associations caused by the relatedness of sister taxa.

This pipeline is designed to:

* Infer evolutionary parameters for traits (gain/loss rates, time to emergence, ancestral states)
* Estimate trait co-occurrence null models through independent simulation of traits
* Report statistical results (interaction direction, p-value, effect size) for each tested association
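
To make the idea of a simulation-based null concrete, here is a toy sketch in plain Python. It is not SimPhyNI's implementation: the tree, the rates, the shared gain/loss rate for both traits, and all function names are assumptions chosen for illustration. Two binary traits are evolved independently along the same small tree with a two-state Markov model, and the observed number of co-occurring tips is compared against the resulting null distribution.

```python
import math
import random

# Toy rooted tree: (node, parent, branch_length), parents listed before children.
# Topology and branch lengths are made up for illustration.
TREE = [
    ("root", None, 0.0),
    ("A", "root", 1.0), ("B", "root", 1.0),
    ("t1", "A", 0.5), ("t2", "A", 0.5),
    ("t3", "B", 0.5), ("t4", "B", 0.5),
]
TIPS = ["t1", "t2", "t3", "t4"]


def flip_probability(state, gain, loss, length):
    """Probability that a two-state Markov chain ends a branch in the other state."""
    total = gain + loss
    if total == 0:
        return 0.0
    rate_out = gain if state == 0 else loss
    return (rate_out / total) * (1.0 - math.exp(-total * length))


def simulate_trait(gain, loss, rng, root_state=0):
    """Evolve one binary trait down the tree and return its tip states."""
    states = {}
    for node, parent, length in TREE:
        if parent is None:
            states[node] = root_state
            continue
        state = states[parent]
        if rng.random() < flip_probability(state, gain, loss, length):
            state = 1 - state
        states[node] = state
    return [states[tip] for tip in TIPS]


def co_occurrence(trait_a, trait_b):
    """Count tips where both traits are present."""
    return sum(a and b for a, b in zip(trait_a, trait_b))


def null_p_value(observed, gain, loss, n_sim=1000, seed=0):
    """Empirical upper-tail p-value from independently simulated trait pairs."""
    rng = random.Random(seed)
    null = [
        co_occurrence(simulate_trait(gain, loss, rng), simulate_trait(gain, loss, rng))
        for _ in range(n_sim)
    ]
    exceed = sum(value >= observed for value in null)
    return (exceed + 1) / (n_sim + 1)


# Hypothetical observation: both traits present in all four tips.
print(null_p_value(observed=4, gain=0.5, loss=0.5))
```

Because both traits are simulated on the same tree, closely related tips tend to share states by descent alone, so the null already accounts for co-occurrence that comes from relatedness rather than interaction.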

---

## Getting Started

### Installation

First create a new environment:

```bash
conda create -n simphyni python=3.11
conda activate simphyni
```

Then install from PyPI:

```bash
pip install simphyni
```

Test the installation:

```bash
simphyni version
```

---

## Usage

### Run mode (single-run)

```bash
simphyni run \
  --sample-name my_sample \
  --tree path/to/tree.nwk \
  --traits path/to/traits.csv \
  --run-traits 0,1,2 \
  --outdir my_analysis \
  --cores 4 \
  --temp_dir ./tmp \
  --min_prev 0.05 \
  --max_prev 0.95 \
  --prefilter \
  --plot
```

* `--run-traits` specifies a comma-separated list of 0-indexed column indices in the traits CSV for “trait against all” comparisons. Use 'ALL' (default) to include all traits.


### Run mode (batch)

Create a `samples.csv` file:

```csv
Sample,Tree,Traits,run_traits,MinPrev,MaxPrev
run1,tree1.nwk,traits1.csv,All,0.05,0.95
run2,tree2.nwk,traits2.csv,"0,1,2",0.05,0.90
```
* `run_traits`, `MinPrev`, and `MaxPrev` are optional columns; default values are used when they are omitted.
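
For many samples, the sheet can also be generated programmatically. The following is a minimal sketch using Python's standard `csv` module; only the column names come from the example above, while the helper name `write_sample_sheet` and the row values are placeholders.

```python
import csv

# Column names taken from the samples.csv example; row values are placeholders.
FIELDS = ["Sample", "Tree", "Traits", "run_traits", "MinPrev", "MaxPrev"]


def write_sample_sheet(path, rows):
    """Write one row per run; values containing commas (e.g. "0,1,2") are quoted."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        writer.writeheader()
        writer.writerows(rows)


rows = [
    {"Sample": "run1", "Tree": "tree1.nwk", "Traits": "traits1.csv",
     "run_traits": "All", "MinPrev": 0.05, "MaxPrev": 0.95},
    {"Sample": "run2", "Tree": "tree2.nwk", "Traits": "traits2.csv",
     "run_traits": "0,1,2", "MinPrev": 0.05, "MaxPrev": 0.90},
]
write_sample_sheet("samples.csv", rows)
```

Letting the `csv` module handle quoting avoids a malformed row when `run_traits` holds a comma-separated list.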

Then execute:

```bash
simphyni run --samples samples.csv --cores 16
```

### Run with SLURM on HPC

First, download example cluster scripts:
```bash
simphyni download-cluster-scripts
```

Edit the cluster config files for your HPC system, then run simphyni with the `--slurm` flag:
```bash
simphyni run --samples samples.csv --slurm
```

This will automatically use the downloaded cluster configuration files (cluster.args and cluster.slurm.json) to schedule jobs via SLURM.
* HPC mode is useful for large batch jobs because it parallelizes execution across multiple compute nodes.

For all run options:

```bash
simphyni run --help
```

## Example data

Download and run example inputs using:
```bash
simphyni download-examples
simphyni run --samples example_inputs/simphyni_sample_info.csv --cores 8 --prefilter --plot
```
---

## Outputs

Outputs for each sample are written to a subdirectory named after the sample, inside the working directory or the specified output directory. They include:

* `simphyni_result.csv`, containing all tested trait pairs with their inferred interaction direction, p-value, and effect size
* `simphyni_object.pkl`, an optional file containing the completed analysis, parsable with an active SimPhyNI environment. Controlled with the `--save-object` flag (not recommended for large analyses of more than 1,000,000 comparisons)
* heatmap summaries of tested associations if --plot is enabled
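
The results table can be inspected with any CSV reader. The sketch below uses Python's standard `csv` module to load a results file and keep rows below a significance threshold. The exact column headers are not listed here, so the p-value column name is passed in as an argument and must be checked against the header of your own file; both function names are hypothetical.

```python
import csv


def load_results(path):
    """Read a results CSV into a list of dicts keyed by the file's own header."""
    with open(path, newline="") as handle:
        return list(csv.DictReader(handle))


def filter_significant(rows, pvalue_column, alpha=0.05):
    """Keep rows whose p-value is below alpha.

    pvalue_column is an assumption: pass the name that appears in your file's header.
    """
    return [row for row in rows if float(row[pvalue_column]) < alpha]
```

A typical call would load `simphyni_result.csv` from the sample's output subdirectory, print `rows[0].keys()` to see the available columns, and then pass the p-value column name to `filter_significant`.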

---

### Directory Structure

```
SimPhyNI/
├── simphyni/               # Core package
│   ├── Simulation/         # Simulation scripts
│   ├── scripts/            # Workflow scripts
│   └── envs/simphyni.yaml  # Conda environment (used in snakemake)
├── conda-recipe/           # Build recipe 
├── cluster_scripts         # Cluster configs for SLURM
├── example_inputs 
└── pyproject.toml
```

---


## Contact

For questions, please open an issue or contact Ishaq Balogun at https://github.com/jpeyemi.
