Metadata-Version: 2.4
Name: simphyni
Version: 1.0.2
Summary: SimPhyni: a tool for phylogenetic trait simulation and inference.
Author-email: Ishaq Balogun <ishaqobalogun@gmail.com>
License: MIT
Requires-Python: <3.13,>=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: ete3>=3.1.3
Requires-Dist: joblib>=1.4.2
Requires-Dist: matplotlib>=3.10.1
Requires-Dist: numpy>=2.2.3
Requires-Dist: numba>=0.62.0
Requires-Dist: pandas>=2.2.3
Requires-Dist: scikit-learn>=1.6.1
Requires-Dist: seaborn<0.14,>=0.13.0
Requires-Dist: setuptools>=75.8.0
Requires-Dist: statsmodels>=0.14.4
Requires-Dist: wheel>=0.45.1
Requires-Dist: annotated-types>=0.7.0
Requires-Dist: biopython>=1.85
Requires-Dist: certifi>=2025.1.31
Requires-Dist: charset-normalizer>=3.4.1
Requires-Dist: idna>=3.10
Requires-Dist: itolapi>=4.1.5
Requires-Dist: jinja2>=3.1.6
Requires-Dist: markupsafe>=3.0.2
Requires-Dist: pastml>=1.9.50
Requires-Dist: pydantic>=2.10.6
Requires-Dist: pydantic-core>=2.27.2
Requires-Dist: requests>=2.32.3
Requires-Dist: scipy>=1.14.0
Requires-Dist: urllib3>=2.4.0
Requires-Dist: snakemake>=9.10
Dynamic: license-file

# SimPhyNI

## Overview

**SimPhyNI** (Simulation-based Phylogenetic iNteraction Inference) is a phylogenetically-aware framework for detecting evolutionary associations between binary traits (e.g., gene presence/absence, major/minor alleles, binary phenotypes) on microbial phylogenetic trees. This tool leverages phylogenetic information to correct for spurious associations caused by the relatedness of sister taxa. 

This pipeline is designed to:

* Infer evolutionary parameters for traits (gain/loss rates, time to emergence, ancestral states)
* Estimate trait co-occurence null models through independent simulation of traits
* Output statistical results for associations 

---

## Getting Started

### Installation

First, ensure bioconda and conda-forge are channels are configured:

```bash
conda config --add channels conda-forge
conda config --add channels bioconda
```

Create a new environment:

```bash
conda create -n simphyni
conda activate simphyni
```

then install SimPhyNI from bioconda:

```bash
conda install simphyni
```

test installation:

```bash
simphyni version
```

### Input Specifications

**1. Phylogenetic Tree (`.nwk`)**

* Standard Newick format.
* Must be **rooted** (both outgroup and midpoint are acceptable).
* Tip labels must match the `Sample` column in your traits file.
* Branch lengths are required for accurate rate estimation.

**2. Traits File (`.csv`)**

* **Rows:** Genomes/Samples (matching tree tips).
* **Columns:** Binary traits (0 = Absent, 1 = Present; non numerical values will be st to 1 and blank values will be set to 0).
* **Header:** Required (Trait names).
* **Index:** The first column must contain sample names.

*Example `traits.csv`:*

```csv
Sample,PhenotypeX,GeneA,GeneB
E_coli_1,1,0,1
E_coli_2,1,1,0
E_coli_3,0,0,1

```

---



## Usage

### Run mode (single-run)

```bash
simphyni run \
  --sample-name my_sample \
  --tree path/to/tree.nwk \
  --traits path/to/traits.csv \
  --run-traits 0,1,2 \
  --outdir my_analysis \
  --cores 4 \
  --temp_dir ./tmp \
  --min_prev 0.05 \
  --max_prev 0.95 \
  --plot
```

* `--run-traits` specifies a comma-separated list of column indices (0-indexed) in the traits CSV for “trait against all” comparisons. Use 'ALL' (default) to include all traits.


### Run mode (batch)

Create a `samples.csv` file:

```csv
Sample,Tree,Traits,run_traits,MinPrev,MaxPrev
run1,tree1.nwk,traits1.csv,All,0.05,0.95
run2,tree2.nwk,traits2.csv,"0,1,2",0.05,0.90
```

* `run_traits`, `MinPrev`, and `MaxPrev` are optional columns that will use default values if not provided.

Then execute:

```bash
simphyni run --samples samples.csv --cores 16
```

### Run with HPC

First, download example cluster scripts:
```bash
simphyni download-cluster-scripts
```

Edit cluster config file for your computing cluster then install the approprate snakemake executor from the avalible catalog: https://snakemake.github.io/snakemake-plugin-catalog/index.html (slurm shown below): 
```bash
pip install snakemake-executor-plugin-slurm
```

run simphyni with the --profile flag:
```bash
simphyni run --samples samples.csv --profile cluster_profile
```

For all run options:

```bash
simphyni run --help
```

## Example data

Download and run example inputs using:
```bash
simphyni download-examples
simphyni run --samples example_inputs/simphyni_sample_info.csv --cores 8 --plot
```
---

## Outputs

Outputs for each sample are placed in structured folders in the working directory or specified output directory in subdirectories by sample name, including:

### Main Result Files

**`simphyni_result.csv`**
Contains the statistical results for all tested trait pairs.

| Column | Description |
| --- | --- |
| `T1` / `T2` | Identifiers for the two traits being compared. |
| `direction` | Direction of association: `1` = Positive, `-1` = Negative. |
| `effect size` | Variance adjusted magnitude of the association. |
| `pval_naive` | Raw empirical P-value from the simulation. |
| `pval_bh` | P-value corrected using the Benjamini-Hochberg FDR method (recommended for phenotype-genotype tests). |
| `pval_by` | P-value corrected using the Benjamini-Yekutieli FDR method. (recommended for genotype-genotype tests)|
| `pval_bonf` | P-value corrected using the strict Bonferroni method. |
| `prevalence_T1` / `_T2` | Fraction of samples containing the trait (0.0 to 1.0). |

### Additional Outputs

* **`simphyni_object.pkl`**: Optional file containing the completed analysis object, parsable with an active SimPhyNI environment. Controlled with the `--save-object` flag (not recommended for large analyses > 1,000,000 comparisons).
* **Plots**: Heatmap summaries of tested associations (if `--plot` is enabled).

---

### Directory Structure

```
SimPhyNI/
├── simphyni/               # Core package
│   ├── Simulation/         # Simulation scripts
│   ├── scripts/            # Snakemake scripts
│   ├── Snakefile.py/       # Workflow build file
│   ├── simphyni_cli.py/    # Command line entry points
│   └── envs/simphyni.yaml  # Conda environment (used in snakemake)
├── test/                   # Testing suite
├── conda-recipe/           # Build recipe 
├── cluster_scripts         # Cluster configs for SLURM
├── example_inputs          # Example inputs to run SimPhyNI
└── pyproject.toml
```

---


## Contact

For questions, please open an issue or contact Ishaq Balogun at https://github.com/jpeyemi.
