Metadata-Version: 2.4
Name: targetsage
Version: 0.3.0
Summary: TargetSage: computational pipeline for CRISPR/RNAi screening analysis
Author: TargetSage Contributors
License: MIT
Project-URL: Homepage, https://pypi.org/project/targetsage/
Project-URL: Documentation, https://pypi.org/project/targetsage/
Project-URL: Repository, https://github.com/targetsage/targetsage
Project-URL: Issues, https://github.com/targetsage/targetsage/issues
Keywords: crispr,rnai,screen,genetics,bioinformatics,high-throughput,hts,normalization,hit-calling,enrichment,gsea
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Operating System :: OS Independent
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy
Requires-Dist: openpyxl
Requires-Dist: pandas
Requires-Dist: pyarrow
Requires-Dist: PyYAML
Requires-Dist: pyreadr
Requires-Dist: scipy
Requires-Dist: statsmodels
Requires-Dist: scikit-learn
Requires-Dist: matplotlib
Requires-Dist: matplotlib-venn
Requires-Dist: gseapy
Requires-Dist: networkx
Requires-Dist: requests
Requires-Dist: seaborn
Requires-Dist: psycopg2-binary
Requires-Dist: fpdf
Provides-Extra: notebooks
Requires-Dist: jupyter; extra == "notebooks"
Requires-Dist: ipython; extra == "notebooks"
Provides-Extra: plots
Requires-Dist: plotly; extra == "plots"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: jupyter; extra == "dev"
Requires-Dist: ipython; extra == "dev"
Dynamic: license-file

# TargetSage

> Computational pipeline for CRISPR/RNAi screening analysis — arrayed and pooled.

[![PyPI version](https://badge.fury.io/py/targetsage.svg)](https://pypi.org/project/targetsage/)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**TargetSage** is a Python library and command-line toolkit for analyzing high-throughput CRISPR and RNAi screening data. It supports both:

- **Arrayed screens** — well-level readouts from plate readers (384/96-well format)
- **Pooled screens** — guide-level readouts from sequencing (dropout/enrichment)

The pipeline covers the full analysis workflow: data validation → normalization → QC → hit calling → gene aggregation → pathway enrichment → visualization.

---

## 📦 Installation

```bash
pip install targetsage
```

**Requirements:** Python ≥3.11

Optional dependencies for full functionality:

```bash
pip install "targetsage[notebooks]"   # Jupyter support
pip install "targetsage[plots]"       # Enhanced plotting backends (quotes keep shells such as zsh from expanding the brackets)
```

---

## 🚀 Quick Start

### Arrayed Screen — One Function

```python
import pandas as pd
from targetsage.stats.batch_correction import run_arrayed_analysis

df = pd.read_csv("my_arrayed_screen.csv")

results = run_arrayed_analysis(
    df,
    rep1_col="Raw_rep1",
    rep2_col="Raw_rep2",
    plate_col="Plate",
    control_type_col="SpotType",
    norm_method="genes and all Non-targeting",
    p_cutoff=0.05,
    lfc_cutoff=1.0,
)

# results is a DataFrame with well-level hits, fold-changes, SSMD, p-values
hits = results[results["is_hit"] == True]
print(f"Found {len(hits)} hits")
```

### Arrayed Screen — Step-by-Step Workflow

```python
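import pandas as pd
from targetsage.pipeline.arrayed_workflow import ArrayedScreenWorkflow

# Load the screen table (same input file as in the one-function example)
df = pd.read_csv("my_arrayed_screen.csv")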

wf = ArrayedScreenWorkflow(
    df,
    screen_name="my_screen",
    rep_cols=["Raw_rep1", "Raw_rep2"],
    plate_col="Plate",
    well_col="Well",
    gene_col="gene_symbol",
    norm_method="genes and all Non-targeting",
    gene_hit_method="moderated_t",
    p_cutoff=0.05,
    lfc_cutoff=1.0,
)

# Run the full pipeline
wf.run_all()

# Access results
print(wf.results_summary())

# Gene-level hits
gene_hits = wf.get_step("hit_calling")
```

### Pooled Screen

```python
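import pandas as pd
from targetsage.pipeline.pooled_intensity_workflow import PooledIntensityWorkflow

# Load the guide-level table (placeholder path; point this at your own file)
df = pd.read_csv("data/my_pooled_screen.csv")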

wf = PooledIntensityWorkflow(
    df,
    screen_name="my_pooled_screen",
    reference_cols=["Baseline_R1", "Baseline_R2", "Baseline_R3"],
    treatment_cols=["Treatment_R1", "Treatment_R2", "Treatment_R3"],
    gene_col="gene_symbol",
    p_cutoff=0.05,
    log2fc_cutoff=0.3,
)

wf.run_all()
```

---

## 🖥️ Command Line Interface

Three entry points are installed:

```bash
# Main dispatcher
targetsage array <command> [options]
targetsage pool  <command> [options]

# Direct aliases
targetsage-array <command> [options]
targetsage-pool  <command> [options]
```

### Arrayed CLI Example

```bash
targetsage array run-all \
    -i data/my_screen.csv \
    -o results/arrayed_out \
    --rep1-col Raw_rep1 \
    --rep2-col Raw_rep2 \
    --plate-col Plate \
    --gene-col gene_symbol \
    --norm-method "genes and all Non-targeting" \
    --gene-hit-method moderated_t
```

### Pooled CLI Example

```bash
targetsage pool run-all \
    -i data/my_pooled_screen.csv \
    -o results/pooled_out \
    --reference-cols Baseline_R1 Baseline_R2 Baseline_R3 \
    --treatment-cols Treatment_R1 Treatment_R2 Treatment_R3 \
    --gene-col gene_symbol \
    --pvalue-method welch
```

Get help for any command:

```bash
targetsage array --help
targetsage array run-all --help
```

---

## 📊 Analysis Methods

### Normalization

| Method | Description | Best For |
|--------|-------------|----------|
| `genes and all Non-targeting` | Gene wells + all NTC wells as baseline | Standard screens |
| `genes and own negative controls` | Per-gene own NTC scaling | Screens with matched controls |
| `all negative controls` | NTC-only baseline | Small screens |
| `B-score` | Median polish (row/column correction) | Low hit-rate plates (<20%) |
| `LOESS` | Local regression spatial correction | High hit-rate plates |
| `Z-score plate` | Per-plate Z-score | Quick normalization |
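
To make the `B-score` row concrete, here is a minimal sketch of the idea from Brideau et al.: a two-way median polish removes additive row and column effects, and the residuals are scaled by the plate's median absolute deviation (MAD). This illustrates the technique only and is not TargetSage's implementation. The function names, the fixed iteration count, the 1.4826 scaling constant, and the toy plate are all assumptions made for the example.

```python
import random
from statistics import median


def median_polish_residuals(plate, n_iter=10):
    """Tukey median polish: alternately subtract row and column medians."""
    resid = [[float(v) for v in row] for row in plate]
    n_cols = len(resid[0])
    for _ in range(n_iter):
        for row in resid:
            row_med = median(row)
            for j in range(n_cols):
                row[j] -= row_med
        for j in range(n_cols):
            col_med = median(row[j] for row in resid)
            for row in resid:
                row[j] -= col_med
    return resid


def b_scores(plate):
    """Median-polish residuals divided by a MAD-based estimate of spread."""
    resid = median_polish_residuals(plate)
    flat = [v for row in resid for v in row]
    center = median(flat)
    mad = median(abs(v - center) for v in flat)
    if mad == 0:
        raise ValueError("MAD of residuals is zero; B-score is undefined")
    scale = 1.4826 * mad  # common convention: MAD rescaled to match an SD under normality
    return [[v / scale for v in row] for row in resid]


# Toy 8x12 (96-well) plate: additive row/column gradients plus noise (all values hypothetical)
rng = random.Random(0)
plate = [[100 + 2 * r + 0.5 * c + rng.gauss(0, 1) for c in range(12)] for r in range(8)]
plate[3][7] -= 30  # one strong "hit" well
scores = b_scores(plate)
print(f"B-score of the spiked well: {scores[3][7]:.1f}")
```

Because the row and column medians are themselves estimated from the wells, this correction assumes most wells are inactive, which is why the table recommends it for low hit-rate plates.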

### Hit Calling (Arrayed)

| Method | Description |
|--------|-------------|
| `moderated_t` | Moderated t-test with shrinkage (limma-style) |
| `t_test` | Standard Welch's t-test vs NTC |
| `mann_whitney` | Non-parametric Mann-Whitney U |
| `rank_product` | Rank product for replicate concordance |
| `rsa` | Redundant siRNA Analysis (König-style) |
| `lme` | **Linear Mixed Effects** — random plate intercepts for multi-plate designs |
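
As an illustration of the `t_test` row, the sketch below computes Welch's t statistic and the Welch-Satterthwaite degrees of freedom for one gene's wells against NTC wells. It uses only the standard library to show the formula. The function name and the example values are hypothetical, and TargetSage's own implementation may differ.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample, reference):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom)."""
    n1, n2 = len(sample), len(reference)
    se1 = variance(sample) / n1      # squared standard error of each group mean
    se2 = variance(reference) / n2
    t = (mean(sample) - mean(reference)) / sqrt(se1 + se2)
    df = (se1 + se2) ** 2 / (se1 ** 2 / (n1 - 1) + se2 ** 2 / (n2 - 1))
    return t, df


# Hypothetical normalized log2 fold changes for one gene and for NTC wells
gene_wells = [2.0, 2.5, 3.0]
ntc_wells = [0.0, 0.5, -0.5, 0.0]
t_stat, dof = welch_t(gene_wells, ntc_wells)
print(f"t = {t_stat:.2f}, df = {dof:.2f}")
```

The two-sided p-value is the tail area of a Student t distribution with `dof` degrees of freedom. `scipy.stats.ttest_ind(gene_wells, ntc_wells, equal_var=False)` runs the same test in a single call.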

### Hit Calling (Pooled)

| Method | Description |
|--------|-------------|
| `RRA` | Robust Rank Aggregation (MAGeCK-style) |
| `NB GLM` | Negative Binomial generalized linear model |

### QC Metrics

- **SSMD** — Strictly Standardized Mean Difference (well-level effect size)
- **Z′-factor** — Plate quality score
- **Replicate correlation** — Pearson r between replicates (configurable threshold)
- **Hit-rate estimation** — Per-plate hit-rate for B-score guardrails

### Pathway Enrichment

- Over-Representation Analysis (ORA) via gseapy/Enrichr
- Custom GMT gene-set support
- Cached offline mode for reproducibility
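
For intuition about what ORA tests, the sketch below computes a one-sided hypergeometric p-value for the overlap between a hit list and a single gene set. It is a standalone illustration with hypothetical gene names and a hypothetical function name. It does not call gseapy or Enrichr, and their statistics and multiple-testing corrections may differ.

```python
from math import comb


def ora_pvalue(hit_genes, gene_set, background):
    """One-sided hypergeometric p-value for over-representation of gene_set in hit_genes."""
    background = set(background)
    hits = set(hit_genes) & background      # only genes that were actually screened
    members = set(gene_set) & background
    N, K, n = len(background), len(members), len(hits)
    k = len(hits & members)
    # P(X >= k) when n genes are drawn without replacement from N, K of which are in the set
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1))
    return tail / comb(N, n)


# Toy example: 20 screened genes, a 5-gene pathway, 4 hits of which 3 are in the pathway
background = {f"g{i}" for i in range(20)}
pathway = {"g0", "g1", "g2", "g3", "g4"}
hit_list = ["g0", "g1", "g2", "g10"]
p_value = ora_pvalue(hit_list, pathway, background)
print(f"ORA p-value: {p_value:.4f}")
```

The choice of background matters: restricting it to the genes present in the screening library, as done here, avoids inflating significance for pathways the library could never have hit.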

---

## 📓 Notebooks & Examples

Example notebooks are available in the `notebooks/` directory:

| Notebook | Description |
|----------|-------------|
| `targetsage_package_example.ipynb` | Quick-start package API walkthrough |
| `arrayed_screen_walkthrough.ipynb` | Step-by-step arrayed screen analysis |
| `arrayed_screen_walkthrough_executed.ipynb` | Same with executed outputs |
| `tss_crispri_384_pipeline.ipynb` | Real 384-well CRISPRi example |
| `rnaither_drosophila_walkthrough.ipynb` | RNAi screen example (Drosophila) |
| `targetsage_crispra_array_step_by_step.ipynb` | CRISPRa activation screen |
| `arrayed_screen_data_simulator.ipynb` | Generate synthetic test data |

Run locally:

```bash
git clone https://github.com/targetsage/targetsage.git
cd targetsage
pip install -e ".[notebooks]"
jupyter notebook notebooks/
```

---

## 🧪 Testing

```bash
# Clone the repository
git clone https://github.com/targetsage/targetsage.git
cd targetsage

# Install in development mode
pip install -e ".[dev]"

# Run the test suite
pytest
```

---

## 🏗️ Architecture

```
targetsage/
├── data/              # Data loaders, schema, validation
├── normalization/     # Normalization methods (B-score, LOESS, Z-score, etc.)
├── hits/              # Hit scoring (well-level + gene-level aggregation)
├── stats/             # Statistical tests, QC, batch correction
├── pipeline/          # Workflow runners (ArrayedScreenWorkflow, PooledIntensityWorkflow)
│   ├── steps/         # Individual step implementations
│   ├── config.py      # Configuration dataclasses
│   ├── arrayed_workflow.py
│   └── pooled_intensity_workflow.py
├── qc/                # QC engine and report generation
├── utils/             # DataFrame helpers, well coordinates, etc.
├── visualization/     # Plotting utilities
├── network/           # Network analysis and visualization
└── cli.py             # Command-line entry points
```

---

## 🔬 Citation

If you use TargetSage in your research, please cite:

> TargetSage: A computational pipeline for CRISPR/RNAi screening analysis.
> *Package version X.Y.Z*, https://pypi.org/project/targetsage/

Key methods implemented in TargetSage are based on established literature:

- **B-score normalization**: Brideau et al., *J Biomol Screen* 2003
- **LOESS normalization**: Cleveland, *J Am Stat Assoc* 1979
- **LME for arrayed CRISPR**: *PLOS ONE* 2024 (simulation-guided method selection)
- **SSMD**: Zhang et al., *J Biomol Screen* 2007
- **RRA**: Kolde et al., *Bioinformatics* 2012

---

## 📄 License

MIT License — see [LICENSE](LICENSE) file.

---

## 🤝 Contributing

Contributions are welcome! Please open an issue or pull request on GitHub.

For the full-stack web application (TargetSage Cloud), see the separate documentation.
