Metadata-Version: 2.4
Name: crisprdesigner2
Version: 0.1.0
Summary: CRISPR gRNA design pipeline for SpCas9 (NGG PAM)
Project-URL: Homepage, https://github.com/EngreitzLab/CRISPRDesigner2
Project-URL: Repository, https://github.com/EngreitzLab/CRISPRDesigner2.git
Project-URL: Issues, https://github.com/EngreitzLab/CRISPRDesigner2/issues
Author: Engreitz Lab
License: MIT
License-File: LICENSE
Keywords: bioinformatics,cas9,crispr,gene-editing,genomics,grna,guide-rna,spcas9
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Requires-Dist: numpy>=1.20
Requires-Dist: pandas>=1.5
Requires-Dist: pyfastx>=0.8
Requires-Dist: rs3>=0.0.3
Provides-Extra: all
Requires-Dist: pysam>=0.20; extra == 'all'
Requires-Dist: pytest>=7.0; extra == 'all'
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == 'dev'
Provides-Extra: offtarget
Requires-Dist: pysam>=0.20; extra == 'offtarget'
Description-Content-Type: text/markdown

# CRISPRDesigner2

A CRISPR gRNA design tool for SpCas9 (NGG PAM) targeting hg38 and mm10 genomes. Available as both a **Python package** for programmatic use and a **Snakemake workflow** for batch processing.

## Overview

The pipeline designs CRISPR guide RNAs (gRNAs) for SpCas9, which targets NGG PAM sites, in five steps:

1. **Extracts candidate gRNAs** from input genomic regions
2. **Filters guides** based on sequence quality criteria
3. **Scores on-target efficiency** using the RS3 model
4. **Scores off-target specificity** using GuideScan2
5. **Outputs** scored guides in standard formats
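
To make step 1 concrete, here is a minimal sketch of NGG PAM scanning on both strands. It is an illustration, not the package's implementation: `find_ngg_guides` and `revcomp` are hypothetical names, and the choice to report the 0-based offset of the full 23-nt site is an assumption.

```python
import re

# Translation table for complementing DNA bases.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def find_ngg_guides(seq):
    """Yield (offset, strand, guide_with_pam) for each 20-nt protospacer + NGG PAM.

    offset is the 0-based position of the 23-nt site on the forward strand
    (an assumption made for this sketch).
    """
    seq = seq.upper()
    # Lookaheads report overlapping sites instead of skipping past each match.
    # Plus strand: 20-nt protospacer, any base (N), then GG.
    for match in re.finditer(r"(?=([ACGT]{21}GG))", seq):
        yield match.start(), "+", match.group(1)
    # Minus strand: the same site reads as CC + N + 20 nt on the forward strand.
    for match in re.finditer(r"(?=(CC[ACGT]{21}))", seq):
        yield match.start(), "-", revcomp(match.group(1))
```

Sites containing `N` or other ambiguous bases are skipped in this sketch because the pattern only accepts `A`, `C`, `G`, and `T`.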

## Installation

### Option 1: Python package (recommended for integration)

Install as an editable package for use in other projects:

```bash
# Clone the repository
git clone https://github.com/EngreitzLab/CRISPRDesigner2.git
cd CRISPRDesigner2

# Create conda environment
conda env create -f workflow/envs/CRISPRDesigner2.yaml -n CRISPRDesigner2
conda activate CRISPRDesigner2

# Install as Python package
pip install -e .
```

### Option 2: Snakemake workflow only

If you only need the Snakemake workflow, create and activate the conda environment from the repository root:

```bash
conda env create -f workflow/envs/CRISPRDesigner2.yaml -n CRISPRDesigner2
conda activate CRISPRDesigner2
```

### Data downloads

#### Genome FASTA

```bash
# hg38
wget https://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.fa.gz
gunzip hg38.fa.gz

# mm10
wget https://hgdownload.soe.ucsc.edu/goldenPath/mm10/bigZips/mm10.fa.gz
gunzip mm10.fa.gz
```

#### GuideScan2 index

Download pre-built indices from [guidescan.com/downloads](https://guidescan.com/downloads), then create a BAM index:

```bash
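# Only on HPC clusters that use environment modules; the module name is site-specific.
# Skip this line if samtools is already on your PATH.
module load biology samtools/1.16.1
samtools index path/to/guidescan2.bam.sorted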
```

#### RS3 model weights

No download required. The [`rs3` package](https://github.com/gpp-rnd/rs3) bundles model weights and is installed automatically.

## Usage

### As a Python package

```python
import pandas as pd

from crisprdesigner2 import (
    extract_guides_from_bed,
    apply_filters,
    get_passing_guides,
    score_guides_rs3,
    score_guides_guidescan,
    write_outputs,
)

# Extract guides from regions and load them into a DataFrame
guides = extract_guides_from_bed("regions.bed", "genome.fa")
guides_df = pd.DataFrame(guides)

# Filter by sequence quality
guides_df = apply_filters(guides_df)
passed = get_passing_guides(guides_df)

# Score on-target efficiency
scored = score_guides_rs3(passed, tracr="Chen2013")

# Score off-target specificity (optional)
scored = score_guides_guidescan(scored, "guidescan2.bam.sorted", "hg38")

# Write outputs
write_outputs(scored, "output_dir")
```

### Public API

```python
from crisprdesigner2 import (
    # Extraction
    extract_guides,              # Extract from single region
    extract_guides_from_bed,     # Extract from BED file
    reverse_complement,
    CONTEXT_PADDING,

    # Filtering
    apply_filters,               # Apply all sequence filters
    get_passing_guides,          # Get guides passing all filters
    is_valid_guide,              # Check single guide
    FILTERS,                     # Dict of filter functions

    # Scoring
    score_guides_rs3,            # RS3 on-target scoring (batch)
    score_single_guide_rs3,      # RS3 scoring (single guide)
    score_guides_guidescan,      # GuideScan2 off-target scoring

    # Output
    write_outputs,               # Write .txt and .bed files
    read_design_guides_txt,      # Read existing output

    # Caching
    load_predesigned_guides,     # Load cached guides
    merge_with_predesigned,      # Merge new + cached guides
)
```

### As a Snakemake workflow

Configure `config/config.yaml`:

```yaml
genome: "hg38"
regions: "path/to/regions.bed"
genome_fasta: "path/to/genome.fa"
guidescan2_index: "path/to/guidescan2.bam.sorted"
output_dir: "results/GuideDesign"
predesigned_guides: ""  # Optional: path to cached guides
```

Run the workflow:

```bash
# Dry run (validation)
snakemake -n --configfile config/config.yaml

# Full run
snakemake --configfile config/config.yaml --cores 4
```

## Output files

### designGuides.txt

Tab-separated file with scored guides:

| Column | Description |
|--------|-------------|
| `chr` | Chromosome |
| `start` | Start position (0-based) |
| `end` | End position (exclusive) |
| `locus` | Formatted locus string |
| `strand` | Strand (+ or -) |
| `GuideSequenceWithPAM` | 23-nt sequence (20-nt protospacer + 3-nt PAM) |
| `guideSet` | Region name from input BED |
| `RS3_score` | On-target efficiency (log-odds, typically -2 to +2) |
| `specificity_score` | Off-target specificity (0-1, higher = better) |

### designGuides.bed

BED6 format for genome browser visualization.
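
For downstream scripts that do not use pandas, `designGuides.txt` can also be read with the standard library. The sketch below assumes the file has a header row with the column names listed above and that missing specificity values are written as an empty field or as `NaN`. `load_design_guides` is a hypothetical helper; the package's own reader is `read_design_guides_txt`.

```python
import csv


def _to_float(value):
    """Convert a text field to float, treating an empty field as NaN."""
    return float(value) if value else float("nan")


def load_design_guides(path):
    """Read designGuides.txt into a list of dicts with typed fields."""
    rows = []
    with open(path, newline="") as handle:
        for row in csv.DictReader(handle, delimiter="\t"):
            row["start"] = int(row["start"])
            row["end"] = int(row["end"])
            # float() also parses "nan"/"NaN", so those stay NaN.
            row["RS3_score"] = _to_float(row["RS3_score"])
            row["specificity_score"] = _to_float(row["specificity_score"])
            rows.append(row)
    return rows
```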

## Interpreting scores

### RS3 on-target scores

RS3 scores are log-odds values, typically ranging from -2 to +2. Higher scores indicate better predicted on-target efficiency.

### GuideScan2 specificity scores

Specificity scores range from 0 to 1 (higher = more specific, fewer off-targets). The GuideScan2 paper recommends **specificity score > 0.2** as a cutoff ([Perez et al., Genome Biology 2025](https://link.springer.com/article/10.1186/s13059-025-03488-8)).

A guide has a `NaN` specificity score when it either:
- is not in the reference genome
- has multiple perfect matches in the genome
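
One way to combine the two scores is to apply the specificity cutoff first and then rank the survivors by RS3 score within each region. The sketch below works on plain dicts keyed by the output column names. `select_guides`, the `top_n` parameter, and the decision to drop guides with `NaN` specificity are assumptions made for illustration, not package behavior.

```python
import math


def select_guides(guides, min_specificity=0.2, top_n=5):
    """Keep guides above the specificity cutoff, then take the top_n by RS3 per guideSet."""
    by_set = {}
    for guide in guides:
        spec = guide["specificity_score"]
        # NaN compares False against everything, so test for it explicitly.
        if math.isnan(spec) or spec <= min_specificity:
            continue
        by_set.setdefault(guide["guideSet"], []).append(guide)
    # Higher RS3 score = better predicted on-target efficiency.
    return {
        name: sorted(rows, key=lambda g: g["RS3_score"], reverse=True)[:top_n]
        for name, rows in by_set.items()
    }
```

Regions whose guides all fail the cutoff are absent from the result, which makes them easy to flag for manual review.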

## Sequence filters

| Filter | Criterion | Reason |
|--------|-----------|--------|
| Pol III termination | >1 T in last 4 nt | Early transcription termination |
| High T/U content | >40% T | Reduced guide stability |
| T/U homopolymer | 4+ consecutive T | Pol III termination |
| Mononucleotide repeat | 5+ consecutive same base | Synthesis/sequencing issues |
| Low complexity | Various repeat patterns | Off-target concerns |
| Low GC | ≤20% GC | Unstable RNA structure |
| High GC | ≥90% GC | Synthesis issues |
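
The thresholds in the table translate into a few string checks. The following is an illustrative sketch, not the code in `filters.py`: `failed_filters` and the filter keys are hypothetical names, the checks are assumed to apply to the 20-nt protospacer without the PAM, and the low-complexity filter is omitted because its criteria are not spelled out above.

```python
import re


def failed_filters(protospacer: str) -> list[str]:
    """Return the names of the filters a protospacer fails (empty list = passes)."""
    seq = protospacer.upper().replace("U", "T")
    length = len(seq)
    gc_fraction = (seq.count("G") + seq.count("C")) / length
    checks = {
        "pol3_termination": seq[-4:].count("T") > 1,        # >1 T in last 4 nt
        "high_t_content": seq.count("T") / length > 0.40,   # >40% T
        "t_homopolymer": "TTTT" in seq,                      # 4+ consecutive T
        "mononucleotide_repeat": re.search(r"(.)\1{4}", seq) is not None,  # 5+ same base
        "low_gc": gc_fraction <= 0.20,                       # <=20% GC
        "high_gc": gc_fraction >= 0.90,                      # >=90% GC
    }
    return [name for name, failed in checks.items() if failed]
```

Returning the list of failed filter names, instead of a single boolean, makes it easier to see why a candidate was rejected.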

## Project structure

```
CRISPRDesigner2/
├── pyproject.toml                      # Package metadata
├── src/
│   └── crisprdesigner2/                # Python package
│       ├── __init__.py                 # Public API
│       ├── extraction.py               # Guide extraction
│       ├── filters.py                  # Sequence filters
│       ├── scoring.py                  # Combined scoring exports
│       ├── _rs3_scoring.py             # RS3 implementation
│       ├── _guidescan_scoring.py       # GuideScan2 implementation
│       ├── output.py                   # Output file generation
│       └── cache.py                    # Pre-designed guide caching
├── Snakefile                           # Workflow definition
├── config/
│   └── config.yaml                     # Configuration template
├── workflow/
│   ├── envs/
│   │   └── CRISPRDesigner2.yaml        # Conda environment
│   └── scripts/
│       ├── snakemake_*.py              # Snakemake rule scripts
│       └── *.py                        # Thin wrappers (backward compat)
├── tests/
│   ├── data/                           # Test fixtures
│   └── test_*.py                       # Test modules
└── README.md
```

## Testing

```bash
conda activate CRISPRDesigner2

# Run all tests
python -m pytest tests/ -v

# Run specific module
python -m pytest tests/test_guide_extraction.py -v
```

## Troubleshooting

### Import errors

Ensure the package is installed:

```bash
conda activate CRISPRDesigner2
pip install -e .
```

### Missing GuideScan2

If GuideScan2 is unavailable, specificity scores will be `NaN`. The workflow still completes.

### Slow GuideScan2 queries

Ensure the BAM index exists:

```bash
ls path/to/guidescan2.bam.sorted.bai
# If missing:
samtools index path/to/guidescan2.bam.sorted
```
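
The same check can run as a pre-flight step in a Python script before any scoring starts. This is a small sketch that assumes the index sits next to the BAM with a `.bai` suffix, as in the command above; `bam_index_path` is a hypothetical helper.

```python
from pathlib import Path


def bam_index_path(bam):
    """Return the path of the .bai index next to bam, or None if it is missing."""
    bam = Path(bam)
    # Append ".bai" to the full file name (guidescan2.bam.sorted -> guidescan2.bam.sorted.bai).
    candidate = bam.with_name(bam.name + ".bai")
    return candidate if candidate.is_file() else None
```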

## License

MIT License. See the `LICENSE` file.

## Citation

If you use this tool, please cite:

- RS3 (Rule Set 3): DeWeirdt et al., Nature Communications, 2022
- GuideScan2: Perez et al., Genome Biology, 2025
