Metadata-Version: 2.4
Name: zebrachop
Version: 1.0.2
Summary: CRISPR gRNA design for zebrafish with Ensembl-aware filtering — Python 3 frontend wrapping the CHOPCHOP engine.
Author: Abhinav Bachu
License: MIT License
        
        Copyright (c) 2024 Abhinav Bachu
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
        
Project-URL: Homepage, https://github.com/abachu2005/ZebraCHOP
Project-URL: Repository, https://github.com/abachu2005/ZebraCHOP
Project-URL: Issues, https://github.com/abachu2005/ZebraCHOP/issues
Project-URL: Changelog, https://github.com/abachu2005/ZebraCHOP/blob/main/CHANGELOG.md
Keywords: bioinformatics,CRISPR,gRNA,zebrafish,chopchop,Cas9
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE
Requires-Dist: fastapi>=0.104
Requires-Dist: uvicorn[standard]>=0.24
Requires-Dist: python-multipart>=0.0.6
Requires-Dist: requests>=2.28
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: ruff>=0.4; extra == "dev"
Requires-Dist: pre-commit>=3; extra == "dev"
Requires-Dist: httpx>=0.27; extra == "dev"
Dynamic: license-file

# ZebraCHOP

> CRISPR guide-RNA design for zebrafish (*Danio rerio*), with Ensembl-aware post-processing and a clean web UI.

ZebraCHOP is a zebrafish-focused fork of [CHOPCHOP](https://chopchop.cbu.uib.no/). It picks high-scoring Cas9/TALEN/Cpf1/Nickase guides for any zebrafish gene, scores them with the published efficiency models (Doench 2016, Xu 2015, …), maps off-targets with Bowtie, and post-processes the output via Ensembl REST to bias selection toward early exons (more likely to produce loss-of-function).

![status: ready](https://img.shields.io/badge/status-ready-brightgreen) ![CLI: Python 2.7](https://img.shields.io/badge/CLI-Python%202.7-orange) ![web: Python 3.9+](https://img.shields.io/badge/web-Python%203.9%2B-blue) ![license: MIT](https://img.shields.io/badge/license-MIT-lightgrey) [![DOI](https://zenodo.org/badge/853115972.svg)](https://zenodo.org/badge/latestdoi/853115972) [![CI](https://github.com/abachu2005/ZebraCHOP/actions/workflows/ci.yml/badge.svg)](https://github.com/abachu2005/ZebraCHOP/actions/workflows/ci.yml)

---

## What it does

```
   Gene name(s)
       │
       ▼
┌────────────────┐   ┌────────────────┐   ┌────────────────┐   ┌────────────────┐
│ Resolve via    │ → │ Extract        │ → │ Score guides   │ → │ Filter via     │
│ danRer11.gene  │   │ sequence with  │   │ + off-targets  │   │ Ensembl exon   │
│ _table         │   │ twoBitToFa     │   │ via Bowtie     │   │ structure      │
└────────────────┘   └────────────────┘   └────────────────┘   └────────────────┘
       │                                                              │
       └──────────────────────────► Per-gene .tsv (ranked) ◄──────────┘
```

Inputs: gene name(s) (or a genePred file). Outputs: one TSV per gene with ranked guides, off-target counts, GC%, self-complementarity, efficiency score, etc., plus an optional Ensembl-aware text summary.

---

## Quick start

```bash
git clone https://github.com/abachu2005/ZebraCHOP.git
cd ZebraCHOP
python3 bin/zebrachop-setup     # interactive: detects bowtie/twoBitToFa/primer3, writes config_local.json
bash frontend/run.sh            # open http://127.0.0.1:8000
```

If you prefer the CLI:

```bash
python2 chopchop_query.py --gene_names rx3,tbx16 -o results/ -- -G danRer11
python  reformat.py -input results/ -output ranked_summary.txt
```

### Why two Python versions?

The core CHOPCHOP engine (`chopchop.py`, `chopchop_query.py`, `featurization.py`) is Python 2.7 — it's a fork of upstream CHOPCHOP and depends on a frozen scikit-learn pickle (`Doench_2016_18.01_model_nopos.pickle`) that doesn't survive a clean Py3 port. The new **web frontend** and **setup wizard** are Python 3 and shell out to the Py2 engine as a subprocess.

The setup wizard records your Python-2 path under `config_local.json["PYTHON2"]` so you only have to set it once.

---

## External dependencies

The setup wizard checks for these and helps you install/configure them:

| Tool | Why |
|---|---|
| `bowtie` | Off-target alignment against the zebrafish genome |
| `twoBitToFa` (UCSC kent utils) | Extract genomic sequence around target |
| `primer3` (optional) | Primer design around the chosen guide |
| `danRer11.2bit` | UCSC two-bit genome file (~770 MB, auto-downloadable) |
| Bowtie index | Built from the zebrafish FASTA via `bowtie-build` |

You'll also want the included `genetable/danRer11.gene_table` (~4.8 MB) which the wizard already points at.

---

## Repository layout

```
.
├── chopchop.py                  # Py2: core guide-design engine
├── chopchop_query.py            # Py2: batch wrapper, one TSV per gene
├── reformat.py                  # Py3 OK: Ensembl REST exon-aware post-processor
├── featurization.py             # Py2: ML feature builder (Doench 2016)
├── config.json                  # default tool paths (empty; copy to config_local.json)
├── frontend/                    # NEW: Python-3 FastAPI web UI (subprocesses chopchop_query.py)
│   ├── main.py
│   ├── index.html
│   ├── requirements.txt
│   └── run.sh
├── bin/zebrachop-setup          # NEW: Python-3 interactive setup wizard
├── genetable/danRer11.gene_table
├── results/                     # example output TSVs (rx3, tbx16, tbxta)
├── ZebraCHOP Command Line Batch Analysis Pipeline.pdf
└── LICENSE
```

## Configuration (`config_local.json`)

The wizard writes this automatically. Manual example:

```json
{
  "PATH": {
    "BOWTIE": "/usr/local/bin/bowtie",
    "TWOBITTOFA": "/usr/local/bin/twoBitToFa",
    "PRIMER3": "/usr/local/bin/primer3_core",
    "TWOBIT_INDEX_DIR": "/data/zebrachop/indexes",
    "BOWTIE_INDEX_DIR":  "/data/zebrachop/indexes",
    "GENE_TABLE_INDEX_DIR": "./genetable"
  },
  "THREADS": 4,
  "PYTHON2": "/usr/local/bin/python2"
}
```

`config_local.json` is git-ignored.

## Web UI

Run `bash frontend/run.sh` and open <http://127.0.0.1:8000>. Paste one or more gene names, pick the scoring model, click **Design guides** — the frontend spawns a Python-2 subprocess per batch, streams a job log, and renders each gene's ranked TSV as a sortable table you can download.

## Citing

If you use ZebraCHOP, please cite the upstream CHOPCHOP paper (Labun *et al.*, NAR 2019). The poison-exon-aware Ensembl post-processing in `reformat.py` is an addition specific to this fork.

## License

MIT — see [`LICENSE`](LICENSE). Portions of the code derive from CHOPCHOP (MIT) and from Microsoft's Azimuth (BSD 3-Clause); see [`NOTICE`](NOTICE) and in-file headers for attributions.
