Metadata-Version: 2.4
Name: wtfdtb
Version: 0.3.1
Summary: High-Throughput Cross-Docking (NxM) — cross-dock massive ligand libraries against entire protein libraries via GNINA.
Author: Chandragupt Sharma
License-Expression: MIT
Keywords: bioinformatics,cheminformatics,docking,drug-discovery,target-fishing,virtual-screening
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: MacOS
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Chemistry
Requires-Python: >=3.10
Requires-Dist: biopython>=1.80
Requires-Dist: dimorphite-dl>=1.3
Requires-Dist: gemmi
Requires-Dist: meeko>=0.5
Requires-Dist: openmm>=8.0
Requires-Dist: pandas>=2.0
Requires-Dist: pdb-tools>=2.5
Requires-Dist: pdb2pqr>=3.6
Requires-Dist: pdbfixer>=1.9
Requires-Dist: prolif>=2.0
Requires-Dist: rdkit
Requires-Dist: requests
Requires-Dist: tqdm
Requires-Dist: typer>=0.9
Provides-Extra: dev
Requires-Dist: pytest-cov; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

# WTFDTB — High-Throughput Cross-Docking

> **NxM Screening**: Cross-dock entire small-molecule libraries against libraries of macromolecular protein structures using an ML-based stack (P2Rank pocket prediction and GNINA CNN scoring).

![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)
![License: MIT](https://img.shields.io/badge/license-MIT-green)
![Version 0.3.1 (Alpha)](https://img.shields.io/badge/version-0.3.1%20alpha-orange)

---

## What Is This?

Traditional virtual screening docks many ligands against a single protein target. **WTFDTB is a High-Throughput Cross-Docking (NxM) engine** that handles any combination: it can dock **many ligands** against **one protein** (*"Which compounds bind this protein?"*), **one ligand** against **many proteins** (*"Which targets does this drug bind?"*), or **entire ligand libraries** against **multiple targets** in a single run.

This **NxM High-Throughput Cross-Docking** capability supports:

- **High-Throughput Screening (HTS)** — discovering novel hits from massive vendor libraries
- **Drug repurposing** — finding new uses for existing drugs across multiple targets
- **Off-target prediction & Tox** — identifying potential side effects and cross-reactivity
- **Polypharmacology** — designing or understanding multi-target drug activity

WTFDTB automates the entire workflow, from a raw ligand file to ranked CSVs of protein targets annotated with interaction fingerprints, with no manual intervention.

---

## Pipeline Architecture

The pipeline runs in five sequential phases (receptor curation and docking are parallelized across workers):

```
  ┌──────────────┐    ┌────────────────────┐    ┌──────────────────┐
  │  1. Ligand   │───▶│  2. Receptor       │───▶│  3. Pocket       │
  │     Prep     │    │     Curation       │    │     Detection    │
  │              │    │     (parallel)     │    │                  │
  │ Dimorphite-DL│    │ PDBFixer + PDB2PQR │    │     P2Rank       │
  │ RDKit + Meeko│    │ + PROPKA + Meeko   │    │     (Java ML)    │
  └──────────────┘    └────────────────────┘    └──────────────────┘
                                                         │
         ┌───────────────────────────────────────────────┘
         ▼
  ┌──────────────────┐    ┌──────────────────────┐
  │  4. Docking      │───▶│  5. Post-Docking     │
  │     (parallel)   │    │     Analysis         │
  │                  │    │                      │
  │     GNINA        │    │ ProLIF + Pandas      │
  │  (CPU / GPU)     │    │ Filter → Rank → CSV  │
  └──────────────────┘    └──────────────────────┘
```

### Phase Details

| Phase | Module | Tools | What It Does |
|-------|--------|-------|--------------|
| **1. Ligand Prep** | `ligand_prep.py` | Dimorphite-DL, RDKit, Meeko | Parses single- or multi-molecule ligand files (`.sdf`, `.mol`, `.mol2`, `.smi`), enumerates protonation states, generates 3D conformers, and stores each ligand's 2D SMILES as a template so no chemical information is lost after docking. |
| **2. Receptor Curation** | `receptor_curation.py` | PDBFixer, PDB2PQR, PROPKA, RDKit | Downloads PDB entries, repairs missing heavy atoms, and protonates residues at the target pH. Includes **Intelligent Cofactor Recovery**, which preserves whitelisted metals and cofactors. |
| **3. Pocket Detection** | `pocket_detection.py` | P2Rank (Java) | Template-free, machine-learning cavity prediction that reports every predicted binding site per protein. |
| **4. Docking** | `docking.py` | GNINA (C++) | Docks every ligand into every pocket and rescores poses with GNINA's CNN. Runs on CPU, with optional CUDA GPU acceleration. |
| **5. Post-Docking** | `post_dock.py` | ProLIF, Pandas | **Template-Based Molecule Reconstruction** uses the stored SMILES template to repair broken ligand topologies in docked poses, so ProLIF computes interaction fingerprints on chemically correct molecules. Filters, ranks, and exports NxM CSV matrices. |
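
The "each pocket × ligand combination" fan-out of the docking phase is a Cartesian product over ligands and every (target, pocket) pair. The sketch below only illustrates that job grid; `DockingJob` and `build_job_grid` are hypothetical names, not part of WTFDTB's API, and the pocket names simply follow P2Rank's `pocket1`, `pocket2`, ... convention.

```python
from dataclasses import dataclass
from itertools import product


@dataclass(frozen=True)
class DockingJob:
    """One GNINA run: a single ligand docked into one pocket of one target (hypothetical)."""
    ligand: str
    target: str
    pocket: str


def build_job_grid(ligands: list[str], pockets_by_target: dict[str, list[str]]) -> list[DockingJob]:
    """Enumerate every ligand x (target, pocket) combination.

    With N ligands and M targets yielding k_1..k_M pockets,
    this produces N * (k_1 + ... + k_M) jobs.
    """
    # Flatten targets into (target, pocket) pairs first.
    target_pockets = [
        (target, pocket)
        for target, pockets in pockets_by_target.items()
        for pocket in pockets
    ]
    # Ligand is the outer loop, so all pockets for one ligand are contiguous.
    return [
        DockingJob(ligand=lig, target=tgt, pocket=pkt)
        for lig, (tgt, pkt) in product(ligands, target_pockets)
    ]


jobs = build_job_grid(
    ligands=["imatinib", "aspirin", "caffeine"],
    pockets_by_target={"1IEP": ["pocket1", "pocket2"], "1PXX": ["pocket1"]},
)
print(len(jobs))  # 9 (3 ligands x 3 pockets)
```

The job count grows multiplicatively: if every predicted pocket is docked, 1,000 ligands against 50 targets averaging 4 pockets each is 200,000 GNINA runs, which is why `--workers` and `--gpu` matter at scale.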

---

## Installation

### From PyPI (Recommended)

```bash
pip install wtfdtb
```

---

## Setup (External Binaries)

WTFDTB requires **GNINA** and **P2Rank**. You can set them up automatically:

```bash
wtfdtb install
```

This command will:
1. Download pre-compiled Linux binaries for **both** GNINA-CPU and GNINA-CUDA (v1.3.2).
2. Download and extract P2Rank.
3. Place them in `~/.local/`.
4. **Automatically configure your PATH** by updating your `.bashrc` (open a new shell or run `source ~/.bashrc` for it to take effect).

*Note: P2Rank requires Java ≥ 11 (`sudo apt install default-jre` on Ubuntu).*

---

## Quick Start

### 1. Create a Target List
Create a file named `targets.txt` listing one PDB ID or path to a local `.pdb` file per line (`--targets` also accepts a directory of `.pdb` files):
```text
1IEP
1PXX
```
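
WTFDTB's exact parsing rules for this file are not documented here. As a pre-flight check, the hypothetical `split_targets` helper below is one way to validate a list, assuming one entry per line: it separates classic 4-character PDB IDs (to be downloaded) from local `.pdb` paths and rejects anything else.

```python
import re
from pathlib import Path

# Classic 4-character PDB ID: a digit followed by three alphanumerics (e.g. 1IEP).
PDB_ID_RE = re.compile(r"^[0-9][A-Za-z0-9]{3}$")


def split_targets(text: str) -> tuple[list[str], list[Path]]:
    """Split target-list lines into PDB IDs and local .pdb paths (hypothetical helper)."""
    pdb_ids: list[str] = []
    local_files: list[Path] = []
    for raw in text.splitlines():
        entry = raw.strip()
        if not entry:
            continue  # skip blank lines
        if entry.lower().endswith(".pdb"):
            local_files.append(Path(entry))
        elif PDB_ID_RE.match(entry):
            pdb_ids.append(entry.upper())  # normalize case
        else:
            raise ValueError(f"Unrecognized target entry: {entry!r}")
    return pdb_ids, local_files


ids, files = split_targets("1IEP\n1pxx\n\nstructures/my_kinase.pdb\n")
print(ids)                            # ['1IEP', '1PXX']
print([f.as_posix() for f in files])  # ['structures/my_kinase.pdb']
```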

### 2. Run the Screen

**Standard (CPU):**
```bash
wtfdtb screen --ligand library.smi --targets targets.txt --output-dir my_results
```

**High Performance (GPU):**
```bash
# Requires NVIDIA GPU + CUDA 12
wtfdtb screen --ligand library.smi --targets targets.txt --output-dir my_results --gpu
```

---

## CLI Reference

```bash
wtfdtb screen [OPTIONS]
```

| Flag | Type | Default | Description |
|------|------|---------|-------------|
| `--ligand`, `-l` | Path | *required* | Input ligand file (`.sdf`, `.mol`, `.mol2`, `.smi`). Native support for multi-molecule libraries. |
| `--targets`, `-t` | Path | *required* | Directory of `.pdb` files or text file of PDB IDs |
| `--output-dir`, `-o` | Path | `results/` | Output directory for the per-target ranked CSVs and the 2D NxM pivot tables |
| `--ph` | float | `7.4` | Target pH for protonation (default is physiological pH) |
| `--box-size` | int | `25` | Side length (Å) of the cubic docking box |
| `--cnn-model` | str | `default` | GNINA CNN model (`default`, `dense`) |
| `--cnn-score-threshold` | float | `0.5` | Minimum CNNscore (0–1) to accept a pose |
| `--min-interactions` | int | `1` | Minimum number of protein–ligand interactions required to keep a pose |
| `--workers`, `-w` | int | CPU count | Parallel workers for curation and docking |
| `--exhaustiveness` | int | `8` | GNINA search thoroughness (higher = slower) |
| `--gpu` | bool | `False` | Enable GPU acceleration (requires an NVIDIA GPU, CUDA 12, and the GNINA-CUDA build) |
| `--keep-hetatm` | str | `None` | Comma-separated 3-letter HETATM residue codes to preserve (e.g. `SAM,LIG`) |
| `--verbosity` | int | `1` | Logging: 0=quiet, 1=normal, 2=debug |

---

## Output Format

The output CSV is ranked primarily by **Vina affinity** (lower is better), with **CNNaffinity** (higher is better) used to break ties:

| Column | Description |
|--------|-------------|
| `rank` | Overall rank (1 = best binder) |
| `pdb_id` | Target protein PDB ID |
| `pocket` | Cavity name (from P2Rank) |
| `pose_rank` | Pose rank within this pocket (from GNINA) |
| `cnn_score` | GNINA CNN pose score (0–1); higher means the pose is more likely correct |
| `cnn_affinity` | GNINA CNN-predicted binding affinity (pKd; higher is better) |
| `vina_affinity` | Vina empirical scoring affinity (kcal/mol; lower is better) |
| `hbond` | Number of hydrogen bonds |
| `hydrophobic` | Number of hydrophobic contacts |
| `pi_stacking` | Number of π-stacking interactions |
| `salt_bridge` | Number of salt bridges |
| `total_interactions` | Sum of all interaction types |
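
Because filtering and ranking happen after docking, a stricter cutoff can be tried on an existing CSV with pandas instead of re-running the screen. A minimal sketch, assuming the column names above: the toy values are invented, and `rerank` is a hypothetical helper that mirrors the documented ordering (Vina ascending, CNNaffinity descending) and the `--cnn-score-threshold` / `--min-interactions` filters. Whether WTFDTB compares with `>=` or `>` is an assumption.

```python
import pandas as pd

# Toy rows using the documented output columns (values are made up).
df = pd.DataFrame(
    {
        "pdb_id": ["1IEP", "1IEP", "1PXX", "1PXX"],
        "pocket": ["pocket1", "pocket2", "pocket1", "pocket1"],
        "pose_rank": [1, 1, 1, 2],
        "cnn_score": [0.91, 0.42, 0.77, 0.66],
        "cnn_affinity": [8.1, 5.0, 6.9, 7.2],
        "vina_affinity": [-11.2, -6.1, -9.0, -9.0],
        "total_interactions": [9, 2, 6, 5],
    }
)


def rerank(frame: pd.DataFrame, cnn_score_threshold: float = 0.5, min_interactions: int = 1) -> pd.DataFrame:
    """Drop weak poses, then sort by Vina (ascending) with CNN affinity (descending) as tie-breaker."""
    kept = frame[
        (frame["cnn_score"] >= cnn_score_threshold)
        & (frame["total_interactions"] >= min_interactions)
    ]
    ranked = kept.sort_values(
        ["vina_affinity", "cnn_affinity"], ascending=[True, False]
    ).reset_index(drop=True)
    # Recompute the overall rank column (1 = best binder).
    ranked.insert(0, "rank", list(range(1, len(ranked) + 1)))
    return ranked


# In practice, load the real file with pd.read_csv(...) instead of the toy frame.
print(rerank(df, cnn_score_threshold=0.6)[["rank", "pdb_id", "pocket", "pose_rank"]])
```

In the toy data the two 1PXX poses tie at -9.0 kcal/mol, so the pose with the higher CNN affinity (7.2) is ranked ahead of the other (6.9).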

---

## Python API

```python
from pathlib import Path

from wtfdtb.pipeline import run_pipeline

results_dir = run_pipeline(
    ligand_path=Path("multi-ligand-library.smi"),  # cf. --ligand
    targets_path=Path("targets.txt"),              # cf. --targets
    output_dir=Path("my_results_folder"),          # cf. --output-dir
    use_gpu=True,                                  # cf. --gpu (needs the GNINA-CUDA build)
    workers=4,                                     # cf. --workers
)
```
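
On shared environments such as Kaggle or Colab a GPU may or may not be attached, so it can help to choose `use_gpu` at runtime. A minimal heuristic sketch, assuming NVIDIA's `nvidia-smi` tool is on PATH whenever a usable GPU is present; `gpu_available` is a hypothetical helper, not part of `wtfdtb`, and it does not verify the CUDA version or that the GNINA-CUDA build is installed.

```python
import shutil
import subprocess


def gpu_available() -> bool:
    """Return True if nvidia-smi reports at least one GPU (heuristic only)."""
    smi = shutil.which("nvidia-smi")
    if smi is None:
        return False
    try:
        # "nvidia-smi -L" prints one "GPU <n>: ..." line per device.
        result = subprocess.run([smi, "-L"], capture_output=True, text=True, timeout=10)
    except (OSError, subprocess.TimeoutExpired):
        return False
    return result.returncode == 0 and "GPU" in result.stdout


print(f"use_gpu={gpu_available()}")
```

The result can then be passed as `use_gpu=gpu_available()` to `run_pipeline`, so the same notebook falls back to CPU docking when no GPU is attached.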

---

## Supported Platforms

| Platform | Status | Notes |
|----------|--------|-------|
| **Linux x86_64** | ✅ Supported | Primary platform. Binaries auto-installed via `wtfdtb install`. |
| **Windows (WSL)** | ✅ Supported | Runs under Windows Subsystem for Linux. |
| **Kaggle / Colab** | ✅ Supported | Verified working, including GPU acceleration on T4 instances. |
| **macOS** | ⚠️ Partial | Python pipeline works; GNINA must be compiled from source. |

---

## Citation

If you use WTFDTB in your research, please cite:

```bibtex
@software{wtfdtb2026,
  title  = {WTFDTB: High-Throughput Cross-Docking},
  author = {Chandragupt Sharma},
  year   = {2026},
  url    = {https://github.com/ChandraguptSharma07/WTFDTB}
}
```

---

## License

MIT — see [LICENSE](LICENSE) for details.
