Metadata-Version: 2.4
Name: pandadock
Version: 4.0.1
Summary: Molecular docking with SE(3)-equivariant GNN scoring - achieves R=0.88 on PDBbind
Home-page: https://github.com/pritampanda15/PandaDock
Author: Pritam Kumar Panda
Author-email: pritampanda15@gmail.com
License: MIT
Project-URL: Documentation, https://pandadock.readthedocs.io/
Project-URL: Bug Reports, https://github.com/pritampanda15/PandaDock/issues
Project-URL: Source, https://github.com/pritampanda15/PandaDock
Keywords: molecular-docking,drug-discovery,graph-neural-network,GNN,SE3-equivariant,binding-affinity,computational-chemistry,bioinformatics
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Chemistry
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: click>=8.0.0
Requires-Dist: biopython>=1.80
Requires-Dist: propka>=3.5.1
Requires-Dist: numpy>=1.21.0
Requires-Dist: scipy>=1.7.0
Requires-Dist: scikit-learn>=1.0.0
Requires-Dist: matplotlib>=3.5.0
Requires-Dist: seaborn>=0.11.0
Requires-Dist: plotly>=5.0.0
Provides-Extra: torch
Requires-Dist: torch>=2.0.0; extra == "torch"
Provides-Extra: ml
Requires-Dist: torch>=2.0.0; extra == "ml"
Requires-Dist: h5py>=3.7.0; extra == "ml"
Provides-Extra: gnn
Requires-Dist: torch>=2.0.0; extra == "gnn"
Requires-Dist: torch-geometric>=2.4.0; extra == "gnn"
Requires-Dist: torch-scatter; extra == "gnn"
Requires-Dist: torch-sparse; extra == "gnn"
Requires-Dist: pandas>=1.3.0; extra == "gnn"
Provides-Extra: conda
Requires-Dist: openmm>=8.0.0; extra == "conda"
Requires-Dist: pdbfixer>=1.9; extra == "conda"
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: black>=23.0.0; extra == "dev"
Requires-Dist: flake8>=6.0.0; extra == "dev"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# PandaDock - Molecular Docking with GNN Scoring

---

<p align="center">
  <a href="https://github.com/pritampanda15/PandaDock">
    <img src="https://github.com/pritampanda15/PandaDock/blob/c4e5c9e91c4c8262faf1c7736850ca647d65adc1/PandaDock.png" width="500" alt="PandaDock Logo"/>
  </a>
</p>
<p align="center">
  <a href="https://pypi.org/project/pandadock/">
    <img src="https://img.shields.io/pypi/v/pandadock.svg" alt="PyPI Version">
  </a>
  <a href="https://github.com/pritampanda15/PandaDock/blob/main/LICENSE">
    <img src="https://img.shields.io/github/license/pritampanda15/PandaDock" alt="License">
  </a>
  <a href="https://github.com/pritampanda15/PandaDock/stargazers">
    <img src="https://img.shields.io/github/stars/pritampanda15/PandaDock?style=social" alt="GitHub Stars">
  </a>
  <a href="https://github.com/pritampanda15/PandaDock/issues">
    <img src="https://img.shields.io/github/issues/pritampanda15/PandaDock" alt="GitHub Issues">
  </a>
  <a href="https://github.com/pritampanda15/PandaDock/network/members">
    <img src="https://img.shields.io/github/forks/pritampanda15/PandaDock?style=social" alt="GitHub Forks">
  </a>
  <a href="https://pepy.tech/project/pandadock">
    <img src="https://static.pepy.tech/badge/pandadock" alt="Downloads">
  </a>
</p>
<p align="center">
  <a href="https://www.python.org/downloads/">
    <img src="https://img.shields.io/badge/python-3.8+-blue.svg" alt="Python 3.8+">
  </a>
  <a href="https://opensource.org/licenses/MIT">
    <img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT">
  </a>
  <a href="https://pandadock.readthedocs.io/">
    <img src="https://readthedocs.org/projects/pandadock/badge/?version=latest" alt="Documentation Status">
  </a>
</p>

---

<div align="center">

**SE(3)-Equivariant GNN Scoring for Molecular Docking**

[Installation](#installation) | [Quick Start](#quick-start) | [Documentation](https://pandadock.readthedocs.io/) | [Benchmark](#benchmark-performance) | [Citation](#citation)

</div>

---

## Overview

**PandaDock v4.0** features a novel SE(3)-equivariant Graph Neural Network (GNN) scoring function that achieves state-of-the-art correlation with experimental binding affinities (Pearson R = 0.88 on the PDBbind v2020 refined set, R = 0.82 on the ULVSH test split). The hybrid docking workflow generates poses with the traditional docking engine and then rescores them with the GNN, which is the recommended route to the most accurate rankings.

### Key Features

- **PandaDock-GNN**: SE(3)-equivariant scoring achieving **Pearson R = 0.88** on PDBbind
- **Hybrid Docking**: Combined pose generation + GNN rescoring (recommended workflow)
- **Universal Rescorer**: Rescore poses from ANY docking tool (Vina, Glide, GOLD, etc.)
- **Vina-Style Scoring**: AutoDock Vina empirical weights as default scoring
- **Multi-Task Learning**: Joint pKd/pEC50 regression + activity classification
- **Heterogeneous Graphs**: Separate protein/ligand node types with interaction edges
- **Specialized Modes**: Flexible, metal coordination, and tethered docking

---

## Benchmark Performance

### PDBbind v2020 Refined Set (5,316 complexes)

| Metric | Value |
|--------|-------|
| **Pearson R** | **0.88** |
| **Spearman R** | **0.88** |
| **RMSE** | 0.93 pK units |
| **MAE** | 0.68 pK units |
| Within 1.0 pK | 77.5% |
| Within 1.5 pK | 90.5% |
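
For readers who want to compute the same kind of metrics on their own predictions, the following is a minimal standard-library sketch. The function names are hypothetical, the pK values are made up for illustration, and this is not the evaluation code used to produce the table above.

```python
import math

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def affinity_metrics(experimental, predicted):
    """Correlation and error metrics for predicted vs. experimental pK values."""
    errors = [p - e for e, p in zip(experimental, predicted)]
    n = len(errors)
    return {
        "pearson_r": pearson_r(experimental, predicted),
        # Spearman correlation is the Pearson correlation of the ranks
        "spearman_r": pearson_r(average_ranks(experimental), average_ranks(predicted)),
        "rmse": math.sqrt(sum(err ** 2 for err in errors) / n),
        "mae": sum(abs(err) for err in errors) / n,
        "within_1.0": sum(abs(err) <= 1.0 for err in errors) / n,
        "within_1.5": sum(abs(err) <= 1.5 for err in errors) / n,
    }

# Made-up pK values for illustration only (not PandaDock results)
experimental = [4.2, 5.9, 6.5, 7.8, 9.1]
predicted = [4.9, 5.5, 7.9, 7.4, 8.8]
metrics = affinity_metrics(experimental, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.2f}")
```

"Within 1.0 pK" is read here as the fraction of complexes whose absolute prediction error is at most 1.0 pK units; that reading is an assumption about how the table was computed.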

### ULVSH Dataset (942 compounds, 10 protein targets)

| Method | Type | Pearson R | N |
|--------|------|-----------|---|
| **PandaDock-GNN (test)** | **ML Scoring** | **0.82** | 95 |
| **PandaDock-GNN (full)** | **ML Scoring** | **0.67** | 942 |
| VM2 | ULVSH Baseline | 0.15 | 942 |
| PM6 | ULVSH Baseline | 0.08 | 939 |
| Hyde | ULVSH Baseline | 0.02 | 942 |
| Gnina | ULVSH Baseline | 0.01 | 941 |

**Key Results:**
- PandaDock-GNN achieves **R = 0.88** on PDBbind (5,316 complexes)
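- **5.5x higher Pearson R** than the best baseline (VM2) on ULVSH: 0.82 on the 95-compound test split vs. 0.15 for VM2 on all 942 compounds (about 4.5x when PandaDock-GNN is also evaluated on the full set, R = 0.67)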
- Activity classification **AUC = 0.94** on ULVSH test set

---

## Installation

### Prerequisites

- Python 3.8 or higher
- Conda package manager (recommended for RDKit)

### Basic Installation

```bash
# Clone repository
git clone https://github.com/pritampanda15/PandaDock.git
cd PandaDock

# Create conda environment with RDKit
conda create -n pandadock python=3.10
conda activate pandadock
conda install -c conda-forge rdkit

# Install PandaDock
pip install -e .
```

### GNN Installation (Recommended)

```bash
# Install PyTorch and PyTorch Geometric for GNN support
pip install -e ".[gnn]"

# Or manually:
pip install torch torch-geometric torch-scatter torch-sparse
```

For detailed installation instructions, see [INSTALL.md](INSTALL.md).

---

## Quick Start

### Download Pre-trained Model (Recommended)

Get started immediately with the pre-trained model:

```bash
# Download the pre-trained model (~82 MB)
pandadock gnn download-model

# Model is saved to models/pandadock_gnn_v4.pt
```

### Hybrid Docking (Recommended)

The hybrid workflow combines traditional pose generation with GNN rescoring for best accuracy:

```bash
# Using pre-trained model
pandadock hybrid -r protein.pdb -l ligand.sdf \
                 --center 10 20 30 --box 20 20 20 \
                 -m models/pandadock_gnn_v4.pt \
                 -o results/

# Or train your own model first
pandadock gnn train -d ULVSH/ -o models/ --epochs 100
pandadock hybrid -r protein.pdb -l ligand.sdf \
                 --center 10 20 30 --box 20 20 20 \
                 -m models/best_model.pt \
                 -o results/
```
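
To run the hybrid workflow over many ligands, the command can be driven from Python. The sketch below only assembles the flags shown above (`-r`, `-l`, `--center`, `--box`, `-m`, `-o`); the helper names `hybrid_command` and `run_hybrid` are hypothetical and are not part of PandaDock.

```python
import shlex
import subprocess

def hybrid_command(receptor, ligand, center, box, model, out_dir):
    """Build the argv list for `pandadock hybrid` from the flags shown above."""
    return [
        "pandadock", "hybrid",
        "-r", str(receptor),
        "-l", str(ligand),
        "--center", *(str(v) for v in center),
        "--box", *(str(v) for v in box),
        "-m", str(model),
        "-o", str(out_dir),
    ]

def run_hybrid(**kwargs):
    """Run one hybrid docking job; raises CalledProcessError on a non-zero exit."""
    subprocess.run(hybrid_command(**kwargs), check=True)

cmd = hybrid_command(
    receptor="protein.pdb", ligand="ligand.sdf",
    center=(10, 20, 30), box=(20, 20, 20),
    model="models/pandadock_gnn_v4.pt", out_dir="results/",
)
print(shlex.join(cmd))  # inspect the command before running it
```

Calling `run_hybrid(...)` in a loop with a different `ligand` and `out_dir` per iteration gives a simple batch run. Passing the arguments as a list avoids shell-quoting problems with unusual file names.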

### Traditional Docking

```bash
# Simple docking with Vina-style scoring
pandadock dock -r protein.pdb -l ligand.sdf \
               --center 10 20 30 --box 20 20 20 \
               -o results/
```

### GNN Prediction Only

```bash
# Predict binding affinity for a pre-docked complex
pandadock gnn predict -m model.pt -p protein.mol2 -l ligand.mol2
```

### Universal Rescorer (NEW)

Rescore poses from ANY docking tool using the GNN:

```bash
# Rescore poses from AutoDock Vina
pandadock gnn rescore -m model.pt -r receptor.pdb -p vina_out.sdf -o ranked.csv

# Rescore poses from pandadock-flex
pandadock gnn rescore -m model.pt -r protein.pdb -p flex_poses.sdf --output-sdf ranked.sdf

# Rescore poses from Glide, GOLD, or any other tool
pandadock gnn rescore -m model.pt -r protein.pdb -p docked_poses.sdf
```

### Compare Against Baselines

```bash
# Benchmark GNN against all baseline methods
pandadock gnn compare -m model.pt -d ULVSH/ -o comparison/
```

---

## Commands

### Core Commands

| Command | Description |
|---------|-------------|
| `pandadock dock` | Traditional docking with Vina-style scoring |
| `pandadock hybrid` | Hybrid docking with GNN rescoring (recommended) |

### GNN Commands

| Command | Description |
|---------|-------------|
| `pandadock gnn download-model` | **Download pre-trained model (~82 MB)** |
| `pandadock gnn train` | Train GNN model on dataset (ULVSH, PDBbind, or combined) |
| `pandadock gnn predict` | Predict binding affinity for a single complex |
| `pandadock gnn rescore` | **Universal rescorer for poses from ANY docking tool** |
| `pandadock gnn benchmark` | Benchmark model performance on test set |
| `pandadock gnn compare` | Compare against baseline scoring methods |

### Specialized Docking

| Command | Description |
|---------|-------------|
| `pandadock-flex` | Flexible/induced-fit docking |
| `pandadock-metal` | Metal coordination docking |
| `pandadock-tethered` | Constrained docking near reference |

### Utility Tools

| Command | Description |
|---------|-------------|
| `pandadock-prepare` | Prepare ligands (add H, generate 3D) |
| `pandadock-gridbox` | Generate grid box configurations |
| `pandadock-report` | Generate analysis reports |

---

## Universal GNN Rescorer

The `pandadock gnn rescore` command allows you to rescore docked poses from **any docking software** using the SE(3)-equivariant GNN:

### Supported Input

- **AutoDock Vina** output (SDF, or PDBQT converted to SDF)
- **Glide** poses (SDF)
- **GOLD** poses (SDF)
- **pandadock-flex** flexible docking poses
- **pandadock-metal** metal coordination poses
- **pandadock-tethered** constrained poses
- Any multi-conformer SDF file

### Usage

```bash
pandadock gnn rescore -m model.pt -r receptor.pdb -p poses.sdf [OPTIONS]

Options:
  -m, --model PATH      Trained GNN model checkpoint (required)
  -r, --receptor PATH   Receptor PDB or MOL2 file (required)
  -p, --poses PATH      Multi-conformer SDF with poses (required)
  -o, --output PATH     Output CSV with ranked poses (default: rescored_poses.csv)
  --output-sdf PATH     Output SDF with GNN scores as properties
  --site-radius FLOAT   Binding site extraction radius (default: 10 A)
```
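
The `--site-radius` option controls how much of the receptor is kept around the ligand. One common way to define such a site is to keep every protein atom within the radius of at least one ligand atom, as sketched below. Whether PandaDock measures from ligand atoms, from the ligand centroid, or per residue is not stated here, so treat this as an illustration of the idea only; `atoms_within_radius` is a hypothetical helper.

```python
import math

def atoms_within_radius(protein_atoms, ligand_coords, radius=10.0):
    """Return protein atoms lying within `radius` Å of at least one ligand atom.

    protein_atoms: iterable of (label, (x, y, z)) tuples
    ligand_coords: iterable of (x, y, z) tuples
    """
    ligand_coords = list(ligand_coords)
    return [
        (label, coord)
        for label, coord in protein_atoms
        if any(math.dist(coord, lig) <= radius for lig in ligand_coords)
    ]

# Toy coordinates for illustration only
protein = [
    ("CA_near", (1.0, 0.0, 0.0)),
    ("CA_edge", (10.0, 0.0, 0.0)),
    ("CA_far", (25.0, 0.0, 0.0)),
]
ligand = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]

site_10 = [label for label, _ in atoms_within_radius(protein, ligand, radius=10.0)]
site_5 = [label for label, _ in atoms_within_radius(protein, ligand, radius=5.0)]
print(site_10, site_5)
```

A larger radius gives the model more protein context at the cost of a bigger graph.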

### Example Workflow

```bash
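# Step 1: Run docking with your preferred tool
# (Vina writes PDBQT, so convert the poses to SDF, e.g. with Open Babel)
vina --receptor protein.pdbqt --ligand ligand.pdbqt \
     --center_x 10 --center_y 20 --center_z 30 \
     --size_x 20 --size_y 20 --size_z 20 \
     --out poses.pdbqt
obabel -ipdbqt poses.pdbqt -osdf -O poses.sdf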

# Step 2: Rescore with PandaDock-GNN
pandadock gnn rescore -m model.pt -r protein.pdb -p poses.sdf \
    -o ranked.csv --output-sdf ranked.sdf

# Output CSV columns:
# pose_name, pose_index, gnn_pKd, gnn_energy, activity_prob, predicted_active, gnn_rank
```
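
The ranked CSV can be post-processed with the standard library. The sketch below uses the column names listed above; the value formats (integer ranks, probabilities between 0 and 1) and the sample rows are assumptions for illustration, and `top_poses` is a hypothetical helper.

```python
import csv
import io

def top_poses(csv_file, n=3, min_activity=0.5):
    """Return the n best-ranked rows whose activity probability passes the cut."""
    rows = [
        row for row in csv.DictReader(csv_file)
        if float(row["activity_prob"]) >= min_activity
    ]
    rows.sort(key=lambda row: int(row["gnn_rank"]))
    return rows[:n]

# Illustrative rows only; real values come from `pandadock gnn rescore`
sample = io.StringIO(
    "pose_name,pose_index,gnn_pKd,gnn_energy,activity_prob,predicted_active,gnn_rank\n"
    "lig_pose1,1,5.20,-7.09,0.35,False,3\n"
    "lig_pose2,2,7.40,-10.09,0.92,True,1\n"
    "lig_pose3,3,6.10,-8.32,0.71,True,2\n"
)

best = top_poses(sample, n=2)
for row in best:
    print(row["pose_name"], row["gnn_pKd"], row["gnn_rank"])
```

With a real file, open it with `open("ranked.csv", newline="")` and pass the file object in place of `sample`.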

### Output SDF Properties

When using `--output-sdf`, each molecule gets these properties:
- `GNN_pKd` - Predicted pKd/pKi value
- `GNN_Energy` - Predicted binding energy (kcal/mol)
- `GNN_Activity` - Activity probability (0-1)
- `GNN_Rank` - Rank based on GNN score (1 = best)
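
These properties are stored as ordinary SDF data items (a `> <NAME>` header line, the value, then a blank line, with `$$$$` closing each record), so they can be read without any chemistry toolkit. The parser below is a minimal sketch that ignores the atom and bond blocks entirely; `read_sdf_properties` is a hypothetical helper and the sample records are made up.

```python
import re

# Data-item header, e.g. ">  <GNN_pKd>"
TAG = re.compile(r"^>.*<([^>]+)>")

def read_sdf_properties(text):
    """Return one {property: value} dict per SDF record."""
    records = []
    for block in text.split("$$$$"):
        if not block.strip():
            continue
        props, current = {}, None
        for line in block.splitlines():
            match = TAG.match(line)
            if match:
                current = match.group(1)
                props[current] = []
            elif current is not None:
                if line.strip():
                    props[current].append(line.strip())
                else:
                    current = None  # a blank line ends the data item
        records.append({name: "\n".join(vals) for name, vals in props.items()})
    return records

# Two stripped-down records for illustration (empty atom blocks)
sample = "\n".join([
    "pose_a", "", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    ">  <GNN_pKd>", "6.10", "", ">  <GNN_Rank>", "2", "", "$$$$",
    "pose_b", "", "", "  0  0  0  0  0  0  0  0  0  0999 V2000", "M  END",
    ">  <GNN_pKd>", "7.40", "", ">  <GNN_Rank>", "1", "", "$$$$", "",
])

records = read_sdf_properties(sample)
best = min(records, key=lambda rec: int(rec["GNN_Rank"]))
print(best["GNN_pKd"])
```

If RDKit is installed, `Chem.SDMolSupplier` together with `mol.GetProp("GNN_pKd")` gives the same values along with the full molecule.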

---

## GNN Architecture

PandaDock-GNN uses an SE(3)-equivariant heterogeneous graph neural network:

```
Input: Protein-Ligand Complex
  |
  +-- MOL2/PDB/SDF Parser --> Atom coordinates, types, charges
  |
  +-- Graph Builder --> HeteroData graph
  |   - Protein nodes (56 features)
  |   - Ligand nodes (56 features)
  |   - Interaction edges (23 features, 5 Å cutoff)
  |
  +-- EGNN Layers x 6 (SE(3)-equivariant message passing)
  |   - Coordinate updates preserve symmetry
  |   - Edge attention mechanism
  |
  +-- Attention Pooling --> Graph-level representation
  |
  +-- Prediction Heads
      - pKd/pEC50 regression
      - Activity classification (sigmoid)
```
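
For reference, the layer update of the E(n)-equivariant GNN of Satorras et al. (2021), which is cited in the Acknowledgments, can be written as follows. Here $h_i^l$ are node features, $x_i^l$ are atom coordinates, $a_{ij}$ are edge attributes, $\phi_e$, $\phi_x$, $\phi_h$ are learned MLPs, and $C$ is a normalization constant. Whether PandaDock's layers follow this form exactly is an assumption.

```latex
\begin{aligned}
m_{ij} &= \phi_e\!\left(h_i^{l},\, h_j^{l},\, \lVert x_i^{l} - x_j^{l} \rVert^{2},\, a_{ij}\right) \\
x_i^{l+1} &= x_i^{l} + C \sum_{j \neq i} \left(x_i^{l} - x_j^{l}\right) \phi_x\!\left(m_{ij}\right) \\
m_i &= \sum_{j \neq i} m_{ij} \\
h_i^{l+1} &= \phi_h\!\left(h_i^{l},\, m_i\right)
\end{aligned}
```

Coordinates enter the messages only through squared distances, which do not change under rotation or translation, and the coordinate update is a weighted sum of relative vectors, which rotate with the input. This is why the node features stay invariant while the coordinates remain equivariant. An edge attention mechanism, as listed in the diagram, would typically reweight each $m_{ij}$ by a learned gate in $[0, 1]$ before the sum.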

**Node Features (56 dims):**
- Atom type one-hot (10)
- SYBYL atom type (16)
- Partial charge (1)
- Hybridization (4)
- Aromaticity, H-bond donor/acceptor (4)
- Residue type (20, protein only)
- Backbone flag (1)

**Edge Features (23 dims):**
- Distance (1)
- Gaussian RBF expansion (16)
- Bond type one-hot (4)
- Interaction type flags (2)
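
As an illustration of how a 23-dimensional edge vector with this layout could be assembled, the sketch below expands a distance into 16 Gaussian radial basis functions and concatenates the other groups. The centre spacing, the Gaussian width, the bond-type ordering, and the meaning of the two flags are all assumptions, since they are not specified here; the function names are hypothetical.

```python
import math

# Assumed ordering of the 4-way bond one-hot; not taken from PandaDock
BOND_TYPES = ("single", "double", "triple", "aromatic")

def gaussian_rbf(distance, cutoff=5.0, num_centers=16, gamma=None):
    """Expand a scalar distance into `num_centers` Gaussian basis values."""
    step = cutoff / (num_centers - 1)  # centres evenly spaced on [0, cutoff]
    if gamma is None:
        gamma = 1.0 / step ** 2  # assumed width: about one centre spacing
    return [math.exp(-gamma * (distance - k * step) ** 2) for k in range(num_centers)]

def edge_features(distance, bond_type=None, flags=(0.0, 0.0)):
    """Concatenate distance (1), RBF (16), bond one-hot (4), and flags (2)."""
    bond = [1.0 if bond_type == name else 0.0 for name in BOND_TYPES]
    return [distance, *gaussian_rbf(distance), *bond, *flags]

# A non-covalent protein-ligand contact at 3.2 Å: bond one-hot stays all zero
features = edge_features(3.2, bond_type=None, flags=(1.0, 0.0))
rbf = gaussian_rbf(3.2)
print(len(features), rbf.index(max(rbf)))
```

The RBF expansion turns one distance into a smooth, localized code, so the network can respond differently to short and long contacts without having to learn sharp thresholds on a single scalar.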

---

## Scoring Functions

| Function | Description | Use Case |
|----------|-------------|----------|
| `vina` | AutoDock Vina empirical scoring (default) | General docking |
| `physics_based` | Lennard-Jones + electrostatics | Detailed energy analysis |

---

## Output Files

### Dock Command

```
docking_output/
+-- complex1.pdb, complex2.pdb, ...   # Protein-ligand complexes
+-- pose1.pdb, pose2.pdb, ...         # Ligand poses only
+-- docking_results.json              # Complete results with energies
+-- interaction_analysis.json         # Detailed interactions
+-- binding_affinities.png            # Affinity distribution
```

### Hybrid Command

```
hybrid_output/
+-- hybrid_results.csv                # Rankings with GNN + Vina scores
+-- pose_1_pec50_X.XX.pdb             # Top poses with pEC50 in filename
+-- complex_1.pdb, ...                # Protein-ligand complexes
```

### Rescore Command

```
rescored_poses.csv                    # Ranked poses with GNN scores
ranked.sdf (optional)                 # SDF with GNN properties
```

---

## Training Your Own GNN Model

### Single Dataset Training

```bash
# Train on ULVSH
pandadock gnn train -d ULVSH/ -o models/ --epochs 100

# Train on PDBbind
pandadock gnn train -p PDBbind/ -o models/ --epochs 100
```

### Combined Dataset Training (Recommended)

```bash
# Train on both ULVSH + PDBbind for best generalization
pandadock gnn train -d ULVSH/ -p PDBbind/ -o models/ \
    --epochs 200 \
    --batch-size 32 \
    --hidden-dim 256 \
    --num-layers 6 \
    --balanced  # Balance samples from both datasets
```
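
One way to balance two datasets of very different size is to weight each sample inversely to the size of its dataset, so that both contribute equally to every batch on average. How `--balanced` is implemented is not documented here, so the sketch below only illustrates the idea; `balanced_weights` is a hypothetical helper and the counts are simply the dataset sizes quoted in the benchmark section.

```python
import random

def balanced_weights(dataset_sizes):
    """Give each dataset the same total sampling probability."""
    share = 1.0 / len(dataset_sizes)
    labels, weights = [], []
    for name, size in dataset_sizes.items():
        labels.extend([name] * size)
        weights.extend([share / size] * size)
    return labels, weights

# Example counts only; a real training split may have different sizes
labels, weights = balanced_weights({"ULVSH": 942, "PDBbind": 5316})

rng = random.Random(0)  # fixed seed so the draw is repeatable
batch = rng.choices(labels, weights=weights, k=10_000)
ulvsh_fraction = batch.count("ULVSH") / len(batch)
print(f"ULVSH share of draws: {ulvsh_fraction:.2f}")
```

Without weighting, ULVSH would supply only about 15% of the samples (942 of 6,258). In PyTorch, the same per-sample weights could be passed to `torch.utils.data.WeightedRandomSampler`.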

### Benchmark on Test Set

```bash
pandadock gnn benchmark -m models/best_model.pt -d ULVSH/ -o results/
```

---

## Examples

See the `examples/` directory:

- `examples/basic_docking/` - Simple docking workflow
- `examples/flexible_docking/` - Induced-fit docking
- `examples/metal_docking/` - Metalloprotein docking

---

## Documentation

Full documentation is available at [pandadock.readthedocs.io](https://pandadock.readthedocs.io/):

- [Installation Guide](https://pandadock.readthedocs.io/en/latest/installation.html)
- [GNN Overview](https://pandadock.readthedocs.io/en/latest/gnn/overview.html)
- [Training Guide](https://pandadock.readthedocs.io/en/latest/gnn/training.html)
- [Hybrid Docking](https://pandadock.readthedocs.io/en/latest/gnn/hybrid_docking.html)
- [CLI Reference](https://pandadock.readthedocs.io/en/latest/cli/pandadock.html)

---

## Citation

If you use PandaDock in your research, please cite:

```bibtex
@article{panda2024pandadock,
  title={PandaDock: SE(3)-Equivariant Graph Neural Network Scoring for Molecular Docking},
  author={Panda, Pritam Kumar},
  journal={bioRxiv},
  year={2024},
  note={Manuscript in preparation}
}
```

---

## Contributing

We welcome contributions! Please see [CONTRIBUTING.md](CONTRIBUTING.md) for guidelines.

---

## License

PandaDock is released under the MIT License. See [LICENSE](LICENSE) for details.

---

## Contact

- **Author**: Pritam Kumar Panda
- **Affiliation**: Stanford University
- **Email**: pritampanda@stanford.edu
- **GitHub**: [@pritampanda15](https://github.com/pritampanda15)

---

## Acknowledgments

PandaDock builds upon excellent open-source projects:
- AutoDock Vina (scoring function inspiration)
- PyTorch and PyTorch Geometric (GNN framework)
- RDKit (molecular handling)
- E(n)-Equivariant GNN (Satorras et al. 2021)

---

<div align="center">

**Star this repository if you find it useful!**

[Report Bug](https://github.com/pritampanda15/PandaDock/issues) | [Request Feature](https://github.com/pritampanda15/PandaDock/issues)

</div>
