Metadata-Version: 2.4
Name: graphrelax
Version: 0.1.1.dev1
Summary: Combine LigandMPNN sequence design with AMBER relaxation
Author: GraphRelax Authors
License: MIT
Project-URL: Homepage, https://github.com/delalamo/GraphRelax
Project-URL: Repository, https://github.com/delalamo/GraphRelax
Project-URL: Issues, https://github.com/delalamo/GraphRelax/issues
Keywords: protein,design,relaxation,LigandMPNN,OpenMM,molecular-dynamics,protein-design,bioinformatics
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: torch>=2.0
Requires-Dist: numpy<2
Requires-Dist: prody
Requires-Dist: biopython
Requires-Dist: absl-py
Requires-Dist: ml-collections
Requires-Dist: dm-tree
Requires-Dist: openmm
Provides-Extra: cpu
Requires-Dist: openmm[cpu]; extra == "cpu"
Provides-Extra: cuda11
Requires-Dist: openmm[cuda11]; extra == "cuda11"
Provides-Extra: cuda12
Requires-Dist: openmm[cuda12]; extra == "cuda12"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Provides-Extra: test
Requires-Dist: pytest>=7.0; extra == "test"
Requires-Dist: pytest-cov>=4.1; extra == "test"
Dynamic: license-file

# GraphRelax

A drop-in replacement for Rosetta Relax that swaps force field-guided residue repacking and design for equivalent functions from graph neural networks.

GraphRelax combines **LigandMPNN** (for sequence design and side-chain packing) with **OpenMM AMBER minimization** to reproduce Rosetta FastRelax and Design protocols.

## Installation

```bash
# Clone the repository
git clone https://github.com/delalamo/GraphRelax.git
cd GraphRelax

# Install GraphRelax and dependencies
pip install -e .

# Download LigandMPNN model weights (~40MB)
./scripts/download_weights.sh
```

### Optional: Constrained Minimization

If you want to use `--constrained-minimization` mode (AlphaFold-style relaxation with position restraints and violation checking), you also need pdbfixer:

```bash
# pdbfixer is only available via conda-forge, not PyPI
conda install -c conda-forge pdbfixer
```

### Platform-specific Installation

```bash
# CPU-only (smaller install, no GPU dependencies)
pip install -e ".[cpu]"

# With CUDA 11 GPU support
pip install -e ".[cuda11]"

# With CUDA 12 GPU support
pip install -e ".[cuda12]"
```

### Dependencies

Core dependencies (installed automatically via pip):

- Python >= 3.9
- PyTorch >= 2.0
- NumPy < 2.0 (PyTorch <2.5 is incompatible with NumPy 2.x)
- OpenMM
- BioPython
- ProDy
- dm-tree
- absl-py
- ml-collections

Optional (for `--constrained-minimization` only):

- pdbfixer (conda-forge only, not on PyPI)

## Features

- **FastRelax-like protocol**: Alternate between side-chain repacking and energy minimization
- **Sequence design**: Full redesign or residue-specific control via Rosetta-style resfiles
- **Multiple output modes**: Relax-only, repack-only, design-only, or combinations
- **GPU acceleration**: Automatic GPU detection for both LigandMPNN and OpenMM
- **Scorefile output**: Rosetta-compatible scorefiles with energy terms and sequence metrics

## Usage

### Basic Commands

```bash
# Default: repack + minimize for 5 cycles
graphrelax -i input.pdb -o relaxed.pdb

# Repack + minimize with 10 cycles
graphrelax -i input.pdb -o relaxed.pdb --n-iter 10

# Only minimize (no repacking)
graphrelax -i input.pdb -o minimized.pdb --no-repack

# Only repack side chains (no minimization)
graphrelax -i input.pdb -o repacked.pdb --repack-only

# Full redesign + minimize
graphrelax -i input.pdb -o designed.pdb --design

# Design with resfile specification
graphrelax -i input.pdb -o designed.pdb --design --resfile design.resfile

# Generate 10 different designs
graphrelax -i input.pdb -o designed.pdb --design -n 10

# Design only (no minimization) - fast sampling
graphrelax -i input.pdb -o designed.pdb --design-only -n 100

# With scorefile output
graphrelax -i input.pdb -o relaxed.pdb --scorefile scores.sc

# Design with ligand context (requires --constrained-minimization for ligands)
graphrelax -i complex.pdb -o designed.pdb --design --model-type ligand_mpnn --constrained-minimization
```
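
For batch runs, the commands above can be driven from Python. The following is a minimal sketch, not part of GraphRelax: `build_command` and `relax_all` are hypothetical helper names, the `_relaxed.pdb` output naming is an arbitrary choice, and only flags documented in this README (`-i`, `-o`, `--n-iter`, `--seed`) are used.

```python
import subprocess
from pathlib import Path
from typing import List


def build_command(input_pdb: Path, output_dir: Path,
                  n_iter: int = 5, seed: int = 0) -> List[str]:
    """Build the argv list for one default (repack + minimize) run."""
    # Hypothetical naming scheme: <input stem>_relaxed.pdb in output_dir.
    output_pdb = output_dir / f"{input_pdb.stem}_relaxed.pdb"
    return [
        "graphrelax",
        "-i", str(input_pdb),
        "-o", str(output_pdb),
        "--n-iter", str(n_iter),
        "--seed", str(seed),
    ]


def relax_all(input_dir: Path, output_dir: Path) -> None:
    """Run graphrelax once per PDB file found in input_dir."""
    output_dir.mkdir(parents=True, exist_ok=True)
    for pdb in sorted(input_dir.glob("*.pdb")):
        # check=True raises CalledProcessError if graphrelax exits non-zero.
        subprocess.run(build_command(pdb, output_dir), check=True)
```

Calling `relax_all(Path("inputs"), Path("outputs"))` would process every `.pdb` file in `inputs/` one after another; it assumes the `graphrelax` executable is on your `PATH`.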

### Operating Modes

| Flag                | Repack | Design | Minimize |
| ------------------- | ------ | ------ | -------- |
| `--relax` (default) | Yes    | No     | Yes      |
| `--repack-only`     | Yes    | No     | No       |
| `--no-repack`       | No     | No     | Yes      |
| `--design`          | No     | Yes    | Yes      |
| `--design-only`     | No     | Yes    | No       |

### Minimization Modes

By default, GraphRelax uses **unconstrained minimization**: a bare-bones OpenMM energy minimization with no position restraints and default tolerance parameters. This is fast and works well for most use cases.

For more controlled minimization (AlphaFold-style), use `--constrained-minimization`:

```bash
# Default: unconstrained minimization (fast, no restraints)
graphrelax -i input.pdb -o relaxed.pdb --no-repack

# Constrained minimization with position restraints and violation checking
# Note: requires pdbfixer (conda install -c conda-forge pdbfixer)
graphrelax -i input.pdb -o relaxed.pdb --no-repack --constrained-minimization

# Constrained with custom restraint stiffness
graphrelax -i input.pdb -o relaxed.pdb --constrained-minimization --stiffness 5.0
```

| Mode                         | Position Restraints | Violation Checking | Speed  | Requires pdbfixer |
| ---------------------------- | ------------------- | ------------------ | ------ | ----------------- |
| Default (unconstrained)      | No                  | No                 | Fast   | No                |
| `--constrained-minimization` | Yes (harmonic)      | Yes                | Slower | Yes               |
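
To give a feel for what `--stiffness` controls, here is a small sketch of a harmonic position restraint. It assumes the common functional form `0.5 * k * |x - x0|^2` per restrained atom, with `k` in kcal/mol/A^2 and coordinates in Angstrom. The function name is hypothetical and this is not GraphRelax's restraint code; which atoms are actually restrained is not specified here.

```python
from typing import Iterable, Tuple

Coord = Tuple[float, float, float]


def harmonic_restraint_energy(coords: Iterable[Coord],
                              reference: Iterable[Coord],
                              stiffness: float = 10.0) -> float:
    """Sum 0.5 * k * squared displacement over all restrained atoms."""
    energy = 0.0
    for (x, y, z), (x0, y0, z0) in zip(coords, reference):
        squared_displacement = (x - x0) ** 2 + (y - y0) ** 2 + (z - z0) ** 2
        energy += 0.5 * stiffness * squared_displacement
    return energy


# One atom moved 1 A from its reference position at the default stiffness.
penalty = harmonic_restraint_energy([(1.0, 0.0, 0.0)], [(0.0, 0.0, 0.0)])
```

Under this form the penalty grows with the square of the displacement, so lowering the stiffness (for example `--stiffness 5.0`) lets atoms drift further from their starting positions for the same energy cost.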

**Important:** When your input PDB contains ligands or other non-standard residues (HETATM records other than water), you **must** use `--constrained-minimization`. The unconstrained mode uses AMBER force field parameters, which do not include templates for non-standard molecules. Constrained mode uses OpenFold's AmberRelaxation, which can handle ligands properly.

### Working with Ligands

When designing proteins with bound ligands (e.g., heme, cofactors, small molecules), use `ligand_mpnn` model type with constrained minimization:

```bash
# Design around a ligand
graphrelax -i protein_with_ligand.pdb -o designed.pdb \
    --design --model-type ligand_mpnn --constrained-minimization

# Repack and minimize around a ligand (--relax runs both phases)
graphrelax -i protein_with_ligand.pdb -o repacked.pdb \
    --relax --model-type ligand_mpnn --constrained-minimization
```

**Note:** If you attempt to use unconstrained minimization with a PDB containing ligands, GraphRelax will exit with an error message directing you to use `--constrained-minimization`.
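
To check ahead of time whether a file will need `--constrained-minimization`, you can scan it for non-water `HETATM` records. The sketch below is a standalone illustration, not GraphRelax's internal check: the function name and the set of water residue names are assumptions, and it relies only on the fixed-column PDB format (residue name in columns 18-20).

```python
from typing import Set

# Residue names treated as water here (an assumption for this sketch).
WATER_NAMES = {"HOH", "WAT", "H2O", "DOD"}


def find_ligand_residues(pdb_text: str) -> Set[str]:
    """Return residue names of all HETATM records that are not water."""
    ligands = set()
    for line in pdb_text.splitlines():
        if line.startswith("HETATM"):
            resname = line[17:20].strip()  # PDB columns 18-20
            if resname and resname not in WATER_NAMES:
                ligands.add(resname)
    return ligands


SAMPLE_PDB = "\n".join([
    "ATOM      1  N   MET A   1      11.104  13.207   2.100  1.00 20.00           N",
    "HETATM 1000 FE   HEM A 201      15.000  12.000   8.000  1.00 15.00          FE",
    "HETATM 1001  O   HOH A 301      20.000  18.000   5.000  1.00 30.00           O",
])
print(find_ligand_residues(SAMPLE_PDB))  # {'HEM'}
```

A non-empty result suggests the input falls under the ligand rule described above.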

### Resfile Format

GraphRelax supports Rosetta-style resfiles for residue-specific control:

```
# Default behavior for all residues
NATAA
START
# Design positions 10-15 on chain A
10 A ALLAA
11 A ALLAA
12 A ALLAA
13 A ALLAA
14 A ALLAA
15 A ALLAA
# Position 20: only allow hydrophobics
20 A PIKAA AVILMFYW
# Position 25: exclude cysteine and proline
25 A NOTAA CP
# Position 30: only polar residues
30 A POLAR
# Keep position 40 completely fixed
40 A NATRO
```

#### Supported Commands

| Command     | Description                                      |
| ----------- | ------------------------------------------------ |
| `NATRO`     | Fixed completely (no design, no repacking)       |
| `NATAA`     | Repack only (same amino acid, optimize rotamers) |
| `ALLAA`     | Design with all 20 amino acids                   |
| `PIKAA XYZ` | Design with only specified amino acids           |
| `NOTAA XYZ` | Design excluding specified amino acids           |
| `POLAR`     | Design with polar residues only (DEHKNQRST)      |
| `APOLAR`    | Design with nonpolar residues only (ACFGILMPVWY) |
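
The command table can be read as a mapping from each position to a set of allowed amino acids. Here is a minimal parsing sketch under simplifying assumptions: one default command before `START`, integer residue numbers without insertion codes, and an empty set meaning "keep the native residue". `parse_resfile` and `allowed_set` are hypothetical names, not GraphRelax's parser.

```python
from typing import Dict, Set, Tuple

ALL_AA = set("ACDEFGHIKLMNPQRSTVWY")
POLAR_AA = set("DEHKNQRST")
APOLAR_AA = set("ACFGILMPVWY")

Rules = Dict[Tuple[str, int], Tuple[str, Set[str]]]


def allowed_set(command: str, arg: str = "") -> Set[str]:
    """Amino acids permitted by one command; empty set = native only."""
    if command in ("NATRO", "NATAA"):
        return set()
    if command == "ALLAA":
        return set(ALL_AA)
    if command == "PIKAA":
        return set(arg)
    if command == "NOTAA":
        return ALL_AA - set(arg)
    if command == "POLAR":
        return set(POLAR_AA)
    if command == "APOLAR":
        return set(APOLAR_AA)
    raise ValueError(f"Unsupported resfile command: {command}")


def parse_resfile(text: str) -> Tuple[str, Rules]:
    """Return (default_command, {(chain, resnum): (command, allowed)})."""
    default, rules, in_body = "NATAA", {}, False
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line:
            continue
        if line.upper() == "START":
            in_body = True
            continue
        tokens = line.split()
        if not in_body:
            default = tokens[0].upper()  # header sets the default
            continue
        resnum, chain, command = int(tokens[0]), tokens[1], tokens[2].upper()
        arg = tokens[3].upper() if len(tokens) > 3 else ""
        rules[(chain, resnum)] = (command, allowed_set(command, arg))
    return default, rules


EXAMPLE = """NATAA
START
20 A PIKAA AVILMFYW
25 A NOTAA CP
40 A NATRO
"""
default_command, rules = parse_resfile(EXAMPLE)
```

Positions not listed after `START` fall back to `default_command`, which is `NATAA` (repack only) in this example.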

### Command-Line Options

```
Required:
  -i, --input PDB       Input PDB file
  -o, --output PDB      Output PDB file (or prefix if -n > 1)

Mode selection:
  --relax               Repack + minimize cycles (default)
  --repack-only         Only repack side chains
  --no-repack           Only minimize
  --design              Design + minimize
  --design-only         Only design

Iteration and output:
  --n-iter N            Number of cycles (default: 5)
  -n, --n-outputs N     Number of outputs to generate (default: 1)

Design options:
  --resfile FILE        Rosetta-style resfile
  --temperature T       LigandMPNN sampling temperature (default: 0.1)
  --model-type TYPE     protein_mpnn, ligand_mpnn, or soluble_mpnn

Relaxation options:
  --constrained-minimization  Use constrained minimization with position
                              restraints (AlphaFold-style). Default is
                              unconstrained. Requires pdbfixer.
                              **Required when input PDB contains ligands.**
  --stiffness K         Restraint stiffness in kcal/mol/A^2 (default: 10.0)
                        Only applies to constrained minimization.
  --max-iterations N    Max L-BFGS iterations, 0=unlimited (default: 0)

Input preprocessing:
  --keep-waters         Keep water molecules in input (default: removed)

Scoring:
  --scorefile FILE      Output scorefile with energy terms

General:
  -v, --verbose         Verbose output
  --seed N              Random seed for reproducibility
```

### Scorefile Output

When `--scorefile` is specified, outputs a Rosetta-style scorefile:

```
SCORE:  total_score  openmm_energy  bond_energy  angle_energy  dihedral_energy  nonbonded_energy  ligandmpnn_score  seq_recovery  description
SCORE:     -234.56       -234.56        12.3         45.6             23.1              -315.6             0.847          0.92   output_1.pdb
SCORE:     -228.12       -228.12        11.8         44.2             22.8              -307.0             0.823          0.89   output_2.pdb
```
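
Scorefiles in this layout are easy to load for ranking or filtering. The following sketch assumes the whitespace-separated format shown above, with a header row followed by data rows that all start with `SCORE:`. `parse_scorefile` is a hypothetical helper, not part of the GraphRelax API.

```python
from typing import Dict, List, Union

Row = Dict[str, Union[float, str]]


def parse_scorefile(text: str) -> List[Row]:
    """Parse SCORE: lines into dicts keyed by the header column names."""
    rows: List[Row] = []
    header = None
    for line in text.splitlines():
        if not line.startswith("SCORE:"):
            continue
        fields = line.split()[1:]  # drop the leading "SCORE:" tag
        if header is None:
            header = fields  # first SCORE: line holds the column names
            continue
        row: Row = {}
        for name, value in zip(header, fields):
            try:
                row[name] = float(value)
            except ValueError:
                row[name] = value  # non-numeric, e.g. the description
        rows.append(row)
    return rows


SAMPLE = """\
SCORE:  total_score  openmm_energy  bond_energy  angle_energy  dihedral_energy  nonbonded_energy  ligandmpnn_score  seq_recovery  description
SCORE:     -234.56       -234.56        12.3         45.6             23.1              -315.6             0.847          0.92   output_1.pdb
SCORE:     -228.12       -228.12        11.8         44.2             22.8              -307.0             0.823          0.89   output_2.pdb
"""
rows = parse_scorefile(SAMPLE)
best = min(rows, key=lambda r: r["total_score"])  # lowest total_score
```

Here `best["description"]` identifies the output file with the lowest total score among the parsed rows.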

## Python API

```python
from graphrelax import Pipeline, PipelineConfig, PipelineMode
from graphrelax.config import DesignConfig, RelaxConfig
from pathlib import Path

# Configure pipeline
config = PipelineConfig(
    mode=PipelineMode.DESIGN,
    n_iterations=5,
    n_outputs=10,
    design=DesignConfig(
        model_type="ligand_mpnn",
        temperature=0.1,
    ),
    relax=RelaxConfig(
        stiffness=10.0,
        constrained=False,  # Default: unconstrained minimization
    ),
)

# Run pipeline
pipeline = Pipeline(config)
results = pipeline.run(
    input_pdb=Path("input.pdb"),
    output_pdb=Path("output.pdb"),
    resfile=Path("design.resfile"),  # optional
)

# Access results
for output in results["outputs"]:
    print(f"Output: {output['output_path']}")
    print(f"Sequence: {output['sequence']}")
    print(f"Final energy: {output.get('final_energy', 'N/A')}")
```

## How It Works

GraphRelax implements an alternating optimization protocol similar to Rosetta FastRelax:

1. **Parse Input**: Read PDB structure and optional resfile
2. **For each iteration**:
   - **Design/Repack Phase**: Use LigandMPNN to generate sequences or repack side chains
   - **Minimize Phase**: Use OpenMM with AMBER force field for energy minimization
3. **Output**: Write final structure(s) and optional scorefile
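
The control flow of the steps above can be sketched as a simple loop. This is an illustration only: `run_cycles` and the two phase callables are hypothetical stand-ins for the LigandMPNN and OpenMM steps, and the real pipeline may differ, for instance in how single-phase modes treat the iteration count.

```python
from typing import Callable, List, Optional

Phase = Callable[[List[str]], List[str]]


def run_cycles(structure: List[str], n_iter: int,
               design_or_repack: Optional[Phase],
               minimize: Optional[Phase]) -> List[str]:
    """Alternate the enabled phases for n_iter cycles."""
    for _ in range(n_iter):
        if design_or_repack is not None:
            structure = design_or_repack(structure)
        if minimize is not None:
            structure = minimize(structure)
    return structure


# Stub phases that only record the order in which they were called.
def fake_repack(history: List[str]) -> List[str]:
    return history + ["repack"]


def fake_minimize(history: List[str]) -> List[str]:
    return history + ["minimize"]


trace = run_cycles([], 2, fake_repack, fake_minimize)
print(trace)  # ['repack', 'minimize', 'repack', 'minimize']
```

Passing `None` for one of the phases mimics the single-phase modes, such as `--no-repack` (minimize only) or `--repack-only`.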

### Key Differences from Rosetta

| Aspect            | Rosetta                      | GraphRelax                     |
| ----------------- | ---------------------------- | ------------------------------ |
| Sequence sampling | Monte Carlo with force field | LigandMPNN neural network      |
| Rotamer packing   | Discrete rotamer library     | LigandMPNN continuous sampling |
| Energy function   | Rosetta energy function      | AMBER force field              |
| Speed             | Slower                       | Faster (GPU acceleration)      |

## License

MIT License

## Citation

If you use GraphRelax in your research, please cite:

- LigandMPNN: Dauparas et al. (2023)
- OpenMM: Eastman et al. (2017)
- AlphaFold relaxation protocol: Jumper et al. (2021)
