Metadata-Version: 2.4
Name: proteinmpnn-cli
Version: 0.1.0
Summary: Structure-conditioned protein sequence design using message passing neural networks
Project-URL: Homepage, https://github.com/miguelgondu/ProteinMPNN
Project-URL: Repository, https://github.com/miguelgondu/ProteinMPNN
Project-URL: Issues, https://github.com/miguelgondu/ProteinMPNN/issues
Project-URL: Changelog, https://github.com/miguelgondu/ProteinMPNN/blob/main/CHANGELOG.md
Author-email: Miguel González Duque <miguelgondu@gmail.com>
License: MIT License
        
        Copyright (c) 2022 Justas Dauparas
        Copyright (c) 2025 Miguel González Duque
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: bioinformatics,deep-learning,protein,sequence-design,structural-biology
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Typing :: Typed
Requires-Python: >=3.13
Requires-Dist: biopython>=1.87
Requires-Dist: numpy>=2.4.6
Requires-Dist: pydantic>=2.13.4
Requires-Dist: rich>=14.0.0
Requires-Dist: torch>=2.12.0
Requires-Dist: typer>=0.26.3
Description-Content-Type: text/markdown

# `proteinmpnn` - A `cli` adaptation of Kuhlman lab's fork

> [!WARNING]
> This is a work-in-progress.

This repo contains a clean-up of Kuhlman lab's fork of `ProteinMPNN`, converting
it into an easy-to-use `cli`.

This modernization includes:
- using `uv` for dependency and package management.
- using `typer` to construct a CLI with plenty of flavor.

Clone this repo and run
```
uv run proteinmpnn --help
```

## Current features
- **Running inference for a single `pdb`** using `proteinmpnn run-single`. Use `--help` to get a look into the optional arguments. This is meant to replace the single-protein analyses.
- **Computing conditional/unconditional probabilities** of amino-acids per location. Check `proteinmpnn compute-probs --help` for more context.

## Other improvements on Kuhlman's fork
- The usual two-step sequence (running `generate_json.py`, then `run_protein_mpnn.py`) is no longer necessary.
- **Unit testing** using `pytest`, as well as a backwards-compatibility test (making sure that we don't deviate from the original behavior).
- **Linting** using `ruff` to make the code more developer-friendly.

# Original readme

This repo includes the Kuhlman Lab fork of ProteinMPNN. It includes all the functionality of the original ProteinMPNN repo (linked [here](https://github.com/dauparas/ProteinMPNN)), with the following additions:
- Improved input parsing for custom design runs
- Multi-state design support
- Additional utilities to provide integration with [EvoPro](https://github.com/Kuhlman-Lab/evopro)

![ProteinMPNN](https://docs.google.com/drawings/d/e/2PACX-1vTtnMBDOq8TpHIctUfGN8Vl32x5ISNcPKlxjcQJF2q70PlaH2uFlj2Ac4s3khnZqG1YxppdMr0iTyk-/pub?w=889&h=358)
Read [ProteinMPNN paper](https://www.biorxiv.org/content/10.1101/2022.06.03.494563v1).

## Installation:

```
git clone git@github.com:Kuhlman-Lab/proteinmpnn.git
cd proteinmpnn
mamba env create -f setup/proteinmpnn.yml
```

### NOTE (July 2025):

ProteinMPNN uses CUDA 11.3, which is too old for the new H100 GPUs (these need CUDA 11.8+). This means it may hang if run from the default `mpnn` environment.

To fix this, we can generate a CUDA 12.4 environment as follows:
```
# Install original env without torch/cuda dependencies
mamba env create -f setup/proteinmpnn_cu12.4.yml -n mpnn_cu12.4

# Install torch/cuda 12.4 dependencies
pip install torch==2.4.0 torchvision==0.19.0 torchaudio==2.4.0 --index-url https://download.pytorch.org/whl/cu124
```

To use this, simply replace `conda activate mpnn` with `conda activate mpnn_cu12.4` wherever present.
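
To confirm which CUDA build the active environment actually provides, a small check along these lines may help. This is only a sketch: the helper names (`parse_cuda_version`, `cuda_is_at_least`, `describe_environment`) are made up for illustration and are not part of this repo, and the 11.8 threshold is taken from the note above.

```python
def parse_cuda_version(version):
    """Turn a string such as "12.4" into (12, 4). CPU-only builds report None."""
    if version is None:
        return None
    return tuple(int(part) for part in version.split(".")[:2])


def cuda_is_at_least(version, minimum=(11, 8)):
    """Return True if the CUDA version string meets the minimum (major, minor)."""
    parsed = parse_cuda_version(version)
    return parsed is not None and parsed >= minimum


def describe_environment():
    """Report the installed torch build, or None values if torch is missing."""
    try:
        import torch
    except ImportError:
        return {"torch": None, "cuda": None, "gpu_available": False}
    return {
        "torch": torch.__version__,
        "cuda": torch.version.cuda,  # CUDA version torch was built against
        "gpu_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    info = describe_environment()
    print(info)
    print("CUDA >= 11.8:", cuda_is_at_least(info["cuda"]))
```

If the last line prints `False` inside the environment you activated, the torch build there is either CPU-only or older than the H100 GPUs need.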

## Usage Guidelines:

### General Usage

The different input arguments available for each script can be viewed by adding `-h` to your python call (e.g., `python generate_json.py -h`).

ProteinMPNN accepts PDB files as input and produces FASTA files as output.

Unlike the original repo, our ProteinMPNN organizes the different input options (aka arguments) into `.flags` files:
- `json.flags` is used to specify design constraints, like fixed residues and symmetry
- `proteinmpnn.flags` is used to specify prediction flags, like which sampling temperature and model variant to use.

In general, there are two steps to running ProteinMPNN:
1. Run the `generate_json.py` script and pass it the `json.flags` file.
   - This makes a new file called `proteinmpnn_res_specs.json` containing parsed design information.
2. Run the `run_protein_mpnn.py` script and pass it `proteinmpnn.flags` and `proteinmpnn_res_specs.json` to obtain the actual ProteinMPNN prediction.

### Useful Flags

Used in `json.flags`:

`--default_design_setting`: an optional filter that allows or disallows certain residue types during design. By default, it is set to `all`, which allows all 20 amino acids. Possible settings include:
- `all-hydphob`: exclude hydrophobic residues (leaving `CDEHKNPQRSTX`)
- `all-hydphil`: exclude hydrophilic residues (leaving `ACFGILMPVWYX`)
- `all-CLD`: exclude specific amino acids (in this case, Cys, Leu, and Asp)
- `L+polar`: mix-and-match amino acids and categories (in this case, allow all polar amino acids and also Leu)
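
The way such a setting string maps to a set of allowed residues can be sketched as follows. This is not the parser used by `generate_json.py`. It assumes the 21-letter alphabet `ACDEFGHIKLMNPQRSTVWYX`, reads the letter strings listed above as the residues that stay allowed, and only knows the `hydphob` and `hydphil` groups derived from those strings (the `polar` group is not defined here). The function name `resolve_design_setting` is hypothetical.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"  # assumed residue alphabet, X = unknown

# Groups derived from the letter strings listed above (an assumption).
GROUPS = {
    "all": set(ALPHABET),
    "hydphob": set(ALPHABET) - set("CDEHKNPQRSTX"),
    "hydphil": set(ALPHABET) - set("ACFGILMPVWYX"),
}


def _term(token):
    """Resolve a group name, or treat the token as one-letter codes."""
    if token in GROUPS:
        return GROUPS[token]
    unknown = set(token) - set(ALPHABET)
    if unknown:
        raise ValueError(f"unrecognized group or residues: {token!r}")
    return set(token)


def resolve_design_setting(setting):
    """Return the allowed residues for a setting such as 'all-CLD'."""
    if "-" in setting:
        # 'base-removed': start from the base and take residues away.
        base, _, removed = setting.partition("-")
        allowed = _term(base) - _term(removed)
    else:
        # 'a+b+c': union of every term.
        allowed = set().union(*(_term(t) for t in setting.split("+")))
    return "".join(aa for aa in ALPHABET if aa in allowed)


print(resolve_design_setting("all-hydphob"))
print(resolve_design_setting("all-CLD"))
```

Returning the residues in alphabet order keeps the output stable, which makes it easier to compare against the strings in the list above.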

Used in `proteinmpnn.flags`:

`--model_name`: specifies which ProteinMPNN model checkpoint to use. Possible options include:
- `v_48_002`: vanilla (default) model with k=48 neighbors and 0.02 Å noise
- `s_48_010`: soluble protein model with k=48 neighbors and 0.1 Å noise

`--sampling_temp`: specifies the sampling temperature, which controls how diverse the generated sequences are. Ranges from 0 to 1, inclusive. A temperature of 0 returns the "best" (most likely) prediction every time, with zero diversity, while a temperature of 1 samples directly from the model's predicted distribution, giving the most diverse sequences in this range. The recommended range is roughly 0.0 to 0.3.
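
The effect of the temperature can be illustrated with a minimal, standard-library sketch of temperature-scaled softmax sampling. It is not the repo's sampling code: the function name `sample_with_temperature` is hypothetical, and treating a temperature of 0 as a plain argmax is an assumption made here to keep the example well defined.

```python
import math
import random


def sample_with_temperature(logits, temperature, rng=random):
    """Sample an index from softmax(logits / temperature)."""
    if temperature <= 0.0:
        # Assumed limit case: always pick the highest-scoring entry.
        return max(range(len(logits)), key=lambda i: logits[i])
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    weights = [math.exp(s - peak) for s in scaled]
    return rng.choices(range(len(logits)), weights=weights, k=1)[0]


logits = [0.1, 2.0, 0.5]  # toy scores for three residue types
rng = random.Random(0)
for temperature in (0.0, 0.1, 1.0):
    draws = [sample_with_temperature(logits, temperature, rng) for _ in range(10)]
    print(temperature, draws)
```

Lower temperatures sharpen the distribution around the top-scoring residue, so repeated draws agree more often. Higher temperatures flatten it and produce more varied sequences.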

`--dump_probs`: if included, ProteinMPNN will save the predicted sequence probability table for each scaffold. This will be a NumPy array of shape `[L, 21]`, for a protein of length L. If multiple sequences are generated per scaffold, probabilities will be averaged before saving. A helper script for visualizing these tables is included at `run/helper_scripts/other_tools/view_probs.py`.
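
As a rough illustration of what averaging several `[L, 21]` tables means, the sketch below uses plain nested lists in place of NumPy arrays so that it runs without extra dependencies. The column order `ACDEFGHIKLMNPQRSTVWYX` is an assumption, and the helper names are hypothetical. The saved file format is not covered here.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWYX"  # assumed column order of the table


def average_tables(tables):
    """Element-wise mean of several [L, 21] tables given as lists of rows."""
    n = len(tables)
    return [
        [sum(column) / n for column in zip(*rows)]  # mean per residue type
        for rows in zip(*tables)                    # same position in each table
    ]


def most_likely_sequence(table):
    """Pick the highest-probability residue at every position."""
    return "".join(
        ALPHABET[max(range(len(row)), key=row.__getitem__)] for row in table
    )


def toy_row(letter, p):
    """Row with probability p on one letter and the rest spread evenly."""
    row = [(1.0 - p) / 20] * 21
    row[ALPHABET.index(letter)] = p
    return row


table_a = [toy_row("A", 0.9), toy_row("L", 0.6)]
table_b = [toy_row("A", 0.7), toy_row("K", 0.8)]
averaged = average_tables([table_a, table_b])
print(most_likely_sequence(averaged))
```

With real NumPy arrays of equal shape, the same average is `numpy.mean(numpy.stack(tables), axis=0)`.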

### Example Cases

Example input and expected output files, as well as jobscripts and flag files, for many different design tasks are included in `examples/`. For a summary and explanation of each example, see `examples/EXAMPLES.md`. Currently supported protocols include:

1. Monomer Design (with user-friendly parsing of designable residues)
2. Binder Design
3. Oligomer Design (with support for arbitrary symmetries in homooligomers)
4. Multi-state Design (with support for multiple complex design constraints)

-----------------------------------------------------------------------------------------------------

## Unit Testing

TODO

## Code organization:
* `run/run_protein_mpnn.py` - the main script to initialize and run the model.
* `run/generate_json.py` - script to automatically generate a JSON file of design constraints.
* `run/helper_scripts/` - helper functions to parse PDBs, assign which chains to design, which residues to fix, adding AA bias, tying residues etc.
* `examples/` - simple example inputs/outputs and runscripts for different tasks.
* `model_weights/` - trained proteinmpnn model weights.
    * `v_48_...` - vanilla proteinmpnn models trained at different noise levels.
    * `s_48_...` - solublempnn models trained at different noise levels.
    * `ca_48_...` - Ca-only models trained at different noise levels.


## License

ProteinMPNN is distributed under an MIT license, which can be found at `proteinmpnn/LICENSE`. See license file for more details.
