Metadata-Version: 2.4
Name: mhcflurry
Version: 2.3.0rc1
Summary: MHC Binding Predictor
Home-page: https://github.com/openvax/mhcflurry
Author: Tim O'Donnell and Alex Rubinsteyn
Author-email: timodonnell@gmail.com
License: Apache-2.0
Classifier: Development Status :: 5 - Production/Stable
Classifier: Environment :: Console
Classifier: Operating System :: OS Independent
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: pandas>=2.0
Requires-Dist: appdirs
Requires-Dist: ahocorasick-rs
Requires-Dist: scikit-learn
Requires-Dist: mhcgnomes>=3.0.1
Requires-Dist: numpy>=1.22.4
Requires-Dist: pyyaml
Requires-Dist: tqdm
Requires-Dist: torch>=2.0.0
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: license
Dynamic: license-file
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

[![Build Status](https://github.com/openvax/mhcflurry/actions/workflows/ci.yml/badge.svg)](https://github.com/openvax/mhcflurry/actions/workflows/ci.yml)
[![Coverage Status](https://coveralls.io/repos/github/openvax/mhcflurry/badge.svg?branch=master)](https://coveralls.io/github/openvax/mhcflurry?branch=master)
[![Open In Colab](https://colab.research.google.com/assets/colab-badge.svg)](https://colab.research.google.com/github/openvax/mhcflurry/blob/master/notebooks/mhcflurry-colab.ipynb)

# mhcflurry
[MHC I](https://en.wikipedia.org/wiki/MHC_class_I) ligand
prediction package with competitive accuracy and a fast and
[documented](http://openvax.github.io/mhcflurry/) implementation.

> [!IMPORTANT]
> **Version 2.3.0** keeps the same external API as 2.2.0 and ships substantial
> performance and tooling improvements for users training their own models or
> running large prediction workloads:
> - **Device-resident affinity training**: `Class1NeuralNetwork.fit()` keeps
>   peptides, alleles, targets, and the random-negative pool on the active
>   torch device for the lifetime of one fit, eliminating per-batch
>   host↔device copies.
> - **Multi-GPU prediction by default**: `mhcflurry-predict`,
>   `mhcflurry-predict-scan`, `mhcflurry-calibrate-percentile-ranks`, and the
>   sweep eval script auto-discover visible GPUs and fan out across them.
> - **Orchestrator auto-tuning**: `mhcflurry-class1-train-pan-allele-models`
>   resolves `--num-jobs`, `--max-workers-per-gpu`, `--dataloader-num-workers`,
>   and `random_negative_pool_epochs` from the host's hardware so the same
>   recipe runs on a workstation, single-GPU node, or 8×A100 host.
>   `--dataloader-num-workers` applies to streaming pretraining; affinity
>   fine-tuning batches from device-resident tensors.
> - **`torch.compile` + TF32 + matmul-precision** are first-class CLI flags
>   on the train commands; the in-process Inductor cache is warmed by a single
>   worker before the production pool launches.
>
> If you are upgrading from 2.1.x or 2.2.x, simply run
> `pip install --upgrade mhcflurry`. The published pre-trained models are
> unchanged and will be loaded automatically. Internal refactors (per-fit
> device-resident training tensors, torch-side peptide encodings) do not
> affect the public Python or CLI surface.
>
> Earlier release: **Version 2.2.0** was the first release to use PyTorch as
> its neural network backend, replacing TensorFlow/Keras. It introduced the
> Python 3.10+ and `pandas >= 2.0` requirements and added Apple Silicon (MPS)
> support.

MHCflurry implements class I peptide/MHC binding affinity prediction.
The current version provides pan-MHC I predictors supporting any MHC
allele of known sequence. MHCflurry runs on Python 3.10+ using the
[PyTorch](https://pytorch.org/) neural network library.
It exposes [command-line](http://openvax.github.io/mhcflurry/commandline_tutorial.html)
and [Python library](http://openvax.github.io/mhcflurry/python_tutorial.html)
interfaces.

MHCflurry also includes two experimental predictors:
an "antigen processing" predictor that attempts to model MHC allele-independent
effects such as proteasomal cleavage, and a "presentation" predictor that
integrates processing predictions with binding affinity predictions to give a
composite "presentation score." Both models are trained on mass spec-identified
MHC ligands.

If you find MHCflurry useful in your research please cite:

> T. O'Donnell, A. Rubinsteyn, U. Laserson. "MHCflurry 2.0: Improved pan-allele prediction of MHC I-presented peptides by incorporating antigen processing," *Cell Systems*, 2020. https://doi.org/10.1016/j.cels.2020.06.010

> T. O'Donnell, A. Rubinsteyn, M. Bonsack, A. B. Riemer, U. Laserson, and J. Hammerbacher, "MHCflurry: Open-Source Class I MHC Binding Affinity Prediction," *Cell Systems*, 2018. https://doi.org/10.1016/j.cels.2018.05.014

Please file an issue if you have questions or encounter problems.

Have a bugfix or other contribution? We would love your help. See our [contributing guidelines](CONTRIBUTING.md).

## Try it now

You can generate MHCflurry predictions without any setup by running our Google Colaboratory [notebook](https://colab.research.google.com/github/openvax/mhcflurry/blob/master/notebooks/mhcflurry-colab.ipynb).

## Installation (pip)

Install the package:

```
$ pip install mhcflurry
```

Download our datasets and trained models:

```
$ mhcflurry-downloads fetch
```

You can now generate predictions:

```
$ mhcflurry-predict \
       --alleles HLA-A0201 HLA-A0301 \
       --peptides SIINFEKL SIINFEKD SIINFEKQ \
       --out /tmp/predictions.csv

Wrote: /tmp/predictions.csv
```
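If you drive predictions from a Python script rather than a shell, the same call can be assembled as an argument list. The following is a minimal sketch, not part of MHCflurry: `build_predict_command` is a hypothetical helper, and it uses only the `--alleles`, `--peptides`, and `--out` flags shown above.

```python
import shutil
import subprocess


def build_predict_command(alleles, peptides, out_path):
    """Return the argv list for the mhcflurry-predict call shown above.

    Hypothetical helper for illustration; not provided by MHCflurry.
    """
    if not alleles or not peptides:
        raise ValueError("need at least one allele and one peptide")
    return [
        "mhcflurry-predict",
        "--alleles", *alleles,
        "--peptides", *peptides,
        "--out", str(out_path),
    ]


if __name__ == "__main__":
    command = build_predict_command(
        alleles=["HLA-A0201", "HLA-A0301"],
        peptides=["SIINFEKL", "SIINFEKD", "SIINFEKQ"],
        out_path="/tmp/predictions.csv",
    )
    if shutil.which(command[0]) is None:
        # The script is only on PATH after `pip install mhcflurry`.
        print("mhcflurry-predict not found; would run:", " ".join(command))
    else:
        # check=True raises CalledProcessError on a non-zero exit status.
        subprocess.run(command, check=True)
```

Passing a list instead of a single shell string avoids quoting problems with allele names such as `HLA-A*02:01`, which contain characters the shell may expand.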

Or scan protein sequences for potential epitopes:

```
$ mhcflurry-predict-scan \
        --sequences MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHS \
        --alleles HLA-A*02:01 \
        --out /tmp/predictions.csv

Wrote: /tmp/predictions.csv
```
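Conceptually, scanning a protein means enumerating every candidate peptide window and scoring each one. The sketch below illustrates only the enumeration step in plain Python; it is not MHCflurry's implementation, `enumerate_peptides` is a hypothetical name, and the 8 to 11 residue length range is an assumption chosen for illustration (check `mhcflurry-predict-scan --help` for the actual options and defaults).

```python
def enumerate_peptides(sequence, lengths=(8, 9, 10, 11)):
    """Yield (offset, peptide) for every window of the given lengths.

    Illustrative only: the default lengths are an assumption, not a
    documented MHCflurry setting.
    """
    for length in lengths:
        # range() is empty when the sequence is shorter than the window.
        for offset in range(len(sequence) - length + 1):
            yield offset, sequence[offset:offset + length]


protein = "MFVFLVLLPLVSSQCVNLTTRTQLPPAYTNSFTRGVYYPDKVFRSSVLHS"
candidates = list(enumerate_peptides(protein))
print(len(candidates), "candidate peptides; first:", candidates[0])
```

Each enumerated peptide would then be paired with every requested allele for prediction, which is why scan workloads grow quickly with sequence length and allele count.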

### Unified `mhcflurry` parent command

Starting in 2.3.0 there is also a single `mhcflurry` command that dispatches
to every subcommand:

```
$ mhcflurry predict \
        --alleles HLA-A0201 HLA-A0301 \
        --peptides SIINFEKL SIINFEKD SIINFEKQ \
        --out /tmp/predictions.csv

$ mhcflurry compare-models \
        --a results/new_run/ \
        --b public \
        --out results/comparison/

$ mhcflurry plot-model-comparison --input results/comparison/
```

Every historical command is reachable as a subcommand
(`mhcflurry-predict` ↔ `mhcflurry predict`, `mhcflurry-downloads` ↔
`mhcflurry downloads`, `mhcflurry-class1-train-pan-allele-models` ↔
`mhcflurry class1-train-pan-allele-models`, etc.). Both forms run the
same underlying entry point; the legacy `mhcflurry-*` scripts remain
installed as compat shims and are not changing. `mhcflurry --help`
lists every available subcommand.
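The naming correspondence is mechanical: drop the `mhcflurry-` prefix and pass the remainder as the first argument to `mhcflurry`. As a minimal sketch for anyone migrating wrapper scripts, the hypothetical helper below (`to_unified`, not part of MHCflurry) rewrites a legacy argument list into the unified form.

```python
LEGACY_PREFIX = "mhcflurry-"


def to_unified(argv):
    """Rewrite ['mhcflurry-predict', ...] as ['mhcflurry', 'predict', ...].

    Hypothetical helper for illustration; not provided by MHCflurry.
    """
    script, *rest = argv
    if not script.startswith(LEGACY_PREFIX) or script == LEGACY_PREFIX:
        raise ValueError(f"not a legacy mhcflurry script: {script!r}")
    # Everything after the prefix becomes the subcommand name.
    return ["mhcflurry", script[len(LEGACY_PREFIX):], *rest]


print(to_unified(["mhcflurry-downloads", "fetch"]))
```

Because the legacy scripts remain installed, a rewrite like this is optional; it is only useful if you want one consistent command style across your own tooling.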

The two new-in-2.3.0 model-comparison tools, `compare-models` and
`plot-model-comparison`, only have the unified form.

See the [documentation](http://openvax.github.io/mhcflurry/) for more details.

## Development and tests

From a checkout, source `develop.sh` to create and activate the editable
environment:

```
$ source develop.sh
```

For quick feedback, run lint plus a focused unit subset:

```
$ ./lint.sh
$ pytest -q test/test_amino_acid.py test/test_random_negative_peptides.py
```

`pytest test/` is the full test suite, not a fast unit-only loop. It includes
small end-to-end training runs, command subprocess tests, public-model smoke
tests that require cached MHCflurry download bundles, and speed/regression
checks, so it can take many minutes. Use
`pytest -q test -m "not slow and not downloads"` for the broad fast tier, and
`pytest -q test --durations=25` when auditing slow tests. See the
[testing documentation](http://openvax.github.io/mhcflurry/testing.html) for
the current test tiers.

## Docker
You can also try the latest (GitHub master) version of MHCflurry using the Docker
image hosted on [Docker Hub](https://hub.docker.com/r/openvax/mhcflurry) by
running:

```
$ docker run -p 9999:9999 --rm openvax/mhcflurry:latest
```

This will start a [Jupyter](https://jupyter.org/) notebook server in an
environment that has MHCflurry installed. Go to `http://localhost:9999` in a
browser to use it.

To build the Docker image yourself, from a checkout run:

```
$ docker build -t mhcflurry:latest .
$ docker run -p 9999:9999 --rm mhcflurry:latest
```
## Predicted sequence motifs
Sequence logos for the binding motifs learned by MHCflurry BA are available [here](https://openvax.github.io/mhcflurry-motifs/).

## Common issues and fixes

### Problems downloading data and models
Some users have reported HTTP connection issues when using `mhcflurry-downloads fetch`. As a workaround, you can download the data manually (e.g. using `wget`) and then use `mhcflurry-downloads` just to copy the data to the right place.

To do this, first get the URL(s) of the downloads you need using `mhcflurry-downloads url`:

```
$ mhcflurry-downloads url models_class1_presentation
https://github.com/openvax/mhcflurry/releases/download/1.6.0/models_class1_presentation.20200205.tar.bz2
```

Then make a directory and download the needed files to this directory:

```
$ mkdir downloads
$ wget --directory-prefix downloads https://github.com/openvax/mhcflurry/releases/download/1.6.0/models_class1_presentation.20200205.tar.bz2

HTTP request sent, awaiting response... 200 OK
Length: 72616448 (69M) [application/octet-stream]
Saving to: 'downloads/models_class1_presentation.20200205.tar.bz2'
```

Now call `mhcflurry-downloads fetch` with the `--already-downloaded-dir` option to indicate that the downloads should be retrieved from the specified directory:

```
$ mhcflurry-downloads fetch models_class1_presentation --already-downloaded-dir downloads
```
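The three manual steps can also be scripted. The following is a rough sketch under stated assumptions, not an official tool: `archive_name` and `fetch_manually` are hypothetical helpers, the script assumes `mhcflurry-downloads url` prints a single URL for the named bundle as in the example above, and it uses `urllib` in place of `wget`. If `urllib` hits the same connection problems, substitute whichever downloader works on your network for step 2.

```python
import pathlib
import shutil
import subprocess
import urllib.parse
import urllib.request


def archive_name(url):
    """Return the file name component of a download URL."""
    return pathlib.PurePosixPath(urllib.parse.urlparse(url).path).name


def fetch_manually(download_name, directory="downloads"):
    """Download one bundle by hand, then let mhcflurry-downloads install it.

    Hypothetical helper for illustration; not provided by MHCflurry.
    """
    # Step 1: ask mhcflurry-downloads for the bundle's URL.
    result = subprocess.run(
        ["mhcflurry-downloads", "url", download_name],
        check=True, capture_output=True, text=True,
    )
    url = result.stdout.strip()
    # Step 2: download the archive into the local directory.
    target_dir = pathlib.Path(directory)
    target_dir.mkdir(parents=True, exist_ok=True)
    urllib.request.urlretrieve(url, str(target_dir / archive_name(url)))
    # Step 3: install from the already-downloaded copy.
    subprocess.run(
        ["mhcflurry-downloads", "fetch", download_name,
         "--already-downloaded-dir", str(target_dir)],
        check=True,
    )


if __name__ == "__main__":
    if shutil.which("mhcflurry-downloads") is None:
        print("mhcflurry-downloads not found on PATH; install mhcflurry first")
    else:
        fetch_manually("models_class1_presentation")
```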
