Metadata-Version: 2.4
Name: peptcrnet
Version: 1.2.0
Summary: A Deep Learning Framework for TCR-Peptide Recognition Prediction
Home-page: https://github.com/mlizhangx/Pep-TCRNet
Author: PepTCRNet Team
Author-email: mlizhang@gmail.com
Project-URL: Bug Reports, https://github.com/mlizhangx/Pep-TCRNet/issues
Project-URL: Source, https://github.com/mlizhangx/Pep-TCRNet
Project-URL: Documentation, https://mlizhangx.github.io/Pep-TCRNet/
Project-URL: Download, https://doi.org/10.5281/zenodo.14194846
Keywords: TCR peptide recognition deep-learning bioinformatics immunology
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8,<3.13
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<2.0.0,>=1.24.0
Requires-Dist: pandas<3.0.0,>=2.0.0
Requires-Dist: scikit-learn<2.0.0,>=1.3.0
Requires-Dist: scipy<2.0.0,>=1.10.0
Requires-Dist: tensorflow<3.0.0,>=2.13.0
Requires-Dist: tf-keras>=2.13.0
Requires-Dist: tensorflow-probability[tf]<1.0.0,>=0.21.0
Requires-Dist: matplotlib<4.0.0,>=3.7.0
Requires-Dist: seaborn<1.0.0,>=0.12.0
Requires-Dist: networkx<4.0.0,>=2.8.0
Requires-Dist: stellargraph>=1.2.0
Requires-Dist: python-Levenshtein>=0.21.0
Requires-Dist: umap-learn>=0.5.0
Requires-Dist: hdbscan>=0.8.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: natsort>=8.0.0
Requires-Dist: joblib>=1.3.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0; extra == "dev"
Requires-Dist: black>=21.0; extra == "dev"
Requires-Dist: flake8>=3.8; extra == "dev"
Requires-Dist: sphinx>=4.0; extra == "dev"
Requires-Dist: sphinx-rtd-theme>=0.5; extra == "dev"
Provides-Extra: viz
Requires-Dist: seaborn>=0.12.0; extra == "viz"
Requires-Dist: matplotlib>=3.7.0; extra == "viz"
Requires-Dist: plotly>=5.0; extra == "viz"
Provides-Extra: gpu
Requires-Dist: tensorflow-gpu>=2.13.0; extra == "gpu"
Provides-Extra: notebooks
Requires-Dist: jupyter>=1.0.0; extra == "notebooks"
Requires-Dist: ipywidgets>=8.0.0; extra == "notebooks"
Requires-Dist: notebook>=6.5.0; extra == "notebooks"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.5.0; extra == "docs"
Requires-Dist: mkdocs-material>=9.0.0; extra == "docs"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# PepTCR-Net: Deep Learning for TCR-Peptide Recognition Prediction

[![PyPI version](https://img.shields.io/pypi/v/peptcrnet.svg)](https://pypi.org/project/peptcrnet/)
[![Python 3.8–3.12](https://img.shields.io/badge/python-3.8--3.12-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Checkpoints on Zenodo](https://zenodo.org/badge/DOI/10.5281/zenodo.14194846.svg)](https://doi.org/10.5281/zenodo.14194846)

**PepTCR-Net** predicts T-cell receptor (TCR) recognition of peptide antigens using deep learning with uncertainty quantification.

## Quick Start

```bash
pip install -U peptcrnet
peptcrnet-download-models
peptcrnet-demo
```

For notebooks: `pip install "peptcrnet[notebooks]"` (the quotes keep shells such as zsh from expanding the brackets).

**Requirements:** Python 3.8–3.12 (not 3.13).

## Installation

### From PyPI (recommended)

```bash
pip install -U peptcrnet
peptcrnet-download-models   # downloads ~283 MB from Zenodo to ~/.peptcrnet/
```

### From source

```bash
git clone https://github.com/mlizhangx/Pep-TCRNet.git
cd Pep-TCRNet
pip install -e ".[notebooks]"
peptcrnet-download-models
```

## Pretrained model checkpoints (required for prediction)

Checkpoints are **not** included in the pip package. Download them once from Zenodo:

- **Checkpoints (v2):** [https://doi.org/10.5281/zenodo.14194846](https://doi.org/10.5281/zenodo.14194846)
- **Training CSV data (v1):** [https://doi.org/10.5281/zenodo.14194728](https://doi.org/10.5281/zenodo.14194728)
- **Paper:** [https://doi.org/10.1093/bib/bbaf351](https://doi.org/10.1093/bib/bbaf351)

**Automatic (recommended):**

```bash
peptcrnet-download-models
```

**Manual:**

```bash
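# -o sets the local filename; with -O the file would be saved with the "?download=1" suffix
curl -L -o peptcrnet-pretrained-checkpoints-v1.zip "https://zenodo.org/records/14194846/files/peptcrnet-pretrained-checkpoints-v1.zip?download=1"
unzip peptcrnet-pretrained-checkpoints-v1.zip -d ~/.peptcrnet/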
```

Files are cached under `~/.peptcrnet/checkpoints/` and `~/.peptcrnet/datasets/atchley.txt`.
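
To confirm the download worked before running predictions, a quick check along these lines may help. It is a minimal sketch that only tests for the two cache locations named above; `find_missing_cache_paths` is a hypothetical helper, not part of the `peptcrnet` API, and it does not inspect the contents of `checkpoints/`.

```python
from pathlib import Path


def find_missing_cache_paths(cache_root=None):
    """Return the expected cache paths that do not exist yet.

    Hypothetical helper, not part of peptcrnet. It checks only the two
    locations named in this README.
    """
    root = Path(cache_root) if cache_root is not None else Path.home() / ".peptcrnet"
    expected = [
        root / "checkpoints",               # checkpoint directory
        root / "datasets" / "atchley.txt",  # dataset file cached alongside it
    ]
    return [path for path in expected if not path.exists()]


if __name__ == "__main__":
    missing = find_missing_cache_paths()
    if missing:
        print("Missing:", ", ".join(str(p) for p in missing))
        print("Run peptcrnet-download-models to fetch them.")
    else:
        print("Cache looks complete.")
```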

If a previous download failed, remove the partial archive and any extracted files before retrying:

```bash
rm -f ~/.peptcrnet/peptcrnet-pretrained-checkpoints-v1.zip
rm -rf ~/.peptcrnet/checkpoints ~/.peptcrnet/datasets
```

## Basic Usage

### How prediction works (important)

The pretrained model is a **fixed multi-class classifier**: it assigns each TCR to
the single best-matching peptide from a small set of known peptides (the most
frequent in-distribution peptides). It does **not** score arbitrary peptides — to
predict peptides outside this set you must retrain (see *Train on your own peptides*).

The default vocabulary is **top-5**:

```
YVLDHLIVV, NLVPMVATV, GILGFVFTL, GLCTLVAML, KLGGALQAK
```

Use `top_k=10` (or 15 or 20) for a larger vocabulary. Each setting requires its
matching checkpoint, for example `top-10_case-16.h5` for `top_k=10`.
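
To make the "fixed multi-class classifier" idea concrete, the sketch below picks the top-scoring peptide from the top-5 vocabulary given one row of class probabilities. It is an illustration only: the probability values are made up, the vocabulary order is assumed to match the list above, and `pick_peptide` is a hypothetical helper rather than anything exported by `peptcrnet`.

```python
# Top-5 vocabulary as listed in this README. The index order used by the
# real checkpoint is an assumption here.
TOP5_PEPTIDES = ["YVLDHLIVV", "NLVPMVATV", "GILGFVFTL", "GLCTLVAML", "KLGGALQAK"]


def pick_peptide(class_probs, vocabulary=TOP5_PEPTIDES):
    """Return (peptide, probability) for the highest-probability class."""
    if len(class_probs) != len(vocabulary):
        raise ValueError("need exactly one probability per vocabulary peptide")
    best = max(range(len(vocabulary)), key=lambda i: class_probs[i])
    return vocabulary[best], class_probs[best]


# Made-up probabilities for a single TCR (they sum to 1).
example_probs = [0.05, 0.62, 0.20, 0.08, 0.05]
peptide, prob = pick_peptide(example_probs)
print(peptide, prob)  # NLVPMVATV 0.62
```

Whatever the input TCR, the result is always one of the vocabulary entries, which is why a peptide outside the set can never be returned.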

### One-line prediction

```python
from peptcrnet import quick_predict

results = quick_predict(
    tcr_sequences=["CASSLAPGATNEKLFF", "CASSLKPSYNEQFF"],
    mhc_alleles=["HLA-A*02:01", "HLA-A*02:01"],
    v_genes=["TRBV19", "TRBV7-9"],
    j_genes=["TRBJ1-4", "TRBJ2-3"],
    scenario=16,
)
print(results)  # the predicted_peptide column holds peptide sequences, e.g. NLVPMVATV
```

### Predict against your own list of peptides

Restrict the prediction to peptides of interest (must be within the model's
vocabulary). Each TCR is assigned its best match from your list:

```python
from peptcrnet import quick_predict

results = quick_predict(
    tcr_sequences=["CASSLAPGATNEKLFF", "CASSLKPSYNEQFF"],
    mhc_alleles=["HLA-A*02:01", "HLA-A*02:01"],
    v_genes=["TRBV19", "TRBV7-9"],
    j_genes=["TRBJ1-4", "TRBJ2-3"],
    scenario=16,
    candidate_peptides=["GILGFVFTL", "NLVPMVATV", "KLGGALQAK"],
)
```
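
One way to picture this restriction is as a mask over the model's class probabilities: drop every peptide not in your list, renormalize what is left, and take the best remaining match. The sketch below illustrates that idea under stated assumptions. It is not necessarily how `quick_predict` handles `candidate_peptides`, and both `restrict_to_candidates` and the probability values are hypothetical.

```python
def restrict_to_candidates(probs_by_peptide, candidate_peptides):
    """Keep only candidate peptides and renormalize their probabilities.

    probs_by_peptide maps each vocabulary peptide to its predicted probability.
    Candidates outside the vocabulary are rejected, since the model has no
    output class for them.
    """
    unknown = [p for p in candidate_peptides if p not in probs_by_peptide]
    if unknown:
        raise ValueError(f"not in model vocabulary: {unknown}")
    kept = {p: probs_by_peptide[p] for p in candidate_peptides}
    total = sum(kept.values())
    if total == 0:
        raise ValueError("candidates have zero total probability")
    return {p: value / total for p, value in kept.items()}


# Made-up probabilities over the top-5 vocabulary for one TCR.
probs = {
    "YVLDHLIVV": 0.40,
    "NLVPMVATV": 0.10,
    "GILGFVFTL": 0.30,
    "GLCTLVAML": 0.15,
    "KLGGALQAK": 0.05,
}
restricted = restrict_to_candidates(probs, ["GILGFVFTL", "NLVPMVATV", "KLGGALQAK"])
best = max(restricted, key=restricted.get)
print(best)  # GILGFVFTL
```

In this made-up example the unrestricted best match would be `YVLDHLIVV`; once it is excluded from the candidate list, `GILGFVFTL` takes its place.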

### Predict from CSV

```python
from peptcrnet import predict_from_file

results = predict_from_file("my_data.csv", scenario=16)
results.to_csv("predictions.csv", index=False)
```
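
If your input CSV includes the optional `Peptide` column (see *Data Format*), you may want to score the predictions against it. The sketch below computes plain accuracy from two parallel lists. How you pull those lists out of `results` depends on its columns, which are not documented here: the `predicted_peptide` name comes from the comment in the one-line example, and pairing it row by row with your input file is an assumption. `accuracy` is a hypothetical helper.

```python
def accuracy(true_peptides, predicted_peptides):
    """Fraction of rows where the predicted peptide equals the true peptide."""
    if len(true_peptides) != len(predicted_peptides):
        raise ValueError("inputs must have the same length")
    if not true_peptides:
        raise ValueError("inputs must not be empty")
    matches = sum(t == p for t, p in zip(true_peptides, predicted_peptides))
    return matches / len(true_peptides)


# Made-up labels for illustration.
truth = ["GILGFVFTL", "NLVPMVATV", "GILGFVFTL", "KLGGALQAK"]
predicted = ["GILGFVFTL", "NLVPMVATV", "GLCTLVAML", "KLGGALQAK"]
print(accuracy(truth, predicted))  # 0.75
```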

See [USAGE_EXAMPLES.md](USAGE_EXAMPLES.md) and [documentation](https://mlizhangx.github.io/Pep-TCRNet/) for more.

### Train on your own peptides (custom model)

To predict peptides outside the built-in vocabulary, train a new classifier on
your own labeled data (columns: `CDR3`, `Peptide`, and `MHC`/`V`/`J` for HLA/VJ
scenarios):

```python
import pandas as pd
from peptcrnet import PepTCRNetTrainer

train_df = pd.read_csv("my_training_data.csv")
trainer = PepTCRNetTrainer(scenario=16)
trainer.fit(
    train_df,
    num_peptides=None,           # None = use all peptides in your data
    epochs=100,
    output_checkpoint="my_model.h5",
    output_labels="my_labels.json",
)

# Predict with your trained model
test_df = pd.read_csv("my_test_data.csv")  # placeholder path; see Data Format for columns
predictor = trainer.to_predictor()
results = predictor.predict_with_uncertainty(test_df)

# ...or load it later
from peptcrnet import PepTCRNetPredictor
predictor = PepTCRNetPredictor(
    scenario=16, checkpoint="my_model.h5", labels="my_labels.json",
)
```

## Data Format

Input CSV for `predict_from_file` and the predictor API:

| Column | Required | Description | Example |
|--------|----------|-------------|---------|
| `CDR3` | Yes | TCR CDR3β sequence | `CASSRGQGNEQFF` |
| `MHC` | Scenario-dependent | HLA allele (single column) | `HLA-A*02:01` |
| `V` | Scenario-dependent | V gene segment | `TRBV7-2` |
| `J` | Scenario-dependent | J gene segment | `TRBJ2-1` |
| `Peptide` | Optional | True peptide (evaluation only) | `GILGFVFTL` |

**Note:** The prediction API uses a single **`MHC`** column. Some training notebooks split HLA into `HLA-A`, `HLA-B`, `HLA-C`; merge to `MHC` for prediction or use the Zenodo CSV format.

Default scenario **16** uses ED + HLA + VJ features — provide `CDR3`, `MHC`, `V`, and `J`.
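
Before calling `predict_from_file`, it can save a failed run to check that the CSV header has what scenario 16 needs. The following is a minimal sketch using only the standard library. `missing_columns` is a hypothetical helper, and the required set covers scenario 16 only, since the column requirements of other scenarios are not listed here.

```python
import csv
import io

# Columns this README says scenario 16 needs.
SCENARIO_16_COLUMNS = ("CDR3", "MHC", "V", "J")


def missing_columns(csv_file, required=SCENARIO_16_COLUMNS):
    """Return required column names absent from the CSV header row."""
    header = next(csv.reader(csv_file), [])
    present = {name.strip() for name in header}
    return [name for name in required if name not in present]


# In-memory example; for a real file use: open("my_data.csv", newline="")
sample = io.StringIO(
    "CDR3,HLA-A,V,J\n"
    "CASSRGQGNEQFF,HLA-A*02:01,TRBV7-2,TRBJ2-1\n"
)
print(missing_columns(sample))  # ['MHC']
```

The example header uses a split `HLA-A` column, as some training notebooks do, so the check reports that the single `MHC` column is missing.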

## Demo notebook (source install)

```bash
pip install "peptcrnet[notebooks]"
jupyter notebook DEMO_Complete_Pipeline.ipynb
```

## Citation

```bibtex
@article{le2025peptcrnet,
  title={PepTCR-Net: prediction of multi-class antigen peptides by T-cell receptor sequences with deep learning},
  author={Le, Phi and Ung, Leah and Yang, Hai and Huang, Anwen and He, Tao and Bruno, Peter and Oh, David Y and Keenan, Bridget P and Zhang, Li},
  journal={Briefings in Bioinformatics},
  volume={26},
  number={4},
  pages={bbaf351},
  year={2025},
  doi={10.1093/bib/bbaf351}
}
```

## License

MIT — see [LICENSE](LICENSE).

## Contact

- [GitHub Issues](https://github.com/mlizhangx/Pep-TCRNet/issues)
- [mlizhang@gmail.com](mailto:mlizhang@gmail.com)
