Metadata-Version: 2.4
Name: peptcrnet
Version: 1.0.1
Summary: A Deep Learning Framework for TCR-Peptide Recognition Prediction
Home-page: https://github.com/mlizhangx/Pep-TCRNet
Author: PepTCRNet Team
Author-email: mlizhang@gmail.com
Project-URL: Bug Reports, https://github.com/mlizhangx/Pep-TCRNet/issues
Project-URL: Source, https://github.com/mlizhangx/Pep-TCRNet
Project-URL: Documentation, https://peptcrnet.readthedocs.io
Keywords: TCR peptide recognition deep-learning bioinformatics immunology
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.8,<3.13
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy<2.0.0,>=1.24.0
Requires-Dist: pandas<3.0.0,>=2.0.0
Requires-Dist: scikit-learn<2.0.0,>=1.3.0
Requires-Dist: scipy<2.0.0,>=1.10.0
Requires-Dist: tensorflow<3.0.0,>=2.13.0
Requires-Dist: tf-keras>=2.13.0
Requires-Dist: tensorflow-probability[tf]<1.0.0,>=0.21.0
Requires-Dist: matplotlib<4.0.0,>=3.7.0
Requires-Dist: seaborn<1.0.0,>=0.12.0
Requires-Dist: networkx<4.0.0,>=2.8.0
Requires-Dist: stellargraph>=1.2.0
Requires-Dist: python-Levenshtein>=0.21.0
Requires-Dist: umap-learn>=0.5.0
Requires-Dist: hdbscan>=0.8.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: natsort>=8.0.0
Requires-Dist: joblib>=1.3.0
Requires-Dist: jupyter>=1.0.0
Requires-Dist: ipywidgets>=8.0.0
Requires-Dist: notebook>=6.5.0
Provides-Extra: dev
Requires-Dist: pytest>=6.0; extra == "dev"
Requires-Dist: pytest-cov>=2.0; extra == "dev"
Requires-Dist: black>=21.0; extra == "dev"
Requires-Dist: flake8>=3.8; extra == "dev"
Requires-Dist: sphinx>=4.0; extra == "dev"
Requires-Dist: sphinx-rtd-theme>=0.5; extra == "dev"
Provides-Extra: viz
Requires-Dist: seaborn>=0.12.0; extra == "viz"
Requires-Dist: matplotlib>=3.7.0; extra == "viz"
Requires-Dist: plotly>=5.0; extra == "viz"
Provides-Extra: gpu
Requires-Dist: tensorflow-gpu>=2.13.0; extra == "gpu"
Dynamic: author
Dynamic: author-email
Dynamic: classifier
Dynamic: description
Dynamic: description-content-type
Dynamic: home-page
Dynamic: keywords
Dynamic: license-file
Dynamic: project-url
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# PepTCRNet: Deep Learning for TCR-Peptide Recognition Prediction

<p align="center">
  <img src="figures/Pipeline.jpg" alt="PepTCRNet Pipeline" width="600"/>
</p>

[![Python 3.8-3.12](https://img.shields.io/badge/python-3.8--3.12-blue.svg)](https://www.python.org/downloads/)
[![TensorFlow 2.13+](https://img.shields.io/badge/tensorflow-2.13+-orange.svg)](https://www.tensorflow.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**PepTCRNet** is a deep learning framework for predicting which peptide
antigens a T-cell receptor (TCR) recognizes. It combines sequence,
categorical, and network-based features with a Bayesian neural network
classifier, so each prediction comes with an uncertainty estimate.

## 🌟 Key Features

- **Multi-modal Integration**: Seamlessly combines sequence,
categorical, and network-based features
- **Advanced Embeddings**: Utilizes autoencoders, position encoding, and
Atchley factors for sequence representation
- **Bayesian Neural Networks**: Provides uncertainty quantification for
predictions
- **Comprehensive Pipeline**: End-to-end solution from data
preprocessing to model deployment
- **Flexible Architecture**: Modular design allows easy customization
and extension
- **Class Imbalance Handling**: Built-in support for imbalanced datasets
- **Rich Visualizations**: Extensive plotting utilities for model
interpretation

## 🚀 Quick Start

### Run the Complete Demo (Easiest!)

```bash
# One-click demo launcher
./run_demo.sh
```

This launches the complete **Scenario 17** demo using all features!

### Installation

#### From PyPI (recommended)

```bash
pip install peptcrnet
```

**Requirements:** Python 3.8–3.12 (Python 3.13 is not supported due to dependency constraints).

#### From source (development)

```bash
git clone https://github.com/mlizhangx/Pep-TCRNet.git
cd Pep-TCRNet
pip install -e .

# Run the demo
jupyter notebook DEMO_Complete_Pipeline.ipynb
```

### Basic Usage

```python
import peptcrnet
from peptcrnet import PepTCRNetPipeline

# Initialize pipeline
pipeline = PepTCRNetPipeline(data_path='your_data.csv')

# Load and prepare data
pipeline.load_data()
pipeline.split_data(test_size=0.2, val_size=0.1)

# Prepare features
pipeline.prepare_features(feature_types=['sequences', 'categorical'])

# Train model
history = pipeline.train(epochs=100, batch_size=128)

# Evaluate with uncertainty
results = pipeline.evaluate_with_uncertainty(n_samples=200)

# Make predictions on new samples (new_data must be defined by the caller)
predictions = pipeline.predict(new_data)
```

## 📊 Data Format

PepTCRNet expects input data in CSV format with the following columns:


| Column    | Description                     | Example         |
| --------- | ------------------------------- | --------------- |
| `CDR3`    | TCR CDR3β sequence              | `CASSRGQGNEQFF` |
| `Peptide` | Peptide sequence or class label | `GILGFVFTL`     |
| `V`       | V gene segment                  | `TRBV7-2`       |
| `J`       | J gene segment                  | `TRBJ2-1`       |
| `HLA-A`   | HLA-A allele                    | `A*02:01`       |
| `HLA-B`   | HLA-B allele                    | `B*07:02`       |
| `HLA-C`   | HLA-C allele                    | `C*07:01`       |
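
Before handing a file to the pipeline, it can help to confirm that the header matches the table above. The following is a minimal, dependency-free sketch; `missing_columns` is a hypothetical helper, not part of the `peptcrnet` API, and it assumes that all seven columns are required.

```python
import csv
import io

# Column names taken from the table above.
# Assumption: every column is required; the package may accept fewer.
REQUIRED_COLUMNS = ["CDR3", "Peptide", "V", "J", "HLA-A", "HLA-B", "HLA-C"]


def missing_columns(csv_text):
    """Return the required columns that are absent from the CSV header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    header = reader.fieldnames or []
    return [col for col in REQUIRED_COLUMNS if col not in header]


# One-row example built from the example values in the table above.
sample = (
    "CDR3,Peptide,V,J,HLA-A,HLA-B,HLA-C\n"
    "CASSRGQGNEQFF,GILGFVFTL,TRBV7-2,TRBJ2-1,A*02:01,B*07:02,C*07:01\n"
)

print(missing_columns(sample))  # prints []
```

For a file on disk, read its text first (for example with `open(path).read()`) and pass it to the helper.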


## 🧪 Demo Notebook

Try our interactive demo notebook to see PepTCRNet in action:

```bash
jupyter notebook demo_pipeline.ipynb
```

The demo includes:

- Sample data generation
- Step-by-step pipeline walkthrough
- Model training and evaluation
- Uncertainty quantification
- Visualization examples

## 📚 Documentation

### Pipeline Components

#### 1. Data Loading and Preprocessing

```python
from peptcrnet.data import DataLoader

loader = DataLoader('data.csv', atchley_path='atchley_factors.txt')
stats = loader.get_summary_stats()
splits = loader.split_data()
```

#### 2. Feature Engineering

```python
from peptcrnet.embeddings import SequenceEmbedder, CategoricalEmbedder

# Sequence embeddings
seq_embedder = SequenceEmbedder(atchley_factors, max_length=30)
tcr_embeddings = seq_embedder.encode_sequences(tcr_sequences)

# Categorical embeddings
cat_embedder = CategoricalEmbedder()
cat_embeddings = cat_embedder.encode_features(categorical_data)
```
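
To make the role of `max_length` concrete, here is a rough sketch of fixed-length sequence encoding. It is not the `SequenceEmbedder` implementation: it uses a plain one-hot alphabet as a stand-in, and `one_hot_encode` is a hypothetical name. An Atchley-factor encoding would replace the 20 one-hot columns with five factor values per residue, loaded from the factors file (the values are not reproduced here).

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues
AA_INDEX = {aa: i for i, aa in enumerate(AMINO_ACIDS)}


def one_hot_encode(sequence, max_length=30):
    """Return a (max_length, 20) matrix; rows past the sequence end stay zero."""
    if len(sequence) > max_length:
        raise ValueError(f"sequence longer than max_length={max_length}")
    encoded = np.zeros((max_length, len(AMINO_ACIDS)), dtype=np.float32)
    for position, residue in enumerate(sequence):
        # Non-standard residue letters raise KeyError in this sketch.
        encoded[position, AA_INDEX[residue]] = 1.0
    return encoded


matrix = one_hot_encode("CASSRGQGNEQFF")
print(matrix.shape)  # (30, 20)
```

Zero-padding to a common length lets sequences of different lengths be stacked into one array for batching.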

#### 3. Model Training

```python
from peptcrnet.models import BayesianClassifier

model = BayesianClassifier(
    input_shapes={'sequences': (100,), 'categorical': (50,)},
    num_classes=5,
    hidden_dims=[512, 256, 64]
)

history = model.train(X_train, y_train, X_val, y_val)
```

#### 4. Evaluation and Visualization

```python
from peptcrnet.evaluation import ModelEvaluator
from peptcrnet.visualization import plot_confusion_matrix, plot_roc_curves

evaluator = ModelEvaluator()
metrics = evaluator.compute_metrics(y_true, y_pred, y_proba)

plot_confusion_matrix(y_true, y_pred)
plot_roc_curves(y_true, y_proba)
```
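
For readers who want to see what such multi-class metrics involve, the sketch below computes a confusion matrix and a macro-averaged one-vs-rest AUC with NumPy only. It is an illustration under assumptions, not the `ModelEvaluator` code: the function names are hypothetical, labels are assumed to be integers `0..K-1`, and the averaging scheme used by the package is not stated in this README.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, num_classes):
    """Rows index the true class, columns the predicted class."""
    matrix = np.zeros((num_classes, num_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        matrix[t, p] += 1
    return matrix


def binary_auc(scores, is_positive):
    """AUC as the probability that a positive outranks a negative (ties count 0.5).

    Assumes at least one positive and one negative sample.
    """
    pos = scores[is_positive]
    neg = scores[~is_positive]
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


def macro_ovr_auc(y_true, y_proba):
    """Unweighted mean of the one-vs-rest AUCs over all classes."""
    y_true = np.asarray(y_true)
    y_proba = np.asarray(y_proba, dtype=float)
    aucs = [binary_auc(y_proba[:, k], y_true == k) for k in range(y_proba.shape[1])]
    return float(np.mean(aucs))


# Toy labels and class probabilities, made up for illustration.
y_true = np.array([0, 0, 1, 1, 2, 2])
y_proba = np.array([
    [0.7, 0.2, 0.1],
    [0.5, 0.3, 0.2],
    [0.2, 0.6, 0.2],
    [0.4, 0.4, 0.2],
    [0.1, 0.2, 0.7],
    [0.3, 0.3, 0.4],
])
y_pred = y_proba.argmax(axis=1)  # ties resolve to the lowest class index

print(confusion_matrix(y_true, y_pred, num_classes=3))
print(macro_ovr_auc(y_true, y_proba))
```

The pairwise AUC is quadratic in the number of samples, which is fine for a sanity check but slow on large test sets.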

## ⚙️ Configuration

PepTCRNet uses a centralized configuration system:

```python
from peptcrnet import config

# Access configuration
print(config.ModelParams.MAX_TCR_LENGTH)
print(config.TrainingParams.BATCH_SIZE)

# Save configuration
config.save_config('my_config.json')

# Load configuration
config.load_config('my_config.json')
```

## 🔬 Advanced Features

### Uncertainty Quantification

PepTCRNet provides Bayesian uncertainty estimation:

```python
# Multiple forward passes for uncertainty
predictions, uncertainty = pipeline.predict_with_uncertainty(
    test_data,
    n_samples=200
)

# Identify high-confidence predictions
threshold = 0.1  # illustrative cutoff; choose one that suits your uncertainty scale
high_confidence_mask = uncertainty < threshold
```
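
The idea behind repeated forward passes can be sketched independently of the package. Assuming the stochastic passes produce an array of class probabilities with shape `(n_samples, n_items, n_classes)`, one common summary is the mean probability per item plus the entropy of that mean. Whether `predict_with_uncertainty` returns this particular measure is not stated here, so treat `summarize_mc_samples` as a hypothetical helper.

```python
import numpy as np


def summarize_mc_samples(sampled_probs):
    """Summarize stochastic forward passes.

    sampled_probs: shape (n_samples, n_items, n_classes), rows sum to 1.
    Returns (mean_probs, entropy) with shapes (n_items, n_classes) and (n_items,).
    """
    sampled_probs = np.asarray(sampled_probs, dtype=float)
    mean_probs = sampled_probs.mean(axis=0)
    eps = 1e-12  # avoids log(0) for classes with zero mean probability
    entropy = -(mean_probs * np.log(mean_probs + eps)).sum(axis=1)
    return mean_probs, entropy


rng = np.random.default_rng(0)
# Simulated stand-in for 200 stochastic passes over 4 items and 3 classes.
logits = rng.normal(size=(200, 4, 3))
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)

mean_probs, entropy = summarize_mc_samples(probs)

# Hypothetical cutoff: half of the maximum possible entropy for 3 classes.
entropy_cutoff = 0.5 * np.log(3)
confident = entropy < entropy_cutoff
```

Entropy ranges from 0 (all mass on one class) to `log(n_classes)` (uniform), which gives a natural scale for choosing a cutoff.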

### Custom Feature Combinations

Experiment with different feature combinations:

```python
# Define feature cases
feature_cases = {
    1: ['TCR'],
    2: ['TCR', 'Peptide'],
    3: ['TCR', 'Peptide', 'HLA'],
    4: ['TCR', 'Peptide', 'HLA', 'VJ', 'Network']
}

# Train with specific features
pipeline.prepare_features(feature_types=feature_cases[3])
```

### Model Persistence

Save and load trained models:

```python
# Save complete pipeline
pipeline.save_pipeline('output_dir/')

# Load saved pipeline
new_pipeline = PepTCRNetPipeline()
new_pipeline.load_pipeline('output_dir/')
```

## 📈 Performance

PepTCRNet reports the following results on TCR-peptide binding
prediction (see the paper cited below for datasets and protocol):

- **Accuracy**: Up to 95% on benchmark datasets
- **AUC-ROC**: 0.90 for multi-class classification
- **Uncertainty Calibration**: Well-calibrated confidence scores

## 🤝 Contributing

We welcome contributions! Please see our [Contributing
Guidelines](CONTRIBUTING.md) for details.

```bash
# Fork the repository
# Create your feature branch
git checkout -b feature/amazing-feature

# Commit your changes
git commit -m 'Add amazing feature'

# Push to the branch
git push origin feature/amazing-feature

# Open a Pull Request
```

## 📝 Citation

If you use PepTCRNet in your research, please cite:

```bibtex
@article{le2025peptcrnet,
  title={PepTCR-Net: prediction of multi-class antigen peptides by T-cell receptor sequences with deep learning},
  author={Le, Phi and Ung, Leah and Yang, Hai and Huang, Anwen and He, Tao and Bruno, Peter and Oh, David Y and Keenan, Bridget P and Zhang, Li},
  journal={Briefings in Bioinformatics},
  volume={26},
  number={4},
  pages={bbaf351},
  year={2025},
  doi={10.1093/bib/bbaf351},
  url={https://doi.org/10.1093/bib/bbaf351}
}
```

## 📄 License

This project is licensed under the MIT License - see the
[LICENSE](LICENSE) file for details.

## 📮 Contact

- **Issues**: [GitHub
Issues](https://github.com/mlizhangx/Pep-TCRNet/issues)
- **Discussions**: [GitHub
Discussions](https://github.com/mlizhangx/Pep-TCRNet/discussions)
- **Email**: [mlizhang@gmail.com](mailto:mlizhang@gmail.com)

## 🗺️ Roadmap

- [ ] Support for TCRα chains
- [ ] Integration with single-cell RNA-seq data
- [ ] Web interface for predictions
- [ ] Pre-trained models for common peptides
- [ ] GPU optimization for large-scale predictions
- [ ] Docker containerization

---



Made with ❤️ by the PepTCRNet Team

