Metadata-Version: 2.4
Name: hairpred2
Version: 2.0.0
Summary: Structure-based prediction of antibody-interacting residues in human antigens
Home-page: https://webs.iiitd.edu.in/raghava/hairpred2
Author: Naman Mehta, Raghava Lab
Author-email: Naman Mehta <namanm@iiitd.ac.in>, Raghava Lab <raghava@iiitd.ac.in>
License: MIT
Project-URL: Web Server, https://webs.iiitd.edu.in/raghava/hairpred2
Project-URL: Source, https://github.com/raghavagps/hairpred2
Project-URL: Bug Reports, https://github.com/raghavagps/hairpred2/issues
Project-URL: Lab Website, https://webs.iiitd.edu.in/raghava/
Keywords: antibody,epitope,prediction,structure,bioinformatics,immunoinformatics
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21
Requires-Dist: pandas>=1.3
Requires-Dist: joblib>=1.0
Requires-Dist: gemmi>=0.5
Requires-Dist: biopython>=1.79
Requires-Dist: scipy>=1.7
Dynamic: author
Dynamic: home-page
Dynamic: license-file
Dynamic: requires-python

# HAIRpred2

**HAIRpred2** is a structure-based computational tool for predicting antibody-interacting residues in human antigen structures. It combines Relative Solvent Accessibility (RSA) with physicochemical properties in a sliding-window framework and scores each residue with a pre-trained Random Forest model (AUC = 0.78 on an independent test set of 56 complexes).

[![PyPI version](https://badge.fury.io/py/hairpred2.svg)](https://badge.fury.io/py/hairpred2)
[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

🌐 **Web Server**: https://webs.iiitd.edu.in/raghava/hairpred2
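
The sliding-window idea mentioned above can be illustrated with a minimal sketch. This is not HAIRpred2's feature code: the window size, the zero padding at chain ends, and the helper name `sliding_windows` are assumptions chosen for illustration, and the per-residue values are made up.

```python
from typing import List, Sequence


def sliding_windows(values: Sequence[float], window: int = 5,
                    pad: float = 0.0) -> List[List[float]]:
    """Return one fixed-length window per position, centred on that position."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    # Pad both ends so that terminal residues still get a full-length window.
    padded = [pad] * half + list(values) + [pad] * half
    return [padded[i:i + window] for i in range(len(values))]


# Hypothetical per-residue values for a 4-residue chain.
rsa = [0.10, 0.55, 0.80, 0.30]
hydropathy = [1.8, -3.5, -0.4, 4.5]

# One feature vector per residue: RSA window followed by property window.
features = [
    r + h
    for r, h in zip(sliding_windows(rsa, 3), sliding_windows(hydropathy, 3))
]
print(features[0])  # [0.0, 0.1, 0.55, 0.0, 1.8, -3.5]
```

Each residue ends up with a vector describing itself and its sequence neighbours, which is the kind of input a per-residue classifier can consume.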

---

## Installation

```bash
pip install hairpred2
```

### System dependency — mkdssp (required)

HAIRpred2 uses DSSP to compute RSA values from the PDB structure. Install `mkdssp`:

```bash
# Linux / Ubuntu
sudo apt install dssp

# Conda (recommended — any platform)
conda install -c salilab dssp

# macOS (Homebrew)
brew install dssp

# Google Colab (prefix with ! when running in a notebook cell)
apt-get install -y dssp
```
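
Before running a prediction it can be handy to confirm that DSSP is actually reachable. The sketch below only checks `PATH` using the standard library; the helper name `find_dssp` is hypothetical, and the fallback executable name `dssp` is an assumption, since some packages install the binary under that name instead of `mkdssp`.

```python
import shutil


def find_dssp(candidates=("mkdssp", "dssp")):
    """Return the path of the first DSSP executable found on PATH, or None."""
    for name in candidates:
        path = shutil.which(name)
        if path:
            return path
    return None


dssp_path = find_dssp()
if dssp_path is None:
    print("DSSP not found on PATH; install it with one of the commands above.")
else:
    print(f"Using DSSP at {dssp_path}")
```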

### Download the model file

The pre-trained model (~253 MB) is not bundled with the pip package. Download it separately:

```bash
# Download to the package data directory automatically
hairpred2-download-model

# Or download manually and place in your working directory:
# https://webs.iiitd.edu.in/raghava/hairpred2/download/best_model_random_forest.pkl
```

---

## Usage

### Command line

```bash
# Basic usage
hairpred2 -i antigen.pdb -c A

# Custom output prefix
hairpred2 -i antigen.pdb -c A -o my_results

# Multiple antigen chains
hairpred2 -i antigen.pdb -c A,B

# Filter buried residues (recommended)
hairpred2 -i antigen.pdb -c A --min-rsa 0.05

# Custom probability threshold
hairpred2 -i antigen.pdb -c A -t 0.4

# Use a custom model file
hairpred2 -i antigen.pdb -c A --model /path/to/model.pkl
```

### Python API

```python
from hairpred2 import run_pipeline

# Basic prediction
results = run_pipeline(
    pdb_file      = "antigen.pdb",
    chain_ids     = ["A"],
    output_prefix = "my_results",
)

# With options
results = run_pipeline(
    pdb_file      = "antigen.pdb",
    chain_ids     = ["A", "B"],
    output_prefix = "my_results",
    threshold     = 0.4,
    min_rsa       = 0.05,
    model_path    = "/path/to/best_model_random_forest.pkl",
)

# results is a pandas DataFrame with columns:
# Residue, RSA, Probability, Prediction
print(results.head())
```

### Individual functions

```python
from hairpred2 import (
    validate_pdb,
    build_residue_dataframe,
    generate_features,
    load_model,
    predict_residues,
    detect_epitope_patches,
)

# Load and validate
validate_pdb("antigen.pdb")

# Build features
df, temp_pdb = build_residue_dataframe("antigen.pdb", ["A"])
X = generate_features(df)

# Predict
model = load_model()          # uses the model fetched by hairpred2-download-model
# model = load_model("/custom/path/model.pkl")  # custom model
probs, labels = predict_residues(model, X, threshold=0.5)

# Detect epitope patches
patches = detect_epitope_patches(df, labels, probs)
```
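
To give an intuition for what spatial patch detection involves, here is a minimal single-linkage clustering sketch in plain Python. It is not the algorithm behind `detect_epitope_patches`, which this README does not describe: the function name `cluster_patches`, the 10 Å cut-off, and the coordinates are all assumptions for illustration.

```python
import math
from collections import defaultdict


def cluster_patches(coords, cutoff=10.0):
    """Group points so that any two within `cutoff` (in Å) end up in one patch."""
    parent = list(range(len(coords)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Merge every pair of points that lies within the distance cut-off.
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i in range(len(coords)):
        groups[find(i)].append(i)
    # Largest patch first; ties broken by the lowest member index.
    return sorted(groups.values(), key=lambda g: (-len(g), g[0]))


# Hypothetical C-alpha coordinates of four predicted-positive residues.
ca = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (12.0, 0.0, 0.0), (40.0, 0.0, 0.0)]
print(cluster_patches(ca))  # [[0, 1, 2], [3]]
```

Single linkage chains neighbours together, so residues 0 and 2 share a patch even though they are 12 Å apart. A different linkage rule or cut-off would give different patches.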

---

## Output Files

Each run writes five output files that share the same prefix:

| File | Description |
|------|-------------|
| `<prefix>.csv` | Per-residue: Residue, RSA, Probability, Prediction |
| `<prefix>_summary.txt` | Statistics + top 10 predicted residues |
| `<prefix>_bfactor.pdb` | PDB with B-factor = probability × 100 |
| `<prefix>.pml` | PyMOL script — colors red/blue + residue labels |
| `<prefix>_patches.txt` | Spatially clustered epitope patches |
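
The per-residue CSV can be post-processed with the standard library alone. The sketch below assumes only the four documented column names. The sample rows, the `Residue` label format, the 0/1 coding of `Prediction`, and the use of `>=` at the cut-offs are illustrative assumptions, and `top_residues` is a hypothetical helper.

```python
import csv
import io

# Made-up rows in the documented column layout (values are not real output).
sample_csv = """Residue,RSA,Probability,Prediction
A:GLY12,0.62,0.81,1
A:LEU13,0.03,0.22,0
A:ASP14,0.47,0.55,1
A:SER15,0.71,0.38,0
"""


def top_residues(handle, threshold=0.5, min_rsa=0.0):
    """Return (residue, probability) pairs passing both cut-offs, best first."""
    rows = [
        (row["Residue"], float(row["Probability"]))
        for row in csv.DictReader(handle)
        if float(row["Probability"]) >= threshold
        and float(row["RSA"]) >= min_rsa
    ]
    return sorted(rows, key=lambda item: item[1], reverse=True)


hits = top_residues(io.StringIO(sample_csv), threshold=0.5, min_rsa=0.05)
for residue, prob in hits:
    print(f"{residue}\t{prob:.2f}")
```

For a real run, replace `io.StringIO(sample_csv)` with `open("my_results.csv", newline="")`.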

### Visualization in PyMOL

```
# Run in PyMOL
@my_results.pml

# Or color by probability using B-factor PDB
load my_results_bfactor.pdb
spectrum b, blue_white_red
```
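
The B-factor encoding used for `<prefix>_bfactor.pdb` (probability × 100) can be sketched as a rewrite of the fixed-width temperature-factor field, which occupies columns 61-66 of a PDB `ATOM`/`HETATM` record. This is an illustration of the encoding, not the tool's own writer; `set_bfactor` is a hypothetical helper and the atom record is made up.

```python
def set_bfactor(line: str, probability: float) -> str:
    """Return an ATOM/HETATM record with its B-factor set to probability x 100."""
    if not line.startswith(("ATOM", "HETATM")):
        return line  # leave every other record type untouched
    padded = line.rstrip("\n").ljust(80)
    # Columns 61-66 (1-based) hold the temperature factor, formatted as %6.2f.
    return padded[:60] + f"{probability * 100:6.2f}" + padded[66:]


# Build a made-up, correctly aligned ATOM record field by field.
atom = (
    "ATOM  " + f"{1:5d}" + " " + " CA " + " " + "GLY" + " " + "A"
    + f"{12:4d}" + "    "
    + f"{11.104:8.3f}{6.134:8.3f}{-6.504:8.3f}{1.00:6.2f}{20.00:6.2f}"
)

updated = set_bfactor(atom, 0.81)
print(repr(updated[60:66]))  # ' 81.00'
```

Because probabilities lie between 0 and 1, the scaled value never exceeds `100.00` and always fits the six-character field.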

---

## Arguments

| Argument | Required | Description |
|----------|----------|-------------|
| `-i` / `--input` | Yes | Input antigen PDB file |
| `-c` / `--chain` | Yes | Chain ID(s): `A` or `A,B` |
| `-o` / `--output` | No | Output prefix (default: `hairpred2_results`) |
| `-t` / `--threshold` | No | Probability threshold (default: `0.5`) |
| `--min-rsa` | No | Minimum RSA filter (e.g. `0.05`) |
| `--model` | No | Custom model `.pkl` path |

---

## Model Performance

The model was trained on 221 human antigen-antibody (Ag-Ab) complexes from SAbDab and evaluated on 56 independent complexes:

| Metric | Value |
|--------|-------|
| AUC | 0.78 |
| Sensitivity | 0.73 |
| Specificity | 0.65 |
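
For readers less familiar with these metrics, sensitivity is TP / (TP + FN) and specificity is TN / (TN + FP), computed per residue at a fixed probability threshold. The sketch below applies those definitions to toy labels. The numbers are made up and do not reproduce the values in the table.

```python
def sensitivity_specificity(y_true, y_pred):
    """Compute (sensitivity, specificity) from binary labels (1 = interacting)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Toy labels: 4 true interacting residues, 6 non-interacting ones.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 1, 0, 0, 0, 0]

sens, spec = sensitivity_specificity(y_true, y_pred)
print(f"sensitivity={sens:.2f} specificity={spec:.2f}")
# sensitivity=0.75 specificity=0.67
```

Lowering the probability threshold (`-t`) generally raises sensitivity at the cost of specificity, and raising it does the opposite.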

---

## Citation

If you use HAIRpred2, please cite:

> Mehta N., et al. (2026) HAIRpred2: Human Host-Specific Prediction of Antibody-Interacting Residues Using Hybrid Physicochemical and Structural Features. *(manuscript in preparation)*

Previous tool:
> Sahni R., Kumar N. and Raghava GPS (2025) HAIRpred: Prediction of human antibody interacting residues in an antigen from its primary structure. [Protein Sci, 34(8):e70212](https://doi.org/10.1002/pro.70212)

---

## License

Released under the MIT License; see the `LICENSE` file for the full terms.

© 2025 Raghava Lab, IIIT Delhi — https://webs.iiitd.edu.in/raghava/
