Metadata-Version: 2.4
Name: protfunc
Version: 0.1.0
Summary: Python SDK for the POOL / GRASP-Func protein functional site prediction server
Author: Avaneeth Anil
License: MIT
Project-URL: Homepage, https://github.com/avaneeth-anil/protfunc
Project-URL: Documentation, https://github.com/avaneeth-anil/protfunc#readme
Project-URL: Repository, https://github.com/avaneeth-anil/protfunc
Project-URL: Bug Tracker, https://github.com/avaneeth-anil/protfunc/issues
Keywords: bioinformatics,protein,functional-sites,POOL,GRASP-Func
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: requests>=2.28
Dynamic: license-file

# protfunc

Python SDK for the **POOL / GRASP-Func** protein functional site prediction server, developed by the [Ondrechen Research Group](https://theorg.sites.northeastern.edu/) at Northeastern University.

POOL (Partial Order Optimum Likelihood) identifies functionally important residues in proteins. GRASP-Func (Graph Representation of Active Sites for the Prediction of Function) classifies protein function by comparing local active-site structures.

## Installation

```bash
pip install protfunc
```

## Quick Start

```python
from protfunc import Client

client = Client()

# Run POOL on a directory of .pdb files
results = client.run_pool("./structures/", output_dir="./pool_output/")
```

## Usage

### POOL

Provide a directory of `.pdb` files (or a pre-made `.zip`):

```python
from protfunc import Client

client = Client()
results = client.run_pool("./my_pdb_files/")
# results is a Path pointing to the extracted output directory
```
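
If you prefer to upload a pre-made `.zip`, the archive can be built with the standard library. The helper below is a minimal sketch: `zip_pdb_files` is a hypothetical name, not part of the SDK, and placing the files at the archive root is an assumption about what the server expects.

```python
import zipfile
from pathlib import Path


def zip_pdb_files(src_dir, zip_path):
    """Bundle every .pdb file directly inside src_dir into zip_path."""
    src = Path(src_dir)
    # Match the extension case-insensitively so "1ABC.PDB" is included too.
    pdb_files = sorted(
        p for p in src.iterdir() if p.is_file() and p.suffix.lower() == ".pdb"
    )
    if not pdb_files:
        raise ValueError(f"no .pdb files found in {src}")
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for pdb in pdb_files:
            # Assumption: files sit at the archive root, with no parent folders.
            archive.write(pdb, arcname=pdb.name)
    return Path(zip_path)
```

The resulting path can then be passed to `run_pool()`, for example `client.run_pool(str(zip_pdb_files("./my_pdb_files/", "./my_pdb_files.zip")))`.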

### GRASP-Func — Step by Step

Run each stage independently. The output of each step feeds into the next:

```python
from protfunc import Client

client = Client()

# Step 1: Pre-processing
preprocessed = client.preprocess("./input/", output_dir="./pre/")

# Step 2: Processing (graph matching)
processed = client.process(preprocessed, output_dir="./proc/")

# Step 3: Visualization
viz = client.visualize(processed, known_proteins=["1a2b", "3c4d"], output_dir="./viz/")
```

### GRASP-Func — Full Pipeline

Or run everything in a single call:

```python
results = client.run_graspfunc(
    "./input/",
    known_proteins=["1a2b", "3c4d"],
    output_dir="./results/",
)
```

### Extract Protein Names

Pull the list of proteins from result files (useful for populating the `known_proteins` argument):

```python
proteins = client.extract_proteins("./processed/")
# ['protein_a', 'protein_b', ...]
```
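
Before passing names to `known_proteins`, it can help to check them against the extracted list. The sketch below is a plain helper with a hypothetical name (`select_known_proteins`); it assumes the names returned by `extract_proteins()` are compared case-sensitively and exactly.

```python
def select_known_proteins(available, wanted):
    """Split wanted names into those present in available and those missing."""
    available_set = set(available)
    # Preserve the caller's ordering of `wanted` in both result lists.
    found = [name for name in wanted if name in available_set]
    missing = [name for name in wanted if name not in available_set]
    return found, missing
```

For example, `found, missing = select_known_proteins(client.extract_proteins("./processed/"), ["1a2b", "3c4d"])` leaves `found` ready to pass as `known_proteins` and `missing` available for a warning.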

### Server Health Check

```python
if client.ping():
    print("Server is up")
```
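
For long-running scripts, you may want to wait for the server before submitting work. The following is a sketch only: `wait_for_server` and its `attempts` and `delay` parameters are hypothetical, and it assumes `ping()` either returns a falsy value or raises when the server is unreachable.

```python
import time


def wait_for_server(client, attempts=5, delay=2.0):
    """Return True once client.ping() succeeds, or False after all attempts fail."""
    for attempt in range(attempts):
        try:
            if client.ping():
                return True
        except Exception:
            # Assumption: ping() may raise instead of returning False
            # when the server is unreachable; treat both as "not ready".
            pass
        # Sleep between attempts, but not after the final one.
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A typical guard would be `if not wait_for_server(client): raise SystemExit("server unavailable")`.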

## Progress Callbacks

By default, the SDK prints progress to the console:

```
[1/3] preprocess: starting...
[1/3] preprocess: uploading (2.3 MB)...
[1/3] preprocess: running...
[1/3] preprocess: ✓ done (14.2s)
[2/3] process: starting...
...
```

### Custom Callback

Receive structured `ProgressEvent` objects:

```python
from protfunc import Client, ProgressEvent

def my_callback(event: ProgressEvent):
    print(f"{event.stage} → {event.status.value} ({event.elapsed:.1f}s)")

client = Client(on_progress=my_callback)
client.run_pool("./structures/")
```
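
A callback does not have to print. The sketch below stores each event so it can be inspected after the run. `ProgressRecorder` is a hypothetical class, not part of the SDK, and it relies only on the `stage`, `status.value`, and `elapsed` attributes used in the callback above.

```python
class ProgressRecorder:
    """Callable that stores (stage, status, elapsed) for every progress event."""

    def __init__(self):
        self.records = []

    def __call__(self, event):
        # Only the attributes shown in the README callback are assumed to exist.
        self.records.append((event.stage, event.status.value, event.elapsed))

    def longest_elapsed(self):
        """Return the record with the largest elapsed value, or None if empty."""
        return max(self.records, key=lambda record: record[2], default=None)
```

Pass an instance as the callback, for example `recorder = ProgressRecorder()` followed by `Client(on_progress=recorder)`, then read `recorder.records` once the call returns.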

### Silent Mode

Pass `on_progress=False` to suppress all progress output:

```python
client = Client(base_url="...", on_progress=False)
```

### Per-Call Override

Pass `on_progress` to an individual call to override the client-level setting for that call only:

```python
client.run_pool("./structures/", on_progress=my_callback)
client.preprocess("./input/", on_progress=False)  # silent for this call only
```

## Input Formats

| Method | Expected Input |
|---|---|
| `run_pool()` | Directory of `.pdb` files or a `.zip` containing them |
| `preprocess()` | Directory or `.zip` with an `input/family/g*` layout containing `.pdb` and `.tcranks`/`.TICranks` files |
| `process()` | Output of `preprocess()`, containing `.pdb`, `.rank`, `.qhull`, `.sc`, and `.tcranks` files |
| `visualize()` | Output of `process()`, containing `results_inter.txt` and `results_intra.txt` |
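
A quick local check can catch incomplete input before uploading to `preprocess()`. This is a rough sketch under stated assumptions: `find_incomplete_groups` is a hypothetical helper, and it assumes every `g*` folder must directly contain at least one `.pdb` file and at least one `.tcranks` or `.TICranks` file. The server's actual validation rules may differ.

```python
from pathlib import Path

RANK_SUFFIXES = {".tcranks", ".ticranks"}  # compared in lowercase


def find_incomplete_groups(root):
    """Return g* directories under root that lack a .pdb or a ranks file."""
    incomplete = []
    # rglob("g*") walks the whole tree; keep only directories.
    for group in sorted(p for p in Path(root).rglob("g*") if p.is_dir()):
        suffixes = {f.suffix.lower() for f in group.iterdir() if f.is_file()}
        has_pdb = ".pdb" in suffixes
        has_ranks = bool(suffixes & RANK_SUFFIXES)
        if not (has_pdb and has_ranks):
            incomplete.append(group)
    return incomplete
```

An empty list from `find_incomplete_groups("./input/")` means every `g*` folder passed this check.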

## Configuration

| Parameter | Default | Description |
|---|---|---|
| `base_url` | `http://localhost:8000` | Server URL |
| `timeout` | `1800` (30 min) | Request timeout in seconds |
| `on_progress` | Console output | Progress callback; pass `False` for silent mode |
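
One way to keep deployment settings out of code is to read them from the environment. In the sketch below, `client_kwargs_from_env` and the variable names `PROTFUNC_BASE_URL` and `PROTFUNC_TIMEOUT` are hypothetical; the SDK itself is not known to read them. Only the `base_url` and `timeout` parameters come from the table above.

```python
import os


def client_kwargs_from_env(environ=None):
    """Build Client keyword arguments from optional environment variables."""
    env = os.environ if environ is None else environ
    kwargs = {}
    # PROTFUNC_BASE_URL and PROTFUNC_TIMEOUT are hypothetical names chosen
    # for this sketch; unset variables are simply left out of the result.
    if env.get("PROTFUNC_BASE_URL"):
        kwargs["base_url"] = env["PROTFUNC_BASE_URL"]
    if env.get("PROTFUNC_TIMEOUT"):
        kwargs["timeout"] = int(env["PROTFUNC_TIMEOUT"])  # seconds
    return kwargs
```

With this in place, `Client(**client_kwargs_from_env())` falls back to the defaults listed above for any variable that is not set.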

## Citations

If you use this tool in your research, please cite:

> Somarowthu, S. & Ondrechen, M. J. (2012). POOL server: machine learning application for functional site prediction in proteins. *Bioinformatics*, 28(15), 2078–2079.

> Mills, C. L., Garg, R., Lee, J. S., Tian, L., Suciu, A., Cooperman, G. D., Beuning, P. J., & Ondrechen, M. J. (2018). Functional classification of protein structures by local structure matching in graph representation. *Protein Science*, 27(6), 1125–1135.

## License

MIT. See the `LICENSE` file for the full text.
