Metadata-Version: 2.4
Name: netComplex
Version: 0.1.0
Summary: Protein-complex activity scoring via network propagation (RWR on sample-specific PPI networks)
Author-email: Zihao Chen <xiaoinsland@gmail.com>
License-Expression: LicenseRef-Research-Only
Project-URL: Homepage, https://github.com/xiaoinsland/netComplex
Project-URL: Issues, https://github.com/xiaoinsland/netComplex/issues
Keywords: bioinformatics,proteomics,network,protein complex,random walk
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.21
Requires-Dist: pandas>=1.3
Requires-Dist: networkx>=2.6
Requires-Dist: scipy>=1.7

# netComplex

**netComplex** scores protein-complex activity across samples by propagating gene expression signals through a sample-specific protein–protein interaction (PPI) network via Random Walk with Restart (RWR).

## Algorithm overview

1. **Filter** — retain only genes present in both the PPI network and the expression matrix.
2. **Rank-normalize** — convert each sample's expression values to ranks in (0, 1].
3. **Re-weight edges** — scale PPI edge scores by the expression ranks of both endpoints: `weight = BaseScore × (rank_u × rank_v)^β`.
4. **RWR** — propagate the ranked expression as a restart signal over the weighted network to obtain node-level scores.
5. **Complex scoring** — for each complex, aggregate node scores into:
   - **BaseActivity** — expression-weighted mean of propagated node scores.
   - **Coherence** — mean pairwise score similarity across retained members × size penalty `k/(k+1)`.
   - **ComplexScore** = BaseActivity × Coherence.
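
Steps 1 to 4 above can be sketched in plain Python for a single sample. This is a minimal illustration, not the package's implementation: the helper names (`rank_normalize`, `reweight_edges`, `rwr`), the average-rank treatment of ties, the row-normalized transition rule, and the toy values are all assumptions made for the example.

```python
def rank_normalize(expr):
    """Map expression values to ranks in (0, 1]; ties share their average rank (assumption)."""
    ordered = sorted(expr, key=expr.get)
    n = len(ordered)
    ranks = {}
    i = 0
    while i < n:
        j = i
        while j + 1 < n and expr[ordered[j + 1]] == expr[ordered[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # 1-based average position of the tie group
        for gene in ordered[i:j + 1]:
            ranks[gene] = avg_rank / n
        i = j + 1
    return ranks


def reweight_edges(edges, ranks, beta=0.5):
    """weight = BaseScore * (rank_u * rank_v) ** beta"""
    return {(u, v): s * (ranks[u] * ranks[v]) ** beta for (u, v), s in edges.items()}


def rwr(weights, restart, alpha=0.3, tol=1e-10, max_iter=1000):
    """Random Walk with Restart by power iteration on an undirected weighted graph."""
    nodes = sorted(restart)
    adj = {n: {} for n in nodes}
    for (u, v), w in weights.items():
        adj[u][v] = w
        adj[v][u] = w
    strength = {n: sum(adj[n].values()) for n in nodes}
    total = sum(restart.values())
    p0 = {n: restart[n] / total for n in nodes}  # restart distribution sums to 1
    p = dict(p0)
    for _ in range(max_iter):
        nxt = {n: alpha * p0[n] for n in nodes}
        for u in nodes:
            if strength[u] == 0:
                nxt[u] += (1 - alpha) * p[u]  # isolated node: walker stays in place
                continue
            for v, w in adj[u].items():
                nxt[v] += (1 - alpha) * p[u] * w / strength[u]
        diff = sum(abs(nxt[n] - p[n]) for n in nodes)
        p = nxt
        if diff < tol:
            break
    return p


# Toy inputs (illustrative values only).
ppi = {("EGFR", "ERBB2"): 0.9, ("EGFR", "GRB2"): 0.8, ("GRB2", "SOS1"): 0.7}
expression = {"EGFR": 8.0, "ERBB2": 4.0, "GRB2": 6.0, "TP53": 5.0}

# Step 1: keep genes present in both the network and the expression data.
network_genes = {g for edge in ppi for g in edge}
shared = network_genes & set(expression)
edges = {e: s for e, s in ppi.items() if e[0] in shared and e[1] in shared}
expr = {g: expression[g] for g in shared}

ranks = rank_normalize(expr)                # step 2
weights = reweight_edges(edges, ranks)      # step 3
scores = rwr(weights, ranks, alpha=0.3)     # step 4
```

Here `scores` is a probability distribution over the retained genes. In this sketch `alpha` is the restart probability, matching the `rwr_alpha` parameter described in the quick start.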

## Installation

```bash
pip install netComplex
```

## Quick start

```python
import pandas as pd
from netcomplex import compute_all_sample_complex_scores, build_complex_score_matrix

# links: DataFrame with columns protein1, protein2, score
# expression_data: DataFrame, genes as index, samples as columns
# complex_data: DataFrame with columns Complex, Genes (semicolon-delimited)

scores = compute_all_sample_complex_scores(
    links=links,
    expression_data=expression_data,
    complex_data=complex_data,
    rwr_alpha=0.3,   # restart probability
    beta=0.5,        # edge re-weighting exponent
    gamma=1.0,       # coherence pairwise-similarity exponent
)

# Long-format table (one row per sample-complex pair)
print(scores.head())

# Wide-format matrix (complexes × samples)
matrix = build_complex_score_matrix(scores)
```

## API reference

| Function | Description |
|---|---|
| `prepare_data(links, expression_data)` | Filter shared genes and build base PPI graph. |
| `run_rwr(sample_id, G, real_original_weights, expression_data, ...)` | Re-weight edges and run RWR for one sample. |
| `compute_complex_score(sample_id, p_star, expression_ranked, complex_data, ...)` | Score all complexes for one sample. |
| `compute_all_sample_complex_scores(links, expression_data, complex_data, ...)` | Run the full pipeline across all samples. |
| `build_complex_score_matrix(all_complex_scores)` | Pivot long-format scores to a complex × sample matrix. |
| `rank_normalize_series(sample_expression)` | Rank-normalize a single expression series to (0, 1]. |
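
To illustrate what pivoting long-format scores into a complex × sample matrix means, the sketch below uses plain pandas on a toy table. The column names (`Sample`, `Complex`, `ComplexScore`), the second complex, and the values are assumptions for illustration. It does not call `build_complex_score_matrix` and may differ from the table that function expects.

```python
import pandas as pd

# Hypothetical long-format table: one row per sample-complex pair.
long_scores = pd.DataFrame({
    "Sample": ["S1", "S1", "S2", "S2"],
    "Complex": ["COP9 signalosome", "Toy complex", "COP9 signalosome", "Toy complex"],
    "ComplexScore": [0.42, 0.10, 0.35, 0.20],  # made-up values
})

# Wide format: complexes as rows, samples as columns.
wide = long_scores.pivot(index="Complex", columns="Sample", values="ComplexScore")
print(wide)
```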

## Input formats

**`links`** — PPI network:

| protein1 | protein2 | score |
|---|---|---|
| EGFR | ERBB2 | 0.9 |

**`expression_data`** — gene × sample matrix (genes as row index, samples as column names).

**`complex_data`** — complex membership:

| Complex | Genes |
|---|---|
| COP9 signalosome | COPS1;COPS2;COPS3 |
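
The three inputs can be assembled as pandas DataFrames along the following lines. This is a small sketch with made-up toy values. Only the column names and the semicolon-delimited `Genes` format come from the descriptions above, and the final lines show one way to inspect complex membership rather than anything the package requires.

```python
import pandas as pd

# PPI network: one row per edge (scores here are toy values).
links = pd.DataFrame({
    "protein1": ["EGFR", "COPS1", "COPS2"],
    "protein2": ["ERBB2", "COPS2", "COPS3"],
    "score": [0.9, 0.8, 0.7],
})

# Expression matrix: genes as row index, samples as columns (toy values).
expression_data = pd.DataFrame(
    {"S1": [8.0, 4.0, 5.5, 6.1, 3.2], "S2": [7.1, 4.8, 5.0, 6.5, 2.9]},
    index=["EGFR", "ERBB2", "COPS1", "COPS2", "COPS3"],
)

# Complex membership: semicolon-delimited gene lists.
complex_data = pd.DataFrame({
    "Complex": ["COP9 signalosome"],
    "Genes": ["COPS1;COPS2;COPS3"],
})

# Split the gene lists and check which members have expression values.
members = complex_data.set_index("Complex")["Genes"].str.split(";")
covered = [g for g in members.loc["COP9 signalosome"] if g in expression_data.index]
```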

## License

Research use only (`LicenseRef-Research-Only`, as declared in the package metadata).
