Metadata-Version: 2.4
Name: phospy
Version: 1.2.3
Summary: Python native implementation of selected PhosR style workflows for phosphoproteomics.
License-Expression: GPL-3.0
Project-URL: Homepage, https://phospy.com
Project-URL: Repository, https://github.com/falconsmilie/phospy
Project-URL: Documentation, https://github.com/falconsmilie/phospy/tree/main/docs
Project-URL: Issues, https://github.com/falconsmilie/phospy/issues
Project-URL: Changelog, https://github.com/falconsmilie/phospy/blob/main/CHANGELOG.md
Keywords: phosphoproteomics,proteomics,bioinformatics,phosphorylation,phosphosite,kinase,ksea,mass-spectrometry,phosr
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE.md
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: pydantic>=2.0
Requires-Dist: scikit-learn>=1.4
Requires-Dist: scipy>=1.10
Provides-Extra: parquet
Requires-Dist: pyarrow>=15.0; extra == "parquet"
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == "test"
Provides-Extra: dev
Requires-Dist: pre-commit>=4.0; extra == "dev"
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.15.7; extra == "dev"
Dynamic: license-file

# PhosPy

PhosPy is a focused Python library for selected phosphoproteomics workflows inspired by `PhosR`.

It is built for a small set of jobs:

- preprocess total and phospho tables
- analyse kinase activity from an existing `predMat`
- generate a `predMat` from phosphosite inputs
- run the native Python kinase workflow
- construct signalomes from scoring and prediction outputs

PhosPy is intentionally narrow. It is not a full `PhosR` replacement.

## Install

PhosPy supports Python 3.10 and newer.

```bash
pip install phospy
```

For parquet output:

```bash
pip install "phospy[parquet]"
```

The examples below use repository paths such as `examples/data/...`. If you installed from PyPI, use your own local paths instead.

## Pick the Right Entry Point

### `PhosphoDataset`

Use `PhosphoDataset` when you want validated total and phospho inputs plus the standard preprocessing flow.

```python
from phospy import PhosphoDataset
from phospy.writers import CoreOutputWriter

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
)
core = dataset.preprocessing.run(max_unmatched_fraction=0.1)

CoreOutputWriter().write(core, outdir="examples/output", format="csv")

site_matrix = core.site_matrix.matrix
corrected = core.phospho_corrected
```

### `KinaseActivityAnalyzer`

Use `KinaseActivityAnalyzer` when you already have a phosphosite matrix and a `predMat`.

```python
from phospy import KinaseActivityAnalyzer, PhosphoDataset

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
)
core = dataset.preprocessing.run(max_unmatched_fraction=0.1)

analyzer = KinaseActivityAnalyzer()
result = analyzer.run(
    pred_mat=analyzer.load_pred_mat("examples/data/predMat.csv"),
    phospho_matrix=core.site_matrix.matrix,
    threshold=0.6,
    min_substrates=1,
    top_n_substrates=1,
)

ksea_scores = result.ksea_scores
```

Because the bundled example data is tiny, this example sets `min_substrates=1` and `top_n_substrates=1`.

### `PhosRPipeline`

Use `PhosRPipeline` when you want file loading, preprocessing, optional kinase analysis, and output publishing in one call.

```python
from phospy import PhosRPipeline

pipeline = PhosRPipeline.from_files(
    total_path="examples/data/total.tsv",
    phospho_path="examples/data/phospho.tsv",
    pred_mat_path="examples/data/predMat.csv",
    phospho_encoding="utf-16le",
    max_unmatched_fraction=0.1,
    kinase_activity_threshold=0.6,
    kinase_activity_min_substrates=1,
    kinase_activity_top_n_substrates=1,
)
outputs = pipeline.run(outdir="examples/output")
```

When `outdir` is set, the pipeline writes the core outputs, any kinase-analysis outputs, and `run_manifest.json`.
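To check what a run produced, you can load the manifest back into Python. The helper below is a minimal sketch, not part of PhosPy: `load_run_manifest` is a hypothetical name, and it assumes only that `run_manifest.json` is a UTF-8 JSON object sitting directly inside `outdir`. The manifest's keys are not documented in this README, so the sketch makes no assumptions about them.

```python
import json
from pathlib import Path
from typing import Any


def load_run_manifest(outdir: str) -> dict[str, Any]:
    """Read run_manifest.json from a pipeline output directory.

    Hypothetical helper, not a PhosPy API. Assumes the manifest is a
    UTF-8 encoded JSON object located directly inside `outdir`.
    """
    manifest_path = Path(outdir) / "run_manifest.json"
    if not manifest_path.is_file():
        raise FileNotFoundError(f"No run manifest found at {manifest_path}")
    return json.loads(manifest_path.read_text(encoding="utf-8"))
```

After a pipeline run, `sorted(load_run_manifest("examples/output"))` lists whatever top-level keys the manifest contains.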

### `PredMatWorkflow`

Use `PredMatWorkflow` when your goal is to generate a `predMat` from phosphosite inputs.

```python
import json
from pathlib import Path

import pandas as pd

from phospy import PredMatWorkflow

phospho_matrix = pd.read_csv("examples/data/predmat_phospho_matrix.csv", index_col=0)
site_sequences = json.loads(Path("examples/data/predmat_site_sequences.json").read_text())
substrate_map = json.loads(Path("examples/data/predmat_substrate_map.json").read_text())
motif_sequences = json.loads(Path("examples/data/predmat_motif_sequences.json").read_text())

workflow = PredMatWorkflow(flank_size=2, svm_mode="default")
result = workflow.run(
    phospho_matrix=phospho_matrix,
    substrate_map=substrate_map,
    site_sequences=site_sequences,
    motif_sequences=motif_sequences,
    min_substrates=2,
    min_motif_size=2,
    ensemble_size=3,
    top=4,
    score_threshold=0.75,
    inclusion=3,
    n_iterations=2,
    random_state=17,
)

pred_mat = result.pred_mat_result.to_frame(copy=False)
result.pred_mat_result.to_csv("predMat.csv")
```

Use `svm_mode="default"` for the recommended, stable native path. Use `svm_mode="r_parity"` for parity-sensitive comparisons: it selects the supported parity-oriented preset for the learner, sampling, and final scoring.

When thresholds are too strict and no kinase candidates qualify, PhosPy raises `NoCandidateKinasesError` instead of returning an empty, invalid `predMat`.
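One way to handle that error is to try a short list of thresholds from strictest to most relaxed. The sketch below shows the pattern only and rests on several assumptions: the exception class is a local stand-in (this README does not show where PhosPy exports `NoCandidateKinasesError`, so import the real one in your own code), and `run_with_fallback` and `fake_run` are hypothetical names. Whether relaxing a threshold is appropriate depends on your analysis.

```python
from typing import Callable, Iterable, Optional, TypeVar

T = TypeVar("T")


class NoCandidateKinasesError(Exception):
    """Local stand-in so this sketch runs without PhosPy installed.

    In real code, import the exception from PhosPy instead; its import
    path is not shown in this README.
    """


def run_with_fallback(
    run: Callable[[float], T], thresholds: Iterable[float]
) -> "tuple[float, T]":
    """Call run(threshold) for each threshold in order until one succeeds."""
    last_error: Optional[Exception] = None
    for threshold in thresholds:
        try:
            return threshold, run(threshold)
        except NoCandidateKinasesError as exc:
            last_error = exc  # remember the failure and try the next value
    raise NoCandidateKinasesError(
        "no threshold yielded candidate kinases"
    ) from last_error


def fake_run(score_threshold: float) -> str:
    # Hypothetical stand-in for a call such as
    # workflow.run(..., score_threshold=score_threshold).
    if score_threshold > 0.6:
        raise NoCandidateKinasesError("thresholds too strict")
    return "predMat placeholder"


used_threshold, outcome = run_with_fallback(fake_run, [0.9, 0.75, 0.5])
print(used_threshold, outcome)  # prints: 0.5 predMat placeholder
```

Returning the threshold alongside the result keeps a record of which setting actually produced the `predMat`.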

A runnable repository example lives in [`examples/predmat_workflow_demo.py`](examples/predmat_workflow_demo.py).

### `KinaseWorkflow`

Use `KinaseWorkflow` when you want the fuller native Python scoring and prediction workflow, including intermediate profile and motif scoring outputs.

A runnable repository example lives in [`examples/native_workflow_demo.py`](examples/native_workflow_demo.py).

From a repository checkout:

```bash
make native-workflow-demo
```

### `SignalomeWorkflow`

Use `SignalomeWorkflow` when you already have scoring and prediction outputs and want downstream signalome, map-ready, and network-ready outputs.

```python
from phospy import PredMatWorkflow, SignalomeWorkflow

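# `...` is a placeholder for the run arguments; see the PredMatWorkflow example above.
# `phospho_matrix` below is assumed to be the phosphosite matrix passed to that run.
pred_mat_result = PredMatWorkflow(flank_size=2, svm_mode="default").run(...)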
signalome_result = SignalomeWorkflow().run(
    scoring_result=pred_mat_result.scoring_result,
    prediction_result=pred_mat_result.prediction_result,
    expression_matrix=phospho_matrix,
    kinases_of_interest=["KINASE_A", "KINASE_B"],
    signalome_cutoff=0.5,
)

map_data = signalome_result.to_map_data()
network_data = signalome_result.to_network_data()
```

Use `signalome_result.to_csv(...)`, `map_data.to_csv(...)`, and `network_data.to_csv(...)` when you want exportable tables.

A runnable repository example lives in [`examples/signalome_workflow_demo.py`](examples/signalome_workflow_demo.py).

## File Inputs

PhosPy works with:

- total input as TSV
- phospho input as TSV
- `predMat` as CSV, with the first column used as the phosphosite index

For the default table schema and method-level validation rules, see [`docs/api.md`](docs/api.md) and [`docs/validation.md`](docs/validation.md).
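If you want to inspect these files yourself before handing them to PhosPy, plain pandas is enough. The sketch below writes two tiny made-up files to a temporary directory and reads them back: a `predMat` CSV with the first column as the index, and a tab-separated table in UTF-16LE, the encoding the examples above pass as `phospho_encoding`. The site IDs, kinase names, and TSV column names are invented for illustration and do not reflect PhosPy's required schema.

```python
import tempfile
from pathlib import Path

import pandas as pd

with tempfile.TemporaryDirectory() as tmp:
    tmp_dir = Path(tmp)

    # Made-up predMat: one row per phosphosite, one column per kinase.
    pred_mat_path = tmp_dir / "predMat.csv"
    pred_mat_path.write_text(
        "site,KINASE_A,KINASE_B\n"
        "SITE_1,0.91,0.12\n"
        "SITE_2,0.33,0.78\n",
        encoding="utf-8",
    )
    # index_col=0 makes the first column the phosphosite index.
    pred_mat = pd.read_csv(pred_mat_path, index_col=0)

    # Made-up tab-separated table stored as UTF-16LE (illustrative columns only).
    phospho_path = tmp_dir / "phospho.tsv"
    phospho_path.write_text("id\tsample_1\nSITE_1\t1.5\n", encoding="utf-16le")
    phospho = pd.read_csv(phospho_path, sep="\t", encoding="utf-16le")

print(pred_mat.index.tolist())
print(phospho.shape)
```

A garbled first column name or a single merged column after reading usually means the encoding or separator does not match the file.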

## CLI

PhosPy also ships with a small CLI for the file-based preprocessing path and optional `predMat` analysis.

```bash
phospy \
  --total examples/data/total.tsv \
  --phospho examples/data/phospho.tsv \
  --pred-mat examples/data/predMat.csv \
  --phospho-encoding utf-16le \
  --max-unmatched-fraction 0.1 \
  --outdir examples/output
```

## Read Next

- [`docs/api.md`](docs/api.md) for the public Python API and CLI options
- [`docs/validation.md`](docs/validation.md) for the validation checklist
- [`docs/parity.md`](docs/parity.md) for parity scope, supported prediction modes, and release thresholds
- [`docs/fixtures.md`](docs/fixtures.md) for fixture and trace rebuild commands
