Metadata-Version: 2.4
Name: phospy
Version: 1.3.0
Summary: Python-native implementation of selected PhosR-style workflows for phosphoproteomics.
License-Expression: GPL-3.0
Project-URL: Homepage, https://phospy.com
Project-URL: Repository, https://github.com/falconsmilie/phospy
Project-URL: Documentation, https://github.com/falconsmilie/phospy/tree/main/docs
Project-URL: Issues, https://github.com/falconsmilie/phospy/issues
Project-URL: Changelog, https://github.com/falconsmilie/phospy/blob/main/CHANGELOG.md
Keywords: phosphoproteomics,proteomics,bioinformatics,phosphorylation,phosphosite,kinase,ksea,mass-spectrometry,phosr
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE.md
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: pydantic>=2.0
Requires-Dist: scikit-learn>=1.4
Requires-Dist: scipy>=1.10
Provides-Extra: parquet
Requires-Dist: pyarrow>=15.0; extra == "parquet"
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == "test"
Provides-Extra: dev
Requires-Dist: pre-commit>=4.0; extra == "dev"
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.15.7; extra == "dev"
Dynamic: license-file

# PhosPy

PhosPy is a focused Python library for a small, supported set of phosphoproteomics workflows inspired by `PhosR`.

It helps you do five things:

- preprocess total and phospho tables
- analyse kinase activity from an existing `predMat`
- generate a `predMat` from phosphosite inputs
- run the native Python kinase workflow
- build downstream signalome outputs

PhosPy is intentionally narrow. It is **not** a full `PhosR` replacement.

## Install

PhosPy supports Python 3.10 and newer.

```bash
pip install phospy
```

For Parquet output, install the optional `parquet` extra, which adds `pyarrow`:

```bash
pip install "phospy[parquet]"
```
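
If you are unsure whether the extra is present in an environment, a quick check like the one below can help. This is a minimal sketch that only tests whether `pyarrow` is importable; the helper name `parquet_available` is hypothetical and not part of PhosPy.

```python
import importlib.util


def parquet_available() -> bool:
    """Return True if pyarrow can be found in the current environment."""
    # find_spec returns None when the top-level package is not installed.
    return importlib.util.find_spec("pyarrow") is not None


print("pyarrow importable:", parquet_available())
```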

The examples in the repository use paths like `examples/data/...`. If you installed from PyPI, point those examples at your own files instead.

## Start Here

Pick the lane that matches your job.

### Common Lane: `SimpleKinaseWorkflow`

Use this when you already have a phospho table, know the species, and want one supported end-to-end entry point.

```python
from phospy.api import SimpleKinaseWorkflow

result = SimpleKinaseWorkflow().run(
    phospho="study_phospho.tsv",
    total="study_total.tsv",
    species="rat",
    reference="auto",
)

pred_mat = result.pred_mat_result.to_frame()
weighted_activity = result.kinase_activity_result.weighted_activity
```

This lane handles:

- preprocessing
- the analysis-ready phosphosite boundary
- bundled reference resolution
- `predMat` generation
- kinase activity analysis

Today, the bundled reference lane is intentionally small:

- supported species: `rat`
- supported references: `auto`, `l6`, `l6_native`
- `auto` currently resolves to `l6_native`
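
The rules above can be pictured as a small lookup. This is an illustrative sketch only, not PhosPy's internal resolver: the names `BUNDLED_REFERENCES` and `resolve_reference` are hypothetical, and mapping `l6` and `l6_native` to themselves is an assumption.

```python
# Illustrative only: mirrors the list above, not PhosPy's own code.
BUNDLED_REFERENCES = {
    "rat": {"auto": "l6_native", "l6": "l6", "l6_native": "l6_native"},
}


def resolve_reference(species: str, reference: str = "auto") -> str:
    """Map a (species, reference) pair to a concrete bundled reference name."""
    if species not in BUNDLED_REFERENCES:
        raise ValueError(f"unsupported species: {species!r}")
    by_species = BUNDLED_REFERENCES[species]
    if reference not in by_species:
        raise ValueError(f"unsupported reference for {species!r}: {reference!r}")
    return by_species[reference]


print(resolve_reference("rat"))        # "auto" is the default
print(resolve_reference("rat", "l6"))
```

An unsupported species or reference raises `ValueError` in this sketch; how PhosPy itself reports that case is covered in its own docs.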

Runnable example: [`examples/simple_workflow_demo.py`](examples/simple_workflow_demo.py)

### Advanced Lane: Native Workflow Pieces

Use this when you need direct control over workflow-shaped inputs such as:

- `site_sequences`
- `substrate_map`
- `motif_sequences`
- `ReferenceBundle`
- intermediate scoring and prediction outputs
- signalome construction

Main entry points:

- `PhosphoDataset`
- `AnalysisReadyPhosphoDataset`
- `ReferenceBundle`
- `PredMatWorkflow`
- `KinaseWorkflow`
- `SignalomeWorkflow`

Runnable examples:

- [`examples/native_workflow_demo.py`](examples/native_workflow_demo.py)
- [`examples/predmat_workflow_demo.py`](examples/predmat_workflow_demo.py)
- [`examples/signalome_workflow_demo.py`](examples/signalome_workflow_demo.py)

## Other Useful Entry Points

### `PhosphoDataset`

Use this when you want validated total and phospho inputs plus the standard preprocessing flow.

```python
from phospy.datasets import PhosphoDataset
from phospy.io.writers import CoreOutputWriter

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
)
core = dataset.preprocessing.run(max_unmatched_fraction=0.1)

CoreOutputWriter().write(core, outdir="examples/output", format="csv")

site_matrix = core.site_matrix.matrix
corrected = core.phospho_corrected
```

### `KinaseActivityAnalyzer`

Use this when you already have a phosphosite matrix and a `predMat`.

```python
from phospy.activities import KinaseActivityAnalyzer
from phospy.datasets import PhosphoDataset

dataset = PhosphoDataset.from_files(
    "examples/data/total.tsv",
    "examples/data/phospho.tsv",
    phospho_encoding="utf-16le",
)
core = dataset.preprocessing.run(max_unmatched_fraction=0.1)

analyzer = KinaseActivityAnalyzer()
result = analyzer.run(
    pred_mat=analyzer.load_pred_mat("examples/data/predMat.csv"),
    phospho_matrix=core.site_matrix.matrix,
    threshold=0.6,
    min_substrates=1,
    top_n_substrates=1,
)
```

The bundled example data is tiny, so this example sets `min_substrates=1` and `top_n_substrates=1`.

### `PhosRPipeline`

Use this when you want file loading, preprocessing, optional kinase analysis, and output publishing in one call.

```python
from phospy.pipeline import PhosRPipeline

pipeline = PhosRPipeline.from_files(
    total_path="examples/data/total.tsv",
    phospho_path="examples/data/phospho.tsv",
    pred_mat_path="examples/data/predMat.csv",
    phospho_encoding="utf-16le",
    max_unmatched_fraction=0.1,
    kinase_activity_threshold=0.6,
    kinase_activity_min_substrates=1,
    kinase_activity_top_n_substrates=1,
)
outputs = pipeline.run(outdir="examples/output")
```

When `outdir` is set, the pipeline writes core outputs, optional kinase-analysis outputs, and `run_manifest.json`.
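
To inspect the manifest after a run, plain `json` is enough. The sketch below assumes `run_manifest.json` is a UTF-8 JSON object at the top level of `outdir`; the helper name `load_run_manifest` and the stand-in key are hypothetical, and the real manifest fields are not shown here.

```python
import json
import tempfile
from pathlib import Path


def load_run_manifest(outdir):
    """Load run_manifest.json from a pipeline output directory."""
    manifest_path = Path(outdir) / "run_manifest.json"
    with manifest_path.open(encoding="utf-8") as handle:
        return json.load(handle)


# Demo against a stand-in file; the key below is invented for illustration
# and does not reflect the real manifest schema.
with tempfile.TemporaryDirectory() as tmp:
    stand_in = {"example_key": "example_value"}
    (Path(tmp) / "run_manifest.json").write_text(json.dumps(stand_in), encoding="utf-8")
    manifest = load_run_manifest(tmp)
    print(sorted(manifest))
```

With a real run, call `load_run_manifest("examples/output")` instead of using the temporary directory.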

## File Inputs

PhosPy works with:

- total input as TSV
- phospho input as TSV
- `predMat` as CSV, with the first column used as the phosphosite index

For required columns and common validation rules, see [`docs/validation.md`](docs/validation.md).
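
As a rough picture of the `predMat` layout, the sketch below parses a CSV whose first column is the phosphosite index and whose remaining columns are per-kinase scores. The site IDs, kinase names, and scores are made up, and `read_pred_mat` is a hypothetical helper, not PhosPy's loader.

```python
import csv
import io

# Made-up content for illustration; real site IDs and kinase names will differ.
PRED_MAT_CSV = """site,KINASE_A,KINASE_B
SITE_1,0.91,0.12
SITE_2,0.30,0.77
"""


def read_pred_mat(handle):
    """Return {site: {kinase: score}} using the first column as the index."""
    reader = csv.reader(handle)
    header = next(reader)
    kinases = header[1:]  # every column after the index is a kinase
    table = {}
    for row in reader:
        if not row:  # skip blank lines
            continue
        site, scores = row[0], row[1:]
        table[site] = {kinase: float(score) for kinase, score in zip(kinases, scores)}
    return table


pred_mat = read_pred_mat(io.StringIO(PRED_MAT_CSV))
print(sorted(pred_mat))
```

With pandas, `pd.read_csv(path, index_col=0)` gives the same first-column-as-index shape as a DataFrame.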

## CLI

PhosPy also ships with a small CLI for file-based preprocessing and optional `predMat` analysis.

```bash
phospy \
  --total examples/data/total.tsv \
  --phospho examples/data/phospho.tsv \
  --pred-mat examples/data/predMat.csv \
  --phospho-encoding utf-16le \
  --max-unmatched-fraction 0.1 \
  --outdir examples/output
```
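
If you drive the CLI from a script, building the argument list in Python avoids shell-quoting problems. This is a minimal sketch using only the flags shown above; the helper name `build_phospy_command` is hypothetical, and treating `--phospho-encoding` and `--max-unmatched-fraction` as optional is an assumption.

```python
import shlex


def build_phospy_command(total, phospho, outdir, pred_mat=None,
                         phospho_encoding=None, max_unmatched_fraction=None):
    """Build the argument list for the phospy CLI invocation shown above."""
    cmd = ["phospy", "--total", str(total), "--phospho", str(phospho)]
    if pred_mat is not None:
        cmd += ["--pred-mat", str(pred_mat)]
    if phospho_encoding is not None:
        cmd += ["--phospho-encoding", phospho_encoding]
    if max_unmatched_fraction is not None:
        cmd += ["--max-unmatched-fraction", str(max_unmatched_fraction)]
    cmd += ["--outdir", str(outdir)]
    return cmd


cmd = build_phospy_command(
    total="examples/data/total.tsv",
    phospho="examples/data/phospho.tsv",
    pred_mat="examples/data/predMat.csv",
    phospho_encoding="utf-16le",
    max_unmatched_fraction=0.1,
    outdir="examples/output",
)
print(shlex.join(cmd))
```

To run it, pass the list to `subprocess.run(cmd, check=True)` in an environment where `phospy` is installed.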

## Read Next

- [`docs/api.md`](docs/api.md) for the supported Python API
- [`docs/validation.md`](docs/validation.md) for input rules and common failures
- [`docs/parity.md`](docs/parity.md) for parity scope and `svm_mode`
- [`docs/fixtures.md`](docs/fixtures.md) for fixture and trace rebuild commands
- [`docs/architecture/package-layout.md`](docs/architecture/package-layout.md) for the contributor-facing package layout
