Metadata-Version: 2.4
Name: phospy
Version: 1.5.2
Summary: Python native implementation of selected PhosR style workflows for phosphoproteomics.
Author: PhosPy Contributors
Maintainer: falconsmilie (Shane)
License-Expression: GPL-3.0
Project-URL: Homepage, https://phospy.com
Project-URL: Repository, https://github.com/falconsmilie/phospy
Project-URL: Documentation, https://phospy.com/docs/
Project-URL: Issues, https://github.com/falconsmilie/phospy/issues
Project-URL: Changelog, https://github.com/falconsmilie/phospy/blob/main/CHANGELOG.md
Project-URL: Citation, https://github.com/falconsmilie/phospy/blob/main/CITATION.cff
Project-URL: Release Notes, https://phospy.com/docs/release-notes/
Keywords: phosphoproteomics,proteomics,bioinformatics,phosphorylation,phosphosite,kinase,ksea,mass-spectrometry,phosr
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE.md
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: scipy>=1.10
Requires-Dist: scikit-learn>=1.6
Requires-Dist: tomli>=2.0; python_version < "3.11"
Provides-Extra: parquet
Requires-Dist: pyarrow>=15.0; extra == "parquet"
Provides-Extra: test
Requires-Dist: hypothesis>=6; extra == "test"
Requires-Dist: pytest>=8.0; extra == "test"
Provides-Extra: dev
Requires-Dist: hypothesis>=6; extra == "dev"
Requires-Dist: pandas-stubs<3.0.0,>=2.2.3.250527; python_version < "3.11" and extra == "dev"
Requires-Dist: pandas-stubs>=3.0.0.260204; python_version >= "3.11" and extra == "dev"
Requires-Dist: scipy-stubs>=1.17.1.0; extra == "dev"
Requires-Dist: pre-commit>=4.0; extra == "dev"
Requires-Dist: pyright>=1.1.409; extra == "dev"
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.15.7; extra == "dev"
Dynamic: license-file

# PhosPy

PhosPy is a Python package for selected phosphoproteomics workflows inspired by
PhosR. It is aimed at scientists who want a clear Python path from phosphosite
intensity tables to differential phosphorylation analysis, kinase scoring,
kinase prediction, and optional signalome analysis.

"PhosR-inspired" in PhosPy docs means scoped, feature-level comparison lanes. It
does not imply full PhosR package parity or full PhosR API compatibility.

PhosPy does **not** provide HTTP endpoints or a web service. The supported user
interface is the Python API.

## Recommended Reading

The full documentation is available at [PhosPy Docs](https://phospy.com/docs).

## Install

PhosPy requires Python 3.10 or newer.

```bash
pip install phospy
```

For `.parquet` input or output support:

```bash
pip install "phospy[parquet]"
```

For local development from a clone:

```bash
pip install -e ".[dev]"
python scripts/run_pyright.py
pytest -m "not parity"
```

For reproducible scientific/regression runs aligned to CI:

```bash
pip install -c constraints/ci.txt -e ".[dev,test]"
pytest tests/parity -m parity -s
```

For full release-gate validation (unit/integration, reproducibility goldens,
parity, and performance):

```bash
pip install -c constraints/ci.txt -e ".[dev,test,parquet]"
make test-release-gate
```

## Quick Start

1. Build an analysis-ready phosphoproteomics dataset.
2. Run a kinase workflow.
3. Explore the full API workflow documentation:
   - [Dataset building](docs/api/dataset-build-workflow.md)
   - [Differential workflow](docs/api/differential-workflow.md)
   - [Kinase workflow](docs/api/kinase-workflow.md)
   - [Signalome workflow](docs/api/signalome-workflow.md)

Bundled runtime references in the current release are rat-only. For human or
mouse work, create and pass an explicit `ReferenceBundle` in Python instead of
using `ReferencePreset.AUTO`.

Scientific scope categories and parity/open-gap status are maintained in
[`docs/scientific-coverage.md`](docs/scientific-coverage.md). Parity fixture
evidence lives in [`docs/parity.md`](docs/parity.md).

## Kinase Workflow Example

```python
import pandas as pd

from phospy import AnalysisReadyDatasetBuilder, KinaseWorkflow
from phospy.api import (
    DatasetBuildRequest,
    DatasetLocalisationConfig,
    DatasetPreprocessingConfig,
    IntensityScaleKind,
    KinaseWorkflowRequest,
    Organism,
    ReferencePreset,
)

# Tiny synthetic example for workflow mechanics only (not biological discovery).
phospho = pd.DataFrame(
    {
        "control_rep1": [8200.0, 9100.0, 6000.0],
        "control_rep2": [8000.0, 9000.0, 5900.0],
        "treatment_rep1": [16200.0, 9150.0, 13000.0],
        "treatment_rep2": [15800.0, 9050.0, 12800.0],
    },
    index=["MAPK14;Y182;", "GSK3B;S9;", "TSC2;S939;"],
)
site_metadata = pd.DataFrame(
    {
        "gene_symbol": ["MAPK14", "GSK3B", "TSC2"],
        "site": ["Y182", "S9", "S939"],
        "site_sequence": [
            "AAAAAAAAAAAAAAAYAAAAAAAAAAAAAAA",
            "AAAAAAAAAAAAAAASAAAAAAAAAAAAAAA",
            "AAAAAAAAAAAAAAASAAAAAAAAAAAAAAA",
        ],
        "protein_id": ["MAPK14", "GSK3B", "TSC2"],
        "localisation_confidence": [0.95, 0.94, 0.96],
    },
    index=phospho.index.copy(),
)
sample_metadata = pd.DataFrame(
    {
        "comparison_group": ["control", "control", "treatment", "treatment"],
    },
    index=phospho.columns.copy(),
)

dataset = AnalysisReadyDatasetBuilder().run(
    DatasetBuildRequest(
        phospho=phospho,
        site_metadata=site_metadata,
        sample_metadata=sample_metadata,
        organism=Organism.RAT,
        input_intensity_scale=IntensityScaleKind.LINEAR,
        preprocessing_config=DatasetPreprocessingConfig(
            # Site-level workflows should fail fast when localisation is missing
            # or below threshold, because ambiguous site assignment can
            # mis-state kinase/substrate interpretation.
            localisation=DatasetLocalisationConfig(
                mode="require_threshold",
                confidence_column="localisation_confidence",
                min_confidence=0.75,
            )
        ),
    )
)

# Dataset construction validates required site metadata, including site_sequence.
print(dataset.site_metadata.loc[:, ["gene_symbol", "site", "site_sequence"]])
# sample_metadata is descriptive/alignment metadata on the dataset.
# Differential workflow design is provided separately via ExperimentalDesign.

kinase_result = KinaseWorkflow().run(
    KinaseWorkflowRequest(
        dataset=dataset,
        references=ReferencePreset.AUTO,
        activity_config=None,
    )
)

print(kinase_result.prediction_result.pred_mat.round(3).iloc[:3, :5])
if kinase_result.prediction_result.substrate_list is not None:
    print(kinase_result.prediction_result.substrate_list.head(5))
```
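Before calling the builder, it can help to confirm that the inputs line up the
way this example assumes: `site_metadata` indexed by the same site IDs as the
rows of `phospho`, `sample_metadata` indexed by the columns of `phospho`, and
every `localisation_confidence` present and at or above the `min_confidence`
passed to `DatasetLocalisationConfig`. The sketch below is plain pandas and is
not PhosPy's own validation; the builder's checks and error types are described
in the dataset build workflow docs. The helper `precheck_inputs` and its
returned keys are hypothetical, and the usage runs it on a deliberately flawed
copy of the example tables.

```python
import pandas as pd


def precheck_inputs(
    phospho: pd.DataFrame,
    site_metadata: pd.DataFrame,
    sample_metadata: pd.DataFrame,
    confidence_column: str = "localisation_confidence",
    min_confidence: float = 0.75,
) -> dict[str, list[str]]:
    """Hypothetical pre-flight check mirroring the example's assumptions.

    Returns site and sample IDs that likely need attention before building a
    dataset. This is not PhosPy's validation logic.
    """
    # Rows of phospho with no matching site_metadata row.
    missing_sites = phospho.index.difference(site_metadata.index)
    # Columns of phospho with no matching sample_metadata row.
    missing_samples = phospho.columns.difference(sample_metadata.index)
    # Align confidence to phospho rows; sites absent from metadata become NaN.
    confidence = site_metadata[confidence_column].reindex(phospho.index)
    low_mask = confidence.isna() | (confidence < min_confidence)
    return {
        "missing_sites": [str(s) for s in missing_sites],
        "missing_samples": [str(s) for s in missing_samples],
        "low_confidence_sites": [str(s) for s in confidence[low_mask].index],
    }


# Flawed copy of the example: GSK3B is poorly localised, TSC2 lacks metadata.
phospho = pd.DataFrame(
    {
        "control_rep1": [8200.0, 9100.0, 6000.0],
        "treatment_rep1": [16200.0, 9150.0, 13000.0],
    },
    index=["MAPK14;Y182;", "GSK3B;S9;", "TSC2;S939;"],
)
site_metadata = pd.DataFrame(
    {"localisation_confidence": [0.95, 0.60]},
    index=["MAPK14;Y182;", "GSK3B;S9;"],
)
sample_metadata = pd.DataFrame(
    {"comparison_group": ["control", "treatment"]},
    index=["control_rep1", "treatment_rep1"],
)
print(precheck_inputs(phospho, site_metadata, sample_metadata))
# {'missing_sites': ['TSC2;S939;'], 'missing_samples': [],
#  'low_confidence_sites': ['GSK3B;S9;', 'TSC2;S939;']}
```

A site with no metadata row is reported both as missing and as low confidence,
since its confidence cannot be checked against the threshold.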

## Import Contract

Use top-level `phospy` for the five main entry points:

```python
from phospy import AnalysisReadyDatasetBuilder, AnalysisReadyPhosphoDataset
from phospy import DifferentialAnalysisWorkflow, KinaseWorkflow, SignalomeWorkflow
```

Use `phospy.api` for requests, configs, results, enums, references, and public
exceptions.

## Documentation

1. [Quickstart](https://phospy.com/docs/quickstart/)
2. [API Guide](https://phospy.com/docs/api/)
3. [Workflow Contracts](https://phospy.com/docs/workflow_contracts/)
4. [Validation Guide](https://phospy.com/docs/validation/)
5. [Scientific Coverage Matrix](https://phospy.com/docs/scientific-coverage/)

## Citation

If you use PhosPy in scientific work, cite this software release using
[`CITATION.cff`](CITATION.cff) and also cite the upstream PhosR project and
publications described in [`NOTICE.md`](NOTICE.md).
