Metadata-Version: 2.4
Name: phospy
Version: 1.5.1
Summary: Python-native implementation of selected PhosR-style workflows for phosphoproteomics.
Author: PhosPy Contributors
Maintainer: falconsmilie (Shane)
License-Expression: GPL-3.0
Project-URL: Homepage, https://phospy.com
Project-URL: Repository, https://github.com/falconsmilie/phospy
Project-URL: Documentation, https://phospy.com/docs/
Project-URL: Issues, https://github.com/falconsmilie/phospy/issues
Project-URL: Changelog, https://github.com/falconsmilie/phospy/blob/main/CHANGELOG.md
Project-URL: Citation, https://github.com/falconsmilie/phospy/blob/main/CITATION.cff
Project-URL: Release Notes, https://phospy.com/docs/release-notes-1.5.0/
Keywords: phosphoproteomics,proteomics,bioinformatics,phosphorylation,phosphosite,kinase,ksea,mass-spectrometry,phosr
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: NOTICE.md
Requires-Dist: numpy>=1.24
Requires-Dist: pandas>=2.0
Requires-Dist: scipy>=1.10
Requires-Dist: scikit-learn>=1.6
Requires-Dist: tomli>=2.0; python_version < "3.11"
Provides-Extra: parquet
Requires-Dist: pyarrow>=15.0; extra == "parquet"
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == "test"
Provides-Extra: dev
Requires-Dist: pre-commit>=4.0; extra == "dev"
Requires-Dist: pyright>=1.1.409; extra == "dev"
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.15.7; extra == "dev"
Dynamic: license-file

# PhosPy

PhosPy is a Python package for selected phosphoproteomics workflows inspired by
PhosR. It is aimed at scientists who want a clear Python lane from phosphosite
intensity tables to kinase scoring, kinase prediction, and optional signalome
analysis.

PhosPy does **not** provide HTTP endpoints or a web service. The supported user
interfaces are:

- the Python API
- the `phospy` command-line interface

## Install

PhosPy requires Python 3.10 or newer.

```bash
pip install phospy
```

For `.parquet` input or output support:

```bash
pip install "phospy[parquet]"
```

For local development from a clone:

```bash
pip install -e ".[dev]"
pyright
pytest -m "not parity"
```

For reproducible scientific/regression runs aligned to CI:

```bash
pip install -c constraints/ci.txt -e ".[dev,test]"
pytest tests/parity -m parity -s
```

## Beginner Lane

The smallest supported lane is:

1. build a dataset with `organism=Organism.RAT`
2. run the kinase workflow with `references=ReferencePreset.AUTO`
3. run the signalome workflow only when `site_metadata.protein_id` is present

Bundled runtime references in `1.5.0` are rat-only. For human or mouse work,
create and pass an explicit `ReferenceBundle` in Python instead of using
`ReferencePreset.AUTO`.

## Minimum Input Shape

`phospho` is a numeric site-by-sample table. Its index should use standard
PhosPy site IDs such as `TSC2;S939;`.
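
As a rough illustration of that ID shape, the sketch below builds and splits
IDs of the form `GENE;SITE;`. The pattern is inferred only from the
`TSC2;S939;` example above. `make_site_id` and `split_site_id` are hypothetical
helpers, not part of the PhosPy API, and real site IDs may carry more structure
than this sketch assumes.

```python
def make_site_id(gene_symbol: str, site: str) -> str:
    """Join a gene symbol and a site into 'GENE;SITE;' (assumed pattern)."""
    return f"{gene_symbol};{site};"


def split_site_id(site_id: str) -> tuple[str, str]:
    """Split 'GENE;SITE;' back into (gene_symbol, site).

    Raises ValueError when the ID does not match the assumed pattern.
    """
    parts = site_id.split(";")
    # 'TSC2;S939;' splits into ['TSC2', 'S939', '']: two fields plus the
    # empty string left behind by the trailing semicolon.
    if len(parts) != 3 or parts[2] != "" or not parts[0] or not parts[1]:
        raise ValueError(f"unexpected site ID shape: {site_id!r}")
    return parts[0], parts[1]


site_ids = [make_site_id("TSC2", "S939"), make_site_id("GSK3B", "S9")]
print(site_ids)
print(split_site_id(site_ids[0]))
```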

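`site_metadata` must use the same index as `phospho` and include the columns
below. The first two are always required; the last two are needed only for the
feature named next to them: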

- `gene_symbol`
- `site`
- `site_sequence` when you want motif-based scoring
- `protein_id` when you want signalome analysis
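
The requirements above can be checked with plain pandas before any PhosPy call.
The sketch below is a pre-flight check under stated assumptions, not PhosPy's
own validation: `check_minimum_input` and `available_optional_steps` are
hypothetical helpers, and "align" is interpreted here as an identical index in
the same order.

```python
import pandas as pd

REQUIRED_COLUMNS = ("gene_symbol", "site")
# Optional column -> feature it enables, as listed above.
OPTIONAL_COLUMNS = {
    "site_sequence": "motif-based scoring",
    "protein_id": "signalome analysis",
}


def check_minimum_input(phospho: pd.DataFrame, site_metadata: pd.DataFrame) -> list[str]:
    """Return a list of problems; an empty list means the shape looks fine."""
    problems = []
    non_numeric = [
        str(col)
        for col, dtype in phospho.dtypes.items()
        if not pd.api.types.is_numeric_dtype(dtype)
    ]
    if non_numeric:
        problems.append(f"non-numeric phospho columns: {non_numeric}")
    # Assumption: alignment means the same labels in the same order.
    if not phospho.index.equals(site_metadata.index):
        problems.append("site_metadata index does not match phospho.index")
    missing = [c for c in REQUIRED_COLUMNS if c not in site_metadata.columns]
    if missing:
        problems.append(f"missing required columns: {missing}")
    return problems


def available_optional_steps(site_metadata: pd.DataFrame) -> list[str]:
    """List the optional features whose column is present with no missing values."""
    return [
        feature
        for column, feature in OPTIONAL_COLUMNS.items()
        if column in site_metadata.columns and site_metadata[column].notna().all()
    ]


phospho = pd.DataFrame(
    {"sample_a": [1.00, 0.70], "sample_b": [1.10, 0.80]},
    index=["TSC2;S939;", "GSK3B;S9;"],
)
site_metadata = pd.DataFrame(
    {
        "gene_symbol": ["TSC2", "GSK3B"],
        "site": ["S939", "S9"],
        "protein_id": ["TSC2", "GSK3B"],
    },
    index=phospho.index.copy(),
)

print(check_minimum_input(phospho, site_metadata))
print(available_optional_steps(site_metadata))
```

In this example `site_sequence` is absent, so only signalome analysis is
reported as available.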

## Minimal Python Example

```python
import pandas as pd

from phospy import AnalysisReadyDatasetBuilder, KinaseWorkflow, SignalomeWorkflow
from phospy.api import (
    DatasetBuildRequest,
    KinaseWorkflowRequest,
    Organism,
    ReferencePreset,
    SignalomeWorkflowRequest,
)

phospho = pd.DataFrame(
    {
        "sample_a": [1.00, 0.70],
        "sample_b": [1.10, 0.80],
        "sample_c": [0.95, 0.75],
    },
    index=["TSC2;S939;", "GSK3B;S9;"],
)
site_metadata = pd.DataFrame(
    {
        "gene_symbol": ["TSC2", "GSK3B"],
        "site": ["S939", "S9"],
        "site_sequence": [
            "FDDTPEKDSFRARSTSLNERPKSLRIARAPK",
            "_______MSGRPRTTSFAESCKPVQQPSAFG",
        ],
        "protein_id": ["TSC2", "GSK3B"],
    },
    index=phospho.index.copy(),
)

dataset = AnalysisReadyDatasetBuilder().run(
    DatasetBuildRequest(
        phospho=phospho,
        site_metadata=site_metadata,
        organism=Organism.RAT,
    )
)

kinase_result = KinaseWorkflow().run(
    KinaseWorkflowRequest(
        dataset=dataset,
        references=ReferencePreset.AUTO,
        activity_config=None,  # keep this tiny two-site example in the safe lane
    )
)

signalome_result = SignalomeWorkflow().run(
    SignalomeWorkflowRequest(kinase_result=kinase_result)
)

print(dataset.phospho.shape)
print(kinase_result.prediction_result.pred_mat.round(4))
print(signalome_result.signalome_modules.table)
```

With the tables above you should get a strict `AnalysisReadyPhosphoDataset` and
a non-empty kinase prediction matrix. You should also get signalome tables,
because every site has an explicit `protein_id`.

## Minimal CLI Example

```bash
phospy kinase \
  --phospho ./input/phospho.csv \
  --site-metadata ./input/site_metadata.csv \
  --organism rat \
  --reference auto \
  --skip-activity \
  --outdir ./out
```

The CLI writes a dataset directory and a kinase directory, and reports a short
list of the file paths it wrote.
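
One way to produce the two input files is to write the tables from the Python
example with pandas. The sketch below assumes the CLI expects CSV files with
the site ID in the first column, which this README does not state; treat the
[CLI Guide](https://phospy.com/docs/cli/) as the authority on the layout. It
writes into a temporary directory, so replace `input_dir` with `./input` to
match the command above.

```python
import tempfile
from pathlib import Path

import pandas as pd

phospho = pd.DataFrame(
    {
        "sample_a": [1.00, 0.70],
        "sample_b": [1.10, 0.80],
        "sample_c": [0.95, 0.75],
    },
    index=["TSC2;S939;", "GSK3B;S9;"],
)
site_metadata = pd.DataFrame(
    {
        "gene_symbol": ["TSC2", "GSK3B"],
        "site": ["S939", "S9"],
        "protein_id": ["TSC2", "GSK3B"],
    },
    index=phospho.index.copy(),
)

# Temporary location for the sketch; use Path("./input") for a real run.
input_dir = Path(tempfile.mkdtemp()) / "input"
input_dir.mkdir(parents=True, exist_ok=True)

# Assumed layout: the index (site ID) is written as the first CSV column.
phospho.to_csv(input_dir / "phospho.csv")
site_metadata.to_csv(input_dir / "site_metadata.csv")

# Read both files back to confirm the site IDs survive the round trip.
phospho_back = pd.read_csv(input_dir / "phospho.csv", index_col=0)
metadata_back = pd.read_csv(input_dir / "site_metadata.csv", index_col=0)
print(list(phospho_back.index) == list(phospho.index))
print(list(metadata_back.columns))
```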

## Import Contract

Use top-level `phospy` for the four main entrypoints only:

```python
from phospy import AnalysisReadyDatasetBuilder, AnalysisReadyPhosphoDataset
from phospy import KinaseWorkflow, SignalomeWorkflow
```

Use `phospy.api` for requests, configs, results, enums, references, and public
exceptions.

## Documentation

1. [Quickstart](https://phospy.com/docs/quickstart/)
2. [API Guide](https://phospy.com/docs/api/)
3. [CLI Guide](https://phospy.com/docs/cli/)
4. [Validation Guide](https://phospy.com/docs/validation/)

## Citation

If you use PhosPy in scientific work, cite this software release using
[`CITATION.cff`](CITATION.cff) and also cite the upstream PhosR project and
publications described in [`NOTICE.md`](NOTICE.md).
