Metadata-Version: 2.4
Name: pystatial
Version: 0.1.1
Summary: A pure-Python re-implementation of Bioconductor Statial for spatial cell state analysis
Author: RebuildR Agent
License: GPL-3.0
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.22
Requires-Dist: pandas>=1.4
Requires-Dist: scipy>=1.8
Requires-Dist: scikit-learn>=1.0
Requires-Dist: anndata>=0.8
Requires-Dist: scanpy>=1.9
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-xdist; extra == "dev"
Dynamic: license-file

# py-statial

A pure-Python re-implementation of Bioconductor [Statial](https://github.com/SydneyBioX/Statial) for spatial cell state analysis.

## Install

```bash
pip install pystatial
```

## Quickstart

```python
import anndata as ad
import pandas as pd
import statial

# Load per-cell metadata
cells = pd.read_csv("cell_metadata.csv")

# Wrap the metadata in an AnnData object, then calculate distances between cell types
adata = ad.AnnData(obs=cells)
adata = statial.get_distances(adata, max_dist=200)

# Calculate abundances
adata = statial.get_abundances(adata, r=200)

# Run Kontextual analysis
result = statial.Kontextual(
    cells=cells,
    r=50,
    from_types="Macrophages",
    to_types="Keratin_Tumour",
    parent=["Macrophages", "CD4_Cell"],
    image=["6"],
)
```

## Function mapping (Python ⇄ R)

| Python function | R function | Description |
|---|---|---|
| `get_distances()` | `getDistances()` | Pairwise cell type distances |
| `get_abundances()` | `getAbundances()` | Cell type abundances (K-function) |
| `calc_contamination()` | `calcContamination()` | RF-based contamination scores |
| `Kontextual()` | `Kontextual()` | Conditional spatial relationships |
| `kontext_curve()` | `kontextCurve()` | Kontextual over radii range |
| `kontext_plot()` | `kontextPlot()` | Plot Kontextual results |
| `calc_state_changes()` | `calcStateChanges()` | Linear model state changes |
| `make_window()` | `makeWindow()` | Create observation windows |
| `parent_combinations()` | `parentCombinations()` | Parent-child combinations |
| `get_parent_phylo()` | `getParentPhylo()` | Extract phylo tree structure |
| `prep_matrix()` | `prepMatrix()` | Convert results to matrix |
| `get_marker_means()` | `getMarkerMeans()` | Average marker expression |
| `relabel()` | `relabel()` | Permute cell labels |
| `relabel_kontextual()` | `relabelKontextual()` | Permutation testing |
| `is_kontextual()` | `isKontextual()` | Validate kontextual result |

## Benchmark

Dataset: Keren et al. 2018 MIBI-TOF breast cancer (`data("kerenSCE")`, patient 6): 57,811 cells, 10 images, 17 cell types. The cell metadata was exported to CSV from the R SingleCellExperiment via `colData()`.

Environment: Python 3.9.13 + scipy 1.8 + scikit-learn 1.0 vs R 4.5.2 + Statial 1.11.6. Each timing is the mean of 3 runs.
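
A 3-run mean like the one reported here could be measured with a small helper such as the sketch below. The name `mean_runtime` is hypothetical and is not part of the package; it only illustrates the idea of averaging wall-clock time over repeated calls.

```python
import time
from statistics import mean


def mean_runtime(func, *args, repeats=3, **kwargs):
    """Return the mean wall-clock time (seconds) of `repeats` calls to `func`.

    Hypothetical helper for illustration; not part of pystatial.
    """
    timings = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        timings.append(time.perf_counter() - start)
    return mean(timings)


# Example: time a cheap built-in call three times and average the result.
elapsed = mean_runtime(sum, range(1000))
```

`time.perf_counter()` is used because it is a monotonic, high-resolution clock intended for measuring short durations.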

### Parity — vs R reference

| Function | Metric | Value | Threshold | Pass |
|---|---|---|---|---|
| `get_distances` | max abs error | 1.33e-11 | 1e-8 | ✅ |
| `get_abundances` | max abs error | 0.0 | 1e-8 | ✅ |
| `Kontextual` (L-function) | max abs error | 7.11e-15 | 1e-8 | ✅ |
| `Kontextual` (value) | relative error | 5.14% | 10% | ✅ |

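Distances, abundances, and the L-function agree with R to within floating-point round-off (max abs error of at most 1.33e-11, well under the 1e-8 threshold). The ~5% relative difference in Kontextual values comes from the different spatial indexing used to find neighbouring pairs: `cKDTree` in Python versus spatstat's `closepairs` in R.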
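
The two metrics in the parity table can be computed along the following lines. This is a minimal sketch: the function names are hypothetical, and the relative-error definition (absolute difference divided by the magnitude of the R value) is an assumption, since the exact formula behind the table is not stated.

```python
import numpy as np


def max_abs_error(py_values, r_values):
    """Largest element-wise absolute difference, ignoring NaNs."""
    py_values = np.asarray(py_values, dtype=float)
    r_values = np.asarray(r_values, dtype=float)
    return float(np.nanmax(np.abs(py_values - r_values)))


def relative_error(py_value, r_value):
    """Absolute difference relative to the R reference (assumed definition)."""
    return abs(py_value - r_value) / abs(r_value)


# Toy values, not taken from the benchmark.
py_out = np.array([1.0, 2.0, 3.0])
r_out = np.array([1.0, 2.0, 3.0 + 1e-12])

passes_abs = max_abs_error(py_out, r_out) < 1e-8   # threshold from the table
passes_rel = relative_error(1.05, 1.0) < 0.10      # 10% threshold from the table
```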

### Speed — Python vs R

| Function | Python (s) | R (s) | Speedup |
|---|---|---|---|
| `get_distances` | 0.84 | 7.48 | **8.9×** |
| `get_abundances` | 5.47 | 7.25 | **1.3×** |
| `Kontextual` (img6) | 0.09 | 0.20 | **2.2×** |
| **Total** | **6.39** | **14.93** | **2.3×** |

`get_distances` benefits most from `cKDTree` spatial indexing (8.9×). `get_abundances` is only modestly faster (1.3×) because its bottleneck is a per-cell counting loop.
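
To illustrate why a k-d tree helps with distance queries, the sketch below finds the distance from each cell of one type to its nearest cell of another type with `scipy.spatial.cKDTree`. It is not the package's actual implementation; `nearest_distances` and the toy coordinates are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def nearest_distances(from_xy, to_xy, max_dist=np.inf):
    """Distance from each point in `from_xy` to its nearest point in `to_xy`.

    Points with no neighbour within `max_dist` get a distance of inf.
    Hypothetical helper for illustration; not part of pystatial.
    """
    tree = cKDTree(to_xy)  # build the index once over the target cells
    dist, _ = tree.query(from_xy, k=1, distance_upper_bound=max_dist)
    return dist


# Toy coordinates: two "from" cells and two "to" cells.
from_xy = np.array([[0.0, 0.0], [10.0, 0.0]])
to_xy = np.array([[3.0, 4.0], [10.0, 1.0]])

d = nearest_distances(from_xy, to_xy)
```

Building the tree once and querying it for every cell avoids computing the full pairwise distance matrix, which is where most of the saving over a brute-force approach would come from.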

## Citation

If you use Statial in your work, please cite:

> Ameen, F., Iyengar, S., Qin, A., Ghazanfar, S., & Patrick, E. (2022). Statial: A package to identify changes in cell state relative to spatial associations.

## License

GPL-3.0 (matching upstream R package)
