Metadata-Version: 2.4
Name: pycmprsk
Version: 1.0.0
Summary: Python port of R's cmprsk: Estimation, testing and regression modeling of subdistribution functions in competing risks.
Project-URL: Homepage, https://github.com/covertcast/pycmprsk
Project-URL: Repository, https://github.com/covertcast/pycmprsk
Project-URL: Issues, https://github.com/covertcast/pycmprsk/issues
Project-URL: Changelog, https://github.com/covertcast/pycmprsk/releases
Author: covertcast
License-Expression: GPL-3.0-or-later
License-File: LICENSE
Keywords: biostatistics,cmprsk,competing-risks,cumulative-incidence,fine-gray,subdistribution-hazards,survival-analysis
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Healthcare Industry
Classifier: Intended Audience :: Science/Research
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Scientific/Engineering :: Bio-Informatics
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.13
Requires-Dist: matplotlib>=3.10.9
Requires-Dist: numba>=0.65.1
Requires-Dist: numpy>=2.4.6
Requires-Dist: scipy>=1.17.1
Provides-Extra: test
Requires-Dist: pytest>=9.0.3; extra == 'test'
Description-Content-Type: text/markdown

# pycmprsk

A Python port of the R package [`cmprsk`](https://cran.r-project.org/package=cmprsk) for estimation, testing, and regression modeling of subdistribution functions in competing risks. It provides:

- **`cuminc`** - non-parametric cumulative incidence functions, with Gray's
  k-sample test across groups and stratification.
- **`crr`** - Fine-Gray subdistribution-hazards regression, with time-fixed
  covariates (`cov1`), time-varying covariates (`cov2 + tf`), per-group
  censoring weights (`cengroup`), and the full Huber/White sandwich variance
  including the `q(u)` correction for the estimated censoring distribution.
- **`predict_crr`**, **`summary_crr`**, **`timepoints`**, **`plot_cuminc`**,
  **`plot_predict`** - counterparts of R's downstream functions
  (`predict.crr`, `summary.crr`, `timepoints`, `plot.cuminc`, `plot.predict.crr`).

This package's functionality is numerically validated against R's `cmprsk`. See
[Parity testing](#parity-testing) below.

## Install

```bash
pip install pycmprsk
```

The package is pure Python; the hot loops are JIT-compiled with
[numba](https://numba.pydata.org/).

## Quick example

The arrays below were dumped from R using `RNGversion("1.6.2"); set.seed(2)`
and the same data setup as `cmprsk/tests/test.R`. The resulting plots
visually match R's `plot(cuminc(...))` and `plot(predict(crr(...)))`.

```python
import matplotlib.pyplot as plt
import numpy as np

from pycmprsk import crr, cuminc, plot_cuminc, plot_predict, predict_crr


def tf_quad(uft):
    """Match R's ``function(uft) cbind(uft, uft^2)``."""
    uft = np.asarray(uft, dtype=np.float64)
    return np.column_stack([uft, uft**2])
```

<details>
<summary><b>Data arrays</b> (Click to expand - Data sourced from R's test suite)</summary>

```python
ftime = np.array([
    0.686305, 0.149818, 1.611875, 1.077275, 1.553027, 0.286783, 0.234919, 0.255626, 0.536215, 1.420936,
    1.979941, 0.816767, 0.970783, 3.376077, 1.407218, 0.229477, 2.821243, 1.598966, 0.661166, 0.291716,
    2.421805, 0.264711, 0.419970, 0.994872, 5.248650, 0.493777, 0.036222, 0.039556, 2.225511, 1.896816,
    1.562481, 2.080967, 0.062462, 0.308574, 0.854363, 1.086975, 0.183905, 0.877297, 0.166353, 1.346992,
    3.303843, 0.723761, 0.043173, 1.635107, 1.022373, 1.565542, 0.734400, 1.705071, 1.527256, 1.921497,
    1.854679, 0.310276, 2.424571, 0.515172, 1.251790, 1.054940, 0.010267, 1.079949, 0.136024, 0.466943,
    1.348637, 0.113960, 2.535242, 0.762922, 0.432438, 0.666299, 0.862624, 0.479771, 0.397440, 1.493170,
    0.661091, 0.540539, 1.355944, 0.773167, 3.902563, 0.117417, 1.786273, 0.072698, 0.259388, 2.092709,
    0.229584, 0.490496, 0.425987, 0.335195, 0.697602, 0.097860, 0.917998, 0.174528, 0.680717, 1.835194,
    2.997399, 1.937913, 0.520418, 1.653625, 2.238665, 0.149357, 0.720766, 0.096726, 0.831950, 1.003850,
])

fstatus = np.array([
    1, 2, 0, 2, 2, 1, 0, 2, 2, 0, 0, 2, 2, 0, 0, 0, 0, 0, 1, 0,
    2, 1, 1, 1, 1, 2, 0, 0, 1, 0, 1, 1, 1, 2, 0, 0, 1, 2, 0, 1,
    2, 1, 2, 0, 2, 0, 0, 2, 1, 2, 1, 1, 2, 1, 0, 2, 0, 2, 1, 2,
    0, 2, 2, 1, 2, 1, 2, 2, 1, 0, 0, 0, 0, 2, 0, 2, 2, 2, 2, 0,
    0, 0, 1, 1, 2, 1, 2, 2, 1, 0, 1, 1, 1, 2, 1, 2, 2, 0, 1, 1,
])

group_code = np.array([
    3, 1, 3, 1, 2, 2, 2, 3, 2, 2, 1, 2, 2, 3, 2, 3, 1, 1, 2, 1,
    1, 3, 2, 3, 1, 1, 1, 1, 3, 1, 1, 2, 3, 2, 3, 1, 3, 2, 1, 3,
    3, 3, 3, 2, 2, 1, 1, 2, 2, 2, 3, 1, 1, 3, 2, 3, 3, 1, 1, 2,
    1, 3, 2, 1, 2, 3, 1, 1, 3, 3, 3, 1, 2, 3, 1, 1, 2, 2, 3, 3,
    2, 3, 3, 2, 2, 2, 2, 3, 2, 1, 2, 1, 1, 2, 1, 1, 1, 1, 2, 1,
])
group = np.array(["a", "b", "c"])[group_code - 1]

strata = np.array([
    1, 1, 2, 2, 2, 2, 2, 1, 1, 1, 2, 1, 2, 2, 2, 2, 1, 1, 1, 1,
    2, 1, 2, 2, 1, 2, 1, 1, 2, 2, 2, 1, 1, 1, 2, 1, 1, 2, 1, 2,
    2, 2, 2, 1, 1, 1, 1, 1, 1, 2, 1, 2, 2, 2, 1, 1, 2, 1, 2, 1,
    1, 1, 1, 2, 1, 1, 1, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 1, 1, 1,
    1, 2, 2, 1, 1, 2, 1, 2, 2, 2, 1, 2, 1, 1, 2, 1, 2, 2, 2, 2,
])

cov1 = np.array([
    np.nan, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0,
    1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0,
    1.0, 1.0, 1.0, np.nan, 1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
    1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, np.nan, 1.0, 1.0,
    1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0,
    0.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0,
    1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0,
    0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 1.0,
    0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0,
    1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0,
    1.0, 1.0, 0.0, 1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0,
    0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0,
    0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0,
    1.0, 0.0, 0.0, 1.0, 1.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0,
    1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 1.0,
    1.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0,
    0.0, 1.0, 1.0, 1.0, 0.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0,
    0.0, 1.0, 1.0, 1.0, 0.0, 0.0, 1.0, 0.0, 1.0, 1.0, 1.0, 1.0,
]).reshape(100, 3)

cov2 = np.column_stack([cov1[:, 0], cov1[:, 0]])
cengroup = cov1[:, 2]
```

</details>

```python
ci = cuminc(ftime, fstatus, group=group, strata=strata)
print("cuminc curve keys:", list(ci.curves.keys()))
print("Gray's k-sample tests (stat, pv, df):\n", ci.tests)

fit = crr(ftime, fstatus, cov1=cov1, cov2=cov2, tf=tf_quad, cengroup=cengroup)
print("crr coefs:", fit.coef)
print("crr converged:", fit.converged)

pred = predict_crr(
    fit,
    cov1=np.array([[1.0, 1.0, 1.0], [0.0, 0.0, 0.0]]),
    cov2=np.array([[1.0, 1.0], [0.0, 0.0]]),
)

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(12, 4))
plot_cuminc(ci, ax=ax1)
ax1.set_title("cuminc(ss, cc, gg, strt)")
plot_predict(pred, ax=ax2)
ax2.set_title("predict(crr(ss, cc, cv, cov2, tf, cengroup=cv[,3]))")
plt.tight_layout()
plt.show()
```
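The `cov1` matrix above contains a few `np.nan` entries, and rows with any missing value are omitted before fitting (see the `na.action` note in the next section). As a rough illustration only, and not pycmprsk's internal code, complete-case filtering can be sketched in plain NumPy. The helper name `complete_case_mask` and the toy arrays are hypothetical.

```python
import numpy as np


def complete_case_mask(*arrays):
    """Return a boolean mask that is True for rows with no NaN in any input.

    Hypothetical helper for illustration; each input is 1D or 2D and all
    inputs share the same number of rows.
    """
    mask = None
    for arr in arrays:
        arr = np.asarray(arr, dtype=np.float64)
        if arr.ndim == 1:
            arr = arr.reshape(-1, 1)  # treat a vector as a single column
        row_ok = ~np.isnan(arr).any(axis=1)
        mask = row_ok if mask is None else (mask & row_ok)
    return mask


# Toy stand-ins (not the arrays from the example above).
toy_ftime = np.array([0.5, 1.2, 0.8, 2.0])
toy_cov1 = np.array([
    [np.nan, 1.0],
    [0.0, 1.0],
    [1.0, np.nan],
    [1.0, 0.0],
])

keep = complete_case_mask(toy_ftime, toy_cov1)
n_missing = int((~keep).sum())  # rows that would be dropped
print(keep)       # [False  True False  True]
print(n_missing)  # 2
```

Filtering every input with the same mask (`toy_ftime[keep]`, `toy_cov1[keep]`) keeps the rows aligned across arrays.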

## What's different from R's `cmprsk`

`pycmprsk` is a port of the functionality, not a verbatim translation of the API:

- Functions return dataclasses (`CRRResult`, `CumincResult`, `SummaryCRR`) rather than
  R's named lists. Field names use Python snake_case (`n_missing` instead of
  R's `n.missing`, `loglik_null` instead of `loglik.null`).
- `tf` (the time-varying covariate function) takes and returns NumPy arrays.
  It must return a 2D array of shape `(ndf, p2)`: one row per time point it
  receives and one column per column of `cov2`. R's `cmprsk` wraps 1D output
  via `as.matrix`; in Python, do the equivalent explicitly with `.reshape(-1, 1)`.
- `na.action` is fixed to "omit rows with any NA," matching R's default.
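
To make the `tf` shape contract concrete, here is a minimal sketch of a one-column and a two-column time function. The function `tf_log` and the `times` array are illustrative assumptions; only the requirement to return a 2D array comes from the list above.

```python
import numpy as np


def tf_log(uft):
    """Single time function log(t), returned as one column.

    Mirrors R's ``function(uft) log(uft)``; the explicit reshape stands in
    for the ``as.matrix`` wrapping that R applies to 1D output.
    """
    uft = np.asarray(uft, dtype=np.float64)
    return np.log(uft).reshape(-1, 1)


def tf_quad(uft):
    """Two time functions, t and t^2, one per column of cov2."""
    uft = np.asarray(uft, dtype=np.float64)
    return np.column_stack([uft, uft**2])


times = np.array([1.0, 2.0, 4.0])  # stand-in for the times passed to tf
print(tf_log(times).shape)   # (3, 1)
print(tf_quad(times).shape)  # (3, 2)
```

Returning `np.log(uft)` without the reshape would yield shape `(3,)`, which does not satisfy the 2D contract.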

Behavioral parity is the explicit design goal - see `src/tests/test_parity.py`.

## Parity testing

The test suite is **1:1 with R's `cmprsk/tests/test.R`**: every scenario in
that file has a corresponding `.npz` fixture (data + R reference outputs)
under `src/tests/fixtures/`, and one matching Python test per scenario.
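
The fixture-based pattern described above can be sketched as follows. This is an assumption-laden illustration, not code from the actual suite: the key names `ftime` and `r_expected`, the tolerances, and `stand_in_estimator` are hypothetical, and the fixture is faked in a temporary directory instead of being generated by R.

```python
import os
import tempfile

import numpy as np


def stand_in_estimator(ftime):
    """Placeholder for the function under test (hypothetical)."""
    return np.sort(ftime)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "scenario.npz")

    # In a real suite the fixture would come from R; here we fake one
    # holding the input data and the reference output side by side.
    np.savez(
        path,
        ftime=np.array([0.7, 0.1, 1.6]),
        r_expected=np.array([0.1, 0.7, 1.6]),
    )

    # Load inputs and reference values from the same file.
    with np.load(path) as fx:
        inputs = fx["ftime"]
        expected = fx["r_expected"]

    got = stand_in_estimator(inputs)

    # Raises AssertionError if results differ beyond tolerance.
    np.testing.assert_allclose(got, expected, rtol=1e-8, atol=1e-10)
    parity_ok = True
```

Storing the data and the reference output in one `.npz` file keeps each scenario self-describing, so a test only needs the fixture path.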

To regenerate the fixtures (requires R with the `cmprsk` and `reticulate`
packages installed):

```bash
Rscript src/tests/r_fixtures.R
```

To run the parity tests:

```bash
pytest src/tests
```

## License

`pycmprsk` is distributed under the **GNU General Public License v3.0 or
later** (GPL-3.0-or-later).

This package is a derivative work of R's `cmprsk` (Bob Gray), which is
licensed under GPL (>= 2). The Fortran sources from `cmprsk/src/*.f` have
been re-implemented in Python while preserving the original
algorithms. As a derivative work, `pycmprsk` must remain GPL-compatible.
