Metadata-Version: 2.4
Name: gamlss
Version: 0.1.0
Summary: Python port of the R gamlss package: Generalised Additive Models for Location Scale and Shape
Author: gamlss-python contributors
License: GPL-3.0-only
Project-URL: Homepage, https://github.com/fzhao70/gamlss-python
Project-URL: Issues, https://github.com/fzhao70/gamlss-python/issues
Keywords: gamlss,statistics,regression,distributions,GAM
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: GNU General Public License v3 (GPLv3)
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Requires-Dist: scipy>=1.10
Requires-Dist: pandas>=1.5
Requires-Dist: patsy>=0.5.3
Provides-Extra: test
Requires-Dist: pytest>=7; extra == "test"
Dynamic: license-file

# gamlss-python

[![PyPI](https://img.shields.io/pypi/v/gamlss)](https://pypi.org/project/gamlss/)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue)](https://pypi.org/project/gamlss/)
[![License: GPL-3.0](https://img.shields.io/badge/license-GPL--3.0-blue)](LICENSE)

*R's `gamlss` in pure Python, numerically verified against the
original.*

A Python port of the R packages **gamlss** (5.5-0) and **gamlss.dist**
(6.1-1): Generalised Additive Models for Location, Scale and Shape
(Rigby & Stasinopoulos, 2005).

- [Pure Python](#pure-python)
- [Status](#status)
- [Install](#install)
- [Quick start](#quick-start)
- [API mapping (R -> Python)](#api-mapping-r---python)
- [Families implemented](#families-implemented)
- [Verification against R](#verification-against-r)
- [Known, documented differences](#known-documented-differences)
- [Releasing to PyPI](#releasing-to-pypi)
- [License and attribution](#license-and-attribution)
- [Layout](#layout)

The port reproduces the original R implementation *numerically*: the
RS/CG fitting algorithms, link functions, distribution derivatives,
deviance computations, standard errors and predictions are transcribed
from the R sources and verified against reference results generated by
the original R package (see `tests/`). Coefficients typically agree
with R to ~1e-9 relative, deviances to 1e-9, iteration counts exactly.

## Pure Python

The package is implemented entirely in Python — no C extensions, no
Cython, no build step, and no dependency on R at runtime. The only
requirements are numpy, scipy, pandas and patsy (whose own compiled
internals do the numerical heavy lifting, as with any scientific
Python package). Routines that the R version implements in C (e.g.
the PIG/SICHEL-family recursions in `gamlss.dist/src/*.c`) were
reimplemented in Python/numpy rather than wrapped.

## Status

This is a young port — treat it as **beta**. The parametric fitting
core is verified against R by the automated suite described below,
but beyond that only some of the core functions have been manually
tested; less-travelled code paths may still contain porting bugs.
If a result matters, cross-check it against the original R package,
and please report discrepancies via the issue tracker.

## Install

```bash
pip install gamlss        # from PyPI

pip install -e .          # from the repo root
pip install -e .[test]    # with test dependencies
```

Dependencies: numpy, scipy, pandas, patsy.

## Quick start

```python
import gamlss as gl
from gamlss import gamlss, fitted, coef, deviance, GAIC

ab = gl.load_data("abdom")

# R: gamlss(y ~ x, sigma.formula = ~x, family = BCT, data = abdom)
m = gamlss("y ~ x", sigma_formula="~x", family=gl.BCT(), data=ab)

m.summary()                  # R: summary(m)
coef(m, "mu")                # R: coef(m, "mu")
fitted(m, "sigma")           # R: fitted(m, "sigma")
deviance(m)                  # R: deviance(m)
GAIC(m, k=2)                 # R: GAIC(m, k=2)
m.predict(what="mu", newdata=ab.head(5), type="response")
```

## API mapping (R -> Python)

R uses `.` inside argument and component names; Python uses `_`.

| R | Python |
|---|--------|
| `gamlss(y~x, sigma.formula=~z, family=GA, data=d)` | `gamlss("y ~ x", sigma_formula="~z", family=GA(), data=d)` |
| `gamlss.control(n.cyc=50)` | `gamlss_control(n_cyc=50)` or `gamlss(..., n_cyc=50)` |
| `glim.control(cc=1e-4)` | `glim_control(cc=1e-4)` or `gamlss(..., cc=1e-4)` |
| `method=RS()` / `CG()` / `mixed(2, 20)` | `method=RS()` / `CG()` / `mixed(2, 20)` |
| `mu.start=, sigma.fix=TRUE` | `mu_start=, sigma_fix=True` |
| `weights=w` | `weights=w` (array or column name) |
| `m$mu.fv`, `m$mu.lp`, `m$mu.coefficients` | `m.mu_fv`, `m.mu_lp`, `m.mu_coefficients` |
| `m$G.deviance`, `m$aic`, `m$sbc`, `m$df.fit` | `m.G_deviance`, `m.aic`, `m.sbc`, `m.df_fit` |
| `fitted(m, "mu")` | `fitted(m, "mu")` or `m.fitted("mu")` |
| `coef(m, what="sigma")` | `coef(m, what="sigma")` |
| `coefAll(m)` | `coefAll(m)` |
| `deviance(m, "G")` | `deviance(m, "G")` |
| `logLik(m)` | `logLik(m)` |
| `AIC(m1, m2, k=2)` / `GAIC(m, k=log(n))` / `IC` | `AIC(m1, m2, k=2)` / `GAIC(m, k=...)` / `IC` |
| `summary(m, type="vcov"/"qr")` | `m.summary(type="vcov"/"qr")` or `summary(m, ...)` |
| `vcov(m, type="se")` | `vcov(m, type="se")` or `m.vcov("se")` |
| `predict(m, what="mu", newdata=nd, type="response")` | `m.predict(what="mu", newdata=nd, type="response")` |
| `predictAll(m, newdata=nd)` | `m.predictAll(newdata=nd)` |
| `lpred(m, what="mu", type="terms", se.fit=TRUE)` | `lpred(m, what="mu", type="terms", se_fit=True)` |
| `residuals(m)` (normalised quantile residuals) | `residuals(m)` / `m.get_residuals()` |
| `residuals(m, "mu", type="weighted")` | `residuals(m, "mu", type="weighted")` |
| `Rsq(m)` | `Rsq(m)` |
| `dNO/pNO/qNO/rNO(..., lower.tail=, log.p=)` | `dNO/pNO/qNO/rNO(..., lower_tail=, log_p=)` |

### Formulas

Formulas are strings interpreted by patsy with R-compatible behaviour:

- `"y ~ x + C(g)"` — factors via `C()` (R: `y ~ x + g` with a factor
  `g`); design-matrix term order follows R's `model.matrix`.
- `"y ~ poly(x, 3)"` — exact port of R's orthogonal `poly()`
  (including prediction via the stored recurrence coefficients).
- `"cbind(y, n - y) ~ x"` — binomial responses with denominators.
- `"y ~ x + offset(log(t))"` — in-formula offsets.
- `^` is translated to R semantics (`(a+b)^2` crossing, power inside
  `I()`).

### Families implemented

Continuous: `NO, NO2, EXP, GA, IG, IGAMMA, GU, RG, LO, LOGNO, WEI,
WEI3, TF, PE, GG, SN1, JSU, JSUo, SHASHo, BCCG, BCCGo, BCT, BCTo,
BCPE, BCPEo, BE, BEo`
Discrete: `PO, NBI, NBII, GEOM, PIG, LG, ZIP, ZIP2, ZINBI, ZANBI`
Binomial-type: `BI, BB, ZABI, ZIBI`

Each family ships its `d`/`p`/`q`/`r` functions (`dGA`, `pGA`, ...)
matching the R parameterisations exactly.

## Verification against R

`r-scripts/gen_reference.R` fits 43 reference models with the original
R gamlss (plus dense d/p/q grids for every family) and exports the
results to JSON. The pytest suite re-fits every model in Python and
compares:

- coefficients (rtol 1e-6, typically agree to 1e-9),
- global deviance / AIC / SBC (rtol 1e-9),
- effective df, residual df, `noObs` and the *exact* RS/CG iteration
  count,
- fitted values for every distribution parameter,
- quantile residuals (continuous families; discrete ones are
  randomised by construction),
- qr-type standard errors (`chol2inv` of the working-WLS R matrix),
- vcov-type standard errors (numerical Hessian, `optimHess` with
  R's `ndeps=1e-3`, falling back to `HessianPB` exactly as R does),
- predictions for new data (including R's "safe prediction" refit
  semantics, factor levels and `poly()` terms).

Run them:

```bash
# regenerate the reference (needs R with gamlss + jsonlite):
Rscript r-scripts/gen_reference.R
# verify the Python port:
python -m pytest tests/ -q
```

### Known, documented differences

- R quirks are reproduced where they affect results; e.g. the
  `ifelse(x<0, 0, fy)` shape-truncation in `dBI` that makes ZIBI/ZABI
  use the first observation's `bd`/`mu` in the zero-mass correction is
  replicated (`gamlss/dist/BI.py:_dBI0`).
- `qIG`/`pSN1`-style quantiles use root finding: R's `uniroot`
  (tol ~1.2e-4) vs scipy's `brentq` (1e-12) — Python values are more
  precise; agreement is at R's tolerance.
- `vcov()` on near-singular Hessians (e.g. BCTo on abdom where
  tau -> inf): R produces NaN standard errors and its `summary()`
  falls back to `type="qr"`; knife-edge sign differences in the
  fallback path can occur. The deviance/coefficients still agree.
- Randomised quantile residuals for discrete families use NumPy's RNG;
  they match R distributionally, not bitwise (pass `rng=` for
  reproducibility).
- Smoothing/additive terms (`pb()`, `cs()`, ...) are not ported yet —
  the parametric GAMLSS core (formulas, all four parameters, RS/CG,
  weights, offsets, factors) is complete.

## Releasing to PyPI

Releases are published by GitHub Actions
(`.github/workflows/publish.yml`) via PyPI [trusted
publishing](https://docs.pypi.org/trusted-publishers/) — no API tokens
needed. One-time setup: on pypi.org, add a "pending publisher" for the
project `gamlss` pointing at this repo
(`fzhao70/gamlss-python`, workflow `publish.yml`, environment `pypi`),
and create a GitHub environment named `pypi` in the repo settings.
Then to release:

1. Bump `version` in `pyproject.toml`.
2. Tag and create a GitHub release (`git tag v0.1.0 && git push --tags`,
   then "Releases → Draft a new release").
3. The workflow builds the sdist/wheel and uploads to PyPI.

## License and attribution

This package is a Python translation of the R packages
[gamlss](https://cran.r-project.org/package=gamlss) and
[gamlss.dist](https://cran.r-project.org/package=gamlss.dist) by Mikis
Stasinopoulos, Robert Rigby and co-authors (licensed `GPL-2 | GPL-3`),
and bundles datasets from
[gamlss.data](https://cran.r-project.org/package=gamlss.data). As a
derivative work it is distributed under the **GNU GPL v3** (see
`LICENSE`). It is not affiliated with or endorsed by the original
authors.

If you use GAMLSS in published work, please cite:

> Rigby, R. A. and Stasinopoulos, D. M. (2005). Generalized additive
> models for location, scale and shape (with discussion).
> *Applied Statistics*, 54, 507-554.

## Layout

```
gamlss/            the package
  engine.py        gamlss(), RS/CG/mixed, glim.fit, lm.wfit, controls
  family.py        gamlss.family infrastructure
  links.py         make.link.gamlss link functions
  formula.py       R formulas on patsy (poly, cbind, offset, naming)
  model.py         the fitted-model object + all S3-method equivalents
  methods.py       functional API (fitted, coef, GAIC, ...)
  rqres.py         normalised (randomised) quantile residuals
  dist/            one module per gamlss.dist family
  data/            datasets from gamlss.data as CSV
r-scripts/         R reference generation + fetch_r_sources.sh
tests/             verification suite (R reference JSONs included)
```

The original R sources used for the transcription are not committed;
run `r-scripts/fetch_r_sources.sh` to download the exact CRAN versions
into `r-src/`.
