Metadata-Version: 2.4
Name: skein-glm
Version: 0.5.1
Classifier: Development Status :: 3 - Alpha
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Rust
Classifier: Topic :: Scientific/Engineering :: Mathematics
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Dist: numpy>=1.24
Requires-Dist: scipy>=1.10
Requires-Dist: scikit-learn>=1.3
Requires-Dist: pytest>=7 ; extra == 'dev'
Requires-Dist: pytest-cov ; extra == 'dev'
Requires-Dist: maturin>=1.5 ; extra == 'dev'
Requires-Dist: ruff ; extra == 'dev'
Requires-Dist: mypy ; extra == 'dev'
Requires-Dist: sphinx>=7.4 ; extra == 'docs'
Requires-Dist: furo>=2024.5 ; extra == 'docs'
Requires-Dist: myst-parser>=4 ; extra == 'docs'
Requires-Dist: sphinx-copybutton>=0.5 ; extra == 'docs'
Requires-Dist: sphinx-design>=0.6 ; extra == 'docs'
Provides-Extra: dev
Provides-Extra: docs
License-File: LICENSE
Summary: Weighted structured nonconvex sparse models (Python + Rust)
Keywords: sparse,nonconvex,mcp,scad,group-lasso,regression
Author-email: David Villacis <david@villacis.net>
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown; charset=UTF-8; variant=GFM
Project-URL: Repository, https://github.com/dvillacis/skein

# skein

Weighted structured nonconvex sparse models. Rust core + Python API.

> **Documentation:** the [docs site](docs/index.md) has the full
> conceptual reference (penalties, datafits, weights, backends),
> porting guides for `glmnet` / `ncvreg` / `grpreg`, worked examples,
> and an auto-generated API reference. Hosted on Read the Docs once
> the project is connected (config in `.readthedocs.yaml`); preview
> locally with `mkdocs serve`. CI builds it `--strict` on every PR.

`skein` targets a niche that's well-served in R (`grpreg`, `ncvreg`) but
missing in Python at production quality: nonconvex group-structured
penalties (group MCP, group SCAD, sparse-group nonconvex) with first-class
support for *weights along three axes* — per-sample, per-feature, and
per-group.
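
As a rough illustration of what a weighted group penalty computes, the NumPy sketch below evaluates group MCP on a coefficient vector. It is not skein's implementation or API: the function names `mcp` and `group_mcp` are made up for this example, and it assumes each per-group weight multiplies λ for that group, which is a common convention but is not confirmed here for skein.

```python
import numpy as np

def mcp(t, lam, gamma):
    """MCP value at magnitude |t| (lam >= 0, gamma > 1)."""
    t = np.abs(t)
    return np.where(
        t <= gamma * lam,
        lam * t - t**2 / (2.0 * gamma),  # concave region near zero
        0.5 * gamma * lam**2,            # flat region: no further penalty growth
    )

def group_mcp(beta, groups, lam, gamma, group_weights):
    """Sum over groups of MCP applied to the group's Euclidean norm.

    Assumption for this sketch: the weight of group g rescales lam to lam * w_g.
    """
    total = 0.0
    for g, w in enumerate(group_weights):
        norm_g = np.linalg.norm(beta[groups == g])
        total += float(mcp(norm_g, lam * w, gamma))
    return total

beta = np.array([0.1, 0.1, 0.0, 0.0, 3.0, 4.0])
groups = np.array([0, 0, 1, 1, 2, 2])
value = group_mcp(beta, groups, lam=1.0, gamma=3.0, group_weights=np.ones(3))
print(value)
```

The third group has norm 5, which is past `gamma * lam = 3`, so its contribution is capped at `0.5 * gamma * lam**2`. That cap is what keeps large coefficients from being shrunk the way a group lasso would shrink them.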

## Status

v0.1 development. Core algorithms and the headline GLM family are in
place; design-matrix backends (sparse, mmap, chunked) are next. See
[ROADMAP.md](ROADMAP.md) for the full plan.

**Done so far:**

- **Solvers** — production CD core (path solver, strong rule + KKT
  verification, gap-safe screening, Anderson acceleration); group block-CD
  with LLA outer loop for nonconvex group penalties; Rayon-parallel
  group sweeps; operator-norm Lipschitz via power iteration.
- **Datafits** — least squares, binomial logistic, Poisson (log link),
  Cox PH (Breslow ties). All are tied together by a `GlmDatafit` trait
  that exposes a weighted least-squares surrogate, so the M1/M2 inner
  solvers handle every GLM unchanged.
- **Penalties** — MCP, SCAD, group lasso, group MCP, sparse-group lasso,
  sparse-group MCP. Per-feature and per-group weights honored
  throughout.
- **Python** — sklearn-compatible estimators for every (datafit ×
  penalty) combination; type stubs; warm-started λ-paths; standardization
  with original-scale `coef_` / `intercept_` recovery (dense backend).
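
To show what a weighted-LS surrogate looks like for one GLM, here is a minimal NumPy sketch of the textbook IRLS quantities for the logistic datafit: working weights `p * (1 - p)` and working response `eta + (y - p) / (p * (1 - p))`. Whether `GlmDatafit` uses exactly this form is an assumption; `logistic_working_ls` is a hypothetical helper, and an unpenalized weighted least-squares solve stands in for the penalized inner solver.

```python
import numpy as np

def sigmoid(eta):
    return 1.0 / (1.0 + np.exp(-eta))

def logistic_working_ls(X, y, beta):
    """Return IRLS working weights and working response at the current beta."""
    eta = X @ beta
    p = sigmoid(eta)
    var = np.clip(p * (1.0 - p), 1e-10, None)  # guard against division by zero
    z = eta + (y - p) / var
    return var, z

rng = np.random.default_rng(0)
X = rng.standard_normal((200, 3))
true_beta = np.array([1.0, -1.0, 0.5])
y = (rng.uniform(size=200) < sigmoid(X @ true_beta)).astype(float)

beta = np.zeros(3)
for _ in range(25):
    w, z = logistic_working_ls(X, y, beta)
    # Weighted least squares on (X, z) with weights w; a penalized
    # solver would be substituted at this step.
    beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))

# Score of the logistic log-likelihood; it vanishes at the unpenalized optimum.
grad = X.T @ (y - sigmoid(X @ beta))
print(np.abs(grad).max())
```

Because each outer step only ever sees a weighted least-squares problem, the same inner solver can serve any datafit that supplies its own `(w, z)` pair.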

**M8 (Distribution & DX) is done:** CI + cibuildwheel + Read the Docs +
25-page mkdocs site (concepts + R-porting + extending + examples + API
ref) + R numerical regression suite vs glmnet/ncvreg/grpreg + stable
Rust API contract. The library is `pip install`-able once published,
documented end-to-end, and pinned against R reference fits so we don't
silently drift.

**Coming next:** algorithmic features. M5.x adaptive weights and
stability selection are the next high-value milestones; both build on
the per-feature/per-group weight axes that are already wired through
every solver.
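
For context on how adaptive weights map onto a per-feature weight axis, the sketch below computes adaptive-lasso-style weights from a pilot fit. This is only the general technique, not the planned skein feature: the helper `adaptive_weights`, the ridge pilot estimate, and the `nu` / `eps` parameters are all assumptions made for illustration.

```python
import numpy as np

def adaptive_weights(beta_init, nu=1.0, eps=1e-6):
    """Per-feature weights 1 / (|beta_init| + eps)**nu.

    Features with a large pilot coefficient get a small penalty weight.
    """
    return 1.0 / (np.abs(beta_init) + eps) ** nu

rng = np.random.default_rng(0)
n, p = 200, 10
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([1.5, -2.0, 0.8]) + 0.1 * rng.standard_normal(n)

# Pilot estimate: ridge regression solved in closed form.
ridge = 1.0
beta_init = np.linalg.solve(X.T @ X + ridge * np.eye(p), X.T @ y)

weights = adaptive_weights(beta_init)
print(np.round(weights, 2))
```

The resulting vector has one entry per feature, so it is the kind of input a per-feature weight axis could accept in a second, reweighted fit.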

## Layout

```
crates/skein-core/   pure Rust: traits + algorithms (no Python)
crates/skein-py/     PyO3 bindings (cdylib → skein_glm._core)
python/skein/        sklearn-compatible estimators + ABCs for extensions
tests/               pytest smoke tests
benches/             criterion (Rust) + asv (Python)
```

The Rust traits (`DesignMatrix`, `Datafit`, `GlmDatafit`, `Penalty`,
`GroupPenalty`) and their Python ABC mirrors (`skein.penalties.Penalty`,
etc.) are the extension surface for downstream per-paper projects.

## Quick start

```python
import numpy as np
from skein import MCPPathRegressor, LogisticGroupMCPPathRegressor, CoxMCPRegressor

# Nonconvex sparse least squares with a λ-path.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([1.5, -2.0, 0.8]) + 0.1 * rng.standard_normal(n)
model = MCPPathRegressor(gamma=3.0, n_lambdas=50, standardize=True).fit(X, y)
print(model.coefs_[-1, :5], model.intercepts_[-1])

# Logistic + group MCP via LLA, with sklearn-style predict/predict_proba.
groups = np.repeat(np.arange(p // 5), 5)  # 5 features per group
y_bin = (X[:, :3].sum(axis=1) > 0).astype(float)
clf = LogisticGroupMCPPathRegressor(groups=groups, gamma=3.0, n_lambdas=20).fit(X, y_bin)
proba = clf.predict_proba(X)  # shape (n, n_lambdas)

# Cox PH with right-censored survival data.
time = rng.exponential(1.0 / np.exp(X[:, :3].sum(axis=1)))
event = rng.uniform(size=n) < 0.7
cox = CoxMCPRegressor(lambda_=0.01, gamma=3.0).fit(X, time, event.astype(float))
risk = cox.predict(X)  # prognostic index η
```
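
One way to sanity-check a prognostic index like `risk` is Harrell's concordance index. The sketch below is plain NumPy and independent of skein. `concordance_index` is a hypothetical helper that assumes higher risk means shorter survival and, for simplicity, skips pairs with tied times.

```python
import numpy as np

def concordance_index(time, event, risk):
    """Fraction of comparable pairs ordered correctly by risk.

    A pair (i, j) is comparable when subject i had an observed event and
    subject j was still at risk strictly after time[i].
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    risk = np.asarray(risk, dtype=float)
    concordant = 0.0
    comparable = 0
    for i in np.flatnonzero(event):
        later = time > time[i]  # subjects known to outlive subject i
        comparable += int(later.sum())
        concordant += float((risk[i] > risk[later]).sum())
        concordant += 0.5 * float((risk[i] == risk[later]).sum())  # risk ties
    return concordant / comparable if comparable else float("nan")

# Tiny check: risk perfectly ordered against survival time.
c_perfect = concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [4, 3, 2, 1])
c_reversed = concordance_index([1, 2, 3, 4], [1, 1, 1, 1], [1, 2, 3, 4])
print(c_perfect, c_reversed)
```

With the quick-start variables, `concordance_index(time, event, risk)` would score the fitted Cox model, provided `predict` returns larger values for higher hazard.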

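Every regressor name follows the same `{Datafit}{Penalty}{,Path}Regressor`
scheme: `LogisticGroupMCPPathRegressor` is the logistic datafit with a
group MCP penalty in its path variant, and in the quick start the
least-squares `MCPPathRegressor` carries no datafit prefix. The path
variants warm-start across λ; their `coefs_` array has one row per λ, and
`intercepts_` (where applicable) has one entry per λ.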
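
Given path outputs laid out with one row per λ, as `coefs_[-1, :5]` in the quick start suggests, picking a λ on held-out data takes a few lines of NumPy. This is a sketch with stand-in arrays rather than a fitted skein model, and `best_lambda_index` is a hypothetical helper, not part of the package.

```python
import numpy as np

def best_lambda_index(coefs, intercepts, X_val, y_val):
    """Return the path index with the lowest validation MSE, plus all MSEs.

    coefs: (n_lambdas, p), intercepts: (n_lambdas,)
    """
    preds = X_val @ coefs.T + intercepts  # shape (n_val, n_lambdas)
    mse = ((preds - y_val[:, None]) ** 2).mean(axis=0)
    return int(np.argmin(mse)), mse

# Stand-in path: the true coefficients shrunk by factors from 0 to 1.
rng = np.random.default_rng(0)
true_beta = np.array([1.5, -2.0, 0.8, 0.0])
coefs = np.linspace(0.0, 1.0, 5)[:, None] * true_beta
intercepts = np.full(5, 0.5)

X_val = rng.standard_normal((50, 4))
y_val = X_val @ true_beta + 0.5  # noise-free, so the last path point is exact

idx, mse = best_lambda_index(coefs, intercepts, X_val, y_val)
print(idx, mse.shape)
```

For a fitted path model, the same function would take `model.coefs_` and `model.intercepts_` in place of the stand-in arrays.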

## Build

```bash
# Rust core only (fast iteration on algorithms)
cargo test -p skein-core

# Full Python package (requires maturin in your env)
maturin develop --release
pytest
```

## License

MIT.

