Metadata-Version: 2.1
Name: quantile-regression-pdlp
Version: 0.4.2
Summary: Non-crossing quantile regression toolkit with joint multi-quantile fitting, inference, conformal calibration, and evaluation. Scikit-learn compatible.
Author: Joshua Vernazza
License: MIT
Project-URL: Repository, https://github.com/joshvern/quantile_regression_pdlp
Project-URL: Issues, https://github.com/joshvern/quantile_regression_pdlp/issues
Project-URL: Documentation, https://joshvern.github.io/quantile_regression_pdlp/
Keywords: quantile-regression,non-crossing,prediction-intervals,conformal,calibration,statistics,machine-learning,scikit-learn
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3 :: Only
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: ortools>=9.0.9047
Requires-Dist: numpy>=1.18.0
Requires-Dist: pandas>=1.0.0
Requires-Dist: scipy>=1.4.0
Requires-Dist: tqdm>=4.50.0
Requires-Dist: joblib>=1.0.0
Requires-Dist: scikit-learn>=0.22.0
Provides-Extra: test
Requires-Dist: pytest>=7; extra == "test"
Provides-Extra: formula
Requires-Dist: patsy>=0.5.0; extra == "formula"
Provides-Extra: plot
Requires-Dist: matplotlib>=3.1.0; extra == "plot"
Provides-Extra: benchmark
Requires-Dist: statsmodels>=0.13.0; extra == "benchmark"
Requires-Dist: matplotlib>=3.1.0; extra == "benchmark"
Provides-Extra: all
Requires-Dist: patsy>=0.5.0; extra == "all"
Requires-Dist: matplotlib>=3.1.0; extra == "all"

[![PyPI][pypi-badge]][pypi-link]
[![Python Versions][py-badge]][pypi-link]
[![CI][ci-badge]][ci-link]
[![Docs][docs-badge]][docs-link]

[pypi-badge]: https://img.shields.io/pypi/v/quantile-regression-pdlp.svg
[py-badge]: https://img.shields.io/pypi/pyversions/quantile-regression-pdlp.svg
[ci-badge]: https://github.com/joshvern/quantile_regression_pdlp/actions/workflows/ci.yml/badge.svg
[docs-badge]: https://github.com/joshvern/quantile_regression_pdlp/actions/workflows/docs.yml/badge.svg

[pypi-link]: https://pypi.org/project/quantile-regression-pdlp/
[ci-link]: https://github.com/joshvern/quantile_regression_pdlp/actions/workflows/ci.yml
[docs-link]: https://joshvern.github.io/quantile_regression_pdlp/

# quantile-regression-pdlp

**Non-crossing quantile models with built-in inference, calibration, and evaluation.**

A quantile modeling toolkit — not just a quantile regressor. Fits multiple quantiles jointly with monotonicity constraints that guarantee predictions never cross. Wraps the result in inference, conformal calibration, evaluation metrics, and crossing diagnostics.

Scikit-learn compatible. Validated against sklearn, statsmodels, and R's `quantreg`.

## Why Not Just Fit Quantiles Independently?

When you fit quantiles one at a time (as sklearn and statsmodels do), nothing prevents the 90th percentile prediction from falling *below* the 10th. On real-world data with heavy tails, noise, or many quantile levels, **this happens frequently**:

| n | features | quantiles | Crossing rate (independent) | Crossing rate (this package) |
|---:|---:|---:|---:|---:|
| 500 | 10 | 13 | **30.0%** | **0%** |
| 1,000 | 10 | 13 | **16.5%** | **0%** |
| 2,000 | 20 | 13 | **11.0%** | **0%** |
| 2,000 | 20 | 7 | **4.5%** | **0%** |

This package eliminates crossings by construction. The joint formulation also acts as beneficial regularization — achieving **equal or better pinball loss** than independent fitting.

Full benchmark methodology and results: [Benchmarks](https://joshvern.github.io/quantile_regression_pdlp/benchmarks/)
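As a rough sketch of what "crossing rate" means in these tables (the fraction of rows whose predicted quantiles are not monotone in tau), here is a minimal pure-numpy check. `crossing_rate` is an illustrative helper, not this package's API (the package's `crossing_summary` is shown further below):

```python
import numpy as np

def crossing_rate(preds):
    """Fraction of rows where a higher quantile's prediction falls
    below a lower quantile's (columns ordered by increasing tau)."""
    diffs = np.diff(preds, axis=1)
    return float(np.mean((diffs < 0).any(axis=1)))

# two predictions at taus (0.1, 0.5, 0.9): the second row crosses
preds = np.array([[1.0, 2.0, 3.0],
                  [2.0, 1.5, 1.0]])
print(crossing_rate(preds))  # 0.5
```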

## What You Get

This is a **toolkit**, not a single estimator. It covers the workflow from raw quantile regression through calibrated prediction intervals:

| Workflow | What it does |
|----------|-------------|
| **Joint Quantile Regression** | Fit multiple quantiles in one call with non-crossing guarantees |
| **Conformalized Quantile Regression** | Calibrate intervals for finite-sample coverage guarantees |
| **Censored Quantile Regression** | Handle right- or left-censored (survival) data |
| **Evaluation & Metrics** | Pinball loss, coverage, interval score, crossing diagnostics |
| **Calibration Diagnostics** | Coverage by group/bin, nominal vs empirical, sharpness analysis |
| **Crossing Detection & Repair** | Diagnose and fix crossings from any quantile model |

### Feature Comparison

| Feature | This package | sklearn | statsmodels |
|---------|:---:|:---:|:---:|
| Multiple quantiles (joint fit) | Yes | No | No |
| Non-crossing guarantee | Yes | No | No |
| Multi-output regression | Yes | No | No |
| Analytical / kernel / cluster / bootstrap SEs | Yes | No | Partial |
| L1 / Elastic Net / SCAD / MCP | Yes | L1 only | No |
| Conformal calibration (CQR) | Yes | No | No |
| Calibration diagnostics | Yes | No | No |
| Evaluation metrics suite | Yes | Partial | No |
| Crossing detection + fix | Yes | No | No |
| Censored QR | Yes | No | No |
| Prediction intervals | Yes | No | No |
| Pseudo R² | Yes | No | Yes |
| Formula interface | Yes | No | Yes |
| Sklearn pipeline compatible | Yes | Yes | No |

## Installation

```bash
pip install quantile-regression-pdlp
```

Optional extras:

```bash
pip install "quantile-regression-pdlp[all]"      # formula interface + plots
pip install "quantile-regression-pdlp[plot]"     # matplotlib only
pip install "quantile-regression-pdlp[formula]"  # patsy only
```

## Quick Start

```python
import numpy as np
from quantile_regression_pdlp import QuantileRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ [2.0, -1.5, 0.8] + rng.normal(scale=0.5, size=200)

# Fit 3 quantiles jointly — guaranteed non-crossing
model = QuantileRegression(tau=[0.1, 0.5, 0.9], se_method='analytical')
model.fit(X, y)

# Summaries with coefficients, SEs, p-values, and 95% CIs
print(model.summary()[0.5]['y'])

# Prediction intervals (guaranteed monotone: lower < median < upper)
interval = model.predict_interval(X[:5], coverage=0.80)
print(interval['y']['lower'], interval['y']['upper'])
```

### Conformal Calibration

Turn raw quantile predictions into intervals with coverage guarantees:

```python
from quantile_regression_pdlp.conformal import ConformalQuantileRegression

base = QuantileRegression(tau=[0.05, 0.5, 0.95], se_method='analytical')
cqr = ConformalQuantileRegression(base_estimator=base, coverage=0.90)
cqr.fit(X_train, y_train)

intervals = cqr.predict_interval(X_test)
print(cqr.empirical_coverage(X_test, y_test))  # approximately 0.90 or above
```
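Under the hood, split CQR (Romano, Patterson, and Candès, 2019) reduces the calibration step to a single half-width correction computed on a held-out set. A hedged pure-numpy sketch of that step, assuming the standard conformity score (the function name is illustrative, not this package's API):

```python
import numpy as np

def cqr_correction(y_cal, lo_cal, hi_cal, alpha=0.10):
    """Half-width adjustment from split conformal quantile regression:
    score each calibration point by how far it falls outside [lo, hi],
    then take the ceil((n + 1) * (1 - alpha)) / n empirical quantile."""
    scores = np.maximum(lo_cal - y_cal, y_cal - hi_cal)
    n = len(scores)
    k = min(int(np.ceil((n + 1) * (1 - alpha))), n)
    return float(np.sort(scores)[k - 1])

# the calibrated interval is then [lo - qhat, hi + qhat] on test points
```

Note the correction can be negative: if the raw intervals over-cover on the calibration set, they get tightened.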

### Censored Quantile Regression

For survival data with right- or left-censoring:

```python
from quantile_regression_pdlp import CensoredQuantileRegression

model = CensoredQuantileRegression(tau=0.5, censoring='right', se_method='analytical')
model.fit(X, observed_time, event_indicator=delta)
```
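The classical approach here is Powell's (1986) censored check loss, where the linear predictor is capped at each observation's censoring point before the loss is applied. A sketch of the right-censored objective in plain numpy, illustrative only (this package's solver and exact objective may differ):

```python
import numpy as np

def powell_objective(beta, X, y, c, tau):
    """Powell censored quantile objective for right-censored data:
    cap the fitted value at the censoring point c_i, then apply the
    usual check loss to the residuals."""
    resid = y - np.minimum(X @ beta, c)
    return float(np.sum(np.maximum(tau * resid, (tau - 1) * resid)))
```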

### Evaluate Any Quantile Model

The metrics and diagnostics modules work with predictions from any source — not just this package:

```python
from quantile_regression_pdlp.metrics import quantile_evaluation_report
from quantile_regression_pdlp.postprocess import crossing_summary

# Evaluate predictions from XGBoost, LightGBM, or any other model
report = quantile_evaluation_report(y_true, predictions, taus)
crossings = crossing_summary(predictions, taus)
```
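The core metric behind the report is the pinball (check) loss. A minimal standalone definition, in case you want to sanity-check the numbers by hand (this is the textbook formula, not necessarily the exact reduction `quantile_evaluation_report` performs internally):

```python
import numpy as np

def pinball_loss(y_true, y_pred, tau):
    """Mean check loss: under-predictions cost tau per unit,
    over-predictions cost (1 - tau) per unit."""
    diff = y_true - y_pred
    return float(np.mean(np.maximum(tau * diff, (tau - 1) * diff)))

print(pinball_loss(np.array([1.0, 3.0]), np.array([2.0, 2.0]), 0.9))  # ~0.5
```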

### Regularization

```python
QuantileRegression(tau=0.5, regularization='l1', alpha=0.1)       # Lasso
QuantileRegression(tau=0.5, regularization='elasticnet', alpha=0.1, l1_ratio=0.5)
QuantileRegression(tau=0.5, regularization='scad', alpha=0.3)     # Less bias on large coefficients
QuantileRegression(tau=0.5, regularization='mcp', alpha=0.3)
```
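To see why SCAD incurs less bias on large coefficients than L1: its penalty grows linearly near zero like the lasso, transitions quadratically, then flattens to a constant beyond `a * alpha`, so big coefficients pay no extra shrinkage. A sketch of the SCAD penalty value (Fan and Li, 2001), independent of this package's internals:

```python
import numpy as np

def scad_penalty(beta, alpha, a=3.7):
    """SCAD penalty: L1-like slope alpha near zero, quadratic
    transition, then constant alpha**2 * (a + 1) / 2 past a * alpha."""
    b = np.abs(beta)
    near = alpha * b
    mid = (2 * a * alpha * b - b**2 - alpha**2) / (2 * (a - 1))
    flat = alpha**2 * (a + 1) / 2
    return np.where(b <= alpha, near, np.where(b <= a * alpha, mid, flat))

# large coefficients pay the same flat penalty: no extra shrinkage
print(scad_penalty(1.0, 0.1), scad_penalty(10.0, 0.1))  # equal values
```

MCP behaves similarly, with a different transition shape.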

### Inference Options

```python
QuantileRegression(tau=0.5, se_method='analytical')   # Fast asymptotic SEs
QuantileRegression(tau=0.5, se_method='kernel')        # Heteroscedasticity-robust
QuantileRegression(tau=0.5, se_method='bootstrap', n_bootstrap=500)
# Cluster-robust SEs
model.fit(X, y, clusters=group_labels)
```
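The bootstrap option resamples rows, refits, and reports the spread of the refitted coefficients. The same idea applied to the simplest possible estimator, a sample median (an illustration of the resampling scheme, not this package's implementation):

```python
import numpy as np

def bootstrap_se_median(x, n_boot=500, seed=0):
    """Standard error of the sample median via nonparametric bootstrap:
    resample with replacement, recompute, take the std of replicates."""
    rng = np.random.default_rng(seed)
    meds = [np.median(rng.choice(x, size=len(x), replace=True))
            for _ in range(n_boot)]
    return float(np.std(meds, ddof=1))
```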

## Benchmarks

Tested on heavy-tailed heteroscedastic data (Student-t noise, 10-20 features, up to 13 quantiles):

| n | features | quantiles | Crossing (this) | Crossing (sklearn) | Pinball (this) | Pinball (sklearn) |
|---:|---:|---:|---:|---:|---:|---:|
| 500 | 10 | 7 | **0%** | 11.0% | **0.5148** | 0.5166 |
| 500 | 10 | 13 | **0%** | 30.0% | **0.5095** | 0.5240 |
| 1,000 | 10 | 13 | **0%** | 16.5% | **0.5048** | 0.5071 |
| 2,000 | 20 | 13 | **0%** | 11.0% | **0.5599** | 0.5611 |

The joint formulation also achieves slightly better pinball loss — the non-crossing constraints act as beneficial regularization.

**Speed tradeoff:** This package solves a single joint LP with non-crossing constraints, which is slower than fitting each quantile independently. The value is in the guarantee and the richer downstream workflows. For single-quantile fits where speed matters most, sklearn or statsmodels may be more appropriate.

Full results: [Benchmarks](https://joshvern.github.io/quantile_regression_pdlp/benchmarks/) | [Reproduce locally](https://joshvern.github.io/quantile_regression_pdlp/benchmarks/#reproducing-these-results)

## When to Use This Package

**Use this when you need:**
- Multiple quantile predictions that must not cross (production pipelines, interval forecasts)
- Statistical inference on quantile coefficients (SEs, p-values, confidence intervals)
- Calibrated prediction intervals (conformal quantile regression)
- Censored/survival quantile models
- A complete evaluation workflow for any quantile model's predictions

**Use sklearn or statsmodels when:**
- You only need a single quantile (e.g., median regression)
- Raw speed matters more than crossing guarantees
- You don't need inference, calibration, or evaluation tooling

## Documentation

Full docs: [joshvern.github.io/quantile_regression_pdlp](https://joshvern.github.io/quantile_regression_pdlp/)

## Implementation

Quantile regression is naturally a linear program. This package solves joint multi-quantile LPs with non-crossing constraints using:

- **PDLP** — first-order primal-dual solver (default, from Google OR-Tools)
- **GLOP** — revised simplex (faster on small/medium problems)
- **HiGHS** — via scipy's sparse LP interface (memory-efficient)

```python
QuantileRegression(tau=0.5, solver_backend='GLOP')   # simplex
QuantileRegression(tau=0.5, use_sparse=True)          # scipy sparse
```
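For intuition, a single-quantile fit can be written out as the standard LP by splitting each residual into positive and negative parts. This hedged sketch calls scipy's HiGHS interface directly and uses none of this package's code:

```python
import numpy as np
from scipy.optimize import linprog

def qr_lp(X, y, tau):
    """Single-quantile regression as an LP:
    min sum(tau * u + (1 - tau) * v)
    s.t. X @ beta + u - v = y,  u >= 0,  v >= 0."""
    n, p = X.shape
    c = np.concatenate([np.zeros(p), tau * np.ones(n), (1 - tau) * np.ones(n)])
    A_eq = np.hstack([X, np.eye(n), -np.eye(n)])
    bounds = [(None, None)] * p + [(0, None)] * (2 * n)
    res = linprog(c, A_eq=A_eq, b_eq=y, bounds=bounds, method="highs")
    return res.x[:p]

# sanity check: with only an intercept column, the fit is the sample quantile
X = np.ones((5, 1))
y = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
print(qr_lp(X, y, 0.5))  # ~[3.0], the sample median
```

The joint multi-quantile problem stacks one such block per tau and links them with non-crossing inequality constraints, which is what the PDLP/GLOP/HiGHS backends above solve at scale.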

## Dependencies

**Required:** numpy, pandas, scipy, scikit-learn, ortools, tqdm, joblib

**Optional:** matplotlib (plots), patsy (formulas), statsmodels (benchmarks)

## Contributing

Contributions welcome! Open an issue or submit a pull request on [GitHub](https://github.com/joshvern/quantile_regression_pdlp).

## License

MIT
