Metadata-Version: 2.4
Name: sarb
Version: 0.1.0
Summary: Signal-Adaptive Residual Boosting — two-phase gradient boosting with per-tree OOB step optimization
Author-email: Anjana Yatawara <yatawara@csub.edu>
License: MIT
Project-URL: Homepage, https://github.com/anjana-yatawara/sarb
Project-URL: Bug Tracker, https://github.com/anjana-yatawara/sarb/issues
Project-URL: Documentation, https://sarb.readthedocs.io
Keywords: machine-learning,gradient-boosting,ensemble,regression,tree-based
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Science/Research
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Scientific/Engineering :: Mathematics
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.21
Requires-Dist: scipy>=1.7
Requires-Dist: scikit-learn>=1.1
Requires-Dist: pandas>=1.3
Provides-Extra: boost
Requires-Dist: xgboost>=1.6; extra == "boost"
Requires-Dist: lightgbm>=3.3; extra == "boost"
Requires-Dist: catboost>=1.0; extra == "boost"
Provides-Extra: plot
Requires-Dist: matplotlib>=3.5; extra == "plot"
Requires-Dist: seaborn>=0.11; extra == "plot"
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy; extra == "dev"
Provides-Extra: all
Requires-Dist: sarb[boost,dev,plot]; extra == "all"
Dynamic: license-file

# sarb — Signal-Adaptive Residual Boosting

[![PyPI](https://img.shields.io/pypi/v/sarb.svg)](https://pypi.org/project/sarb/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python](https://img.shields.io/pypi/pyversions/sarb.svg)](https://pypi.org/project/sarb/)

**Two-phase gradient boosting** with per-tree out-of-bag step optimization
and residual-anchored feature selection. A scikit-learn-compatible Python
package that mirrors the R package from Yatawara (2026).

## Installation

```bash
# Basic install
pip install sarb

# With optional boosting backends (quoted so shells such as zsh do not expand the brackets)
pip install "sarb[boost]"

# Everything
pip install "sarb[all]"
```

## Quick Start

```python
from sarb import SARBRegressor
from sklearn.datasets import make_friedman1
from sklearn.model_selection import train_test_split

# Generate data
X, y = make_friedman1(n_samples=500, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Fit
model = SARBRegressor(n_trees=500, random_state=42)
model.fit(X_train, y_train)

# Predict
preds = model.predict(X_test)
print(f"R²: {model.score(X_test, y_test):.3f}")

# Feature importance (lambda-weighted)
print("Importances:", model.feature_importances_)

# Anchor frequency (unique to SARB)
print("Anchor freq:", model.anchor_frequency_)
```

## sklearn-Compatible

`SARBRegressor` works with `Pipeline`, `GridSearchCV`, and `cross_val_score`. The example reuses `X` and `y` from the Quick Start:

```python
from sarb import SARBRegressor
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import GridSearchCV

pipe = Pipeline([
    ("scaler", StandardScaler()),
    ("sarb", SARBRegressor(random_state=42)),
])

grid = GridSearchCV(pipe, {
    "sarb__warmup_frac": [0.1, 0.25, 0.5],
    "sarb__n_anchors": [1, 2, 3],
}, cv=5, scoring="r2")
grid.fit(X, y)
```

## Unified Wrappers for Other Methods

Every method uses the same call; only the `method=` argument changes:

```python
from sarb import boost_trees, forest_trees

# Boosting
m1 = boost_trees(X, y, method="sarb")
m2 = boost_trees(X, y, method="gbm")
m3 = boost_trees(X, y, method="xgboost")     # needs xgboost installed
m4 = boost_trees(X, y, method="lightgbm")    # needs lightgbm
m5 = boost_trees(X, y, method="catboost")    # needs catboost
m6 = boost_trees(X, y, method="histgbm")

# Forests
m7 = forest_trees(X, y, method="rf")
m8 = forest_trees(X, y, method="extratrees")

# Same .predict() interface for all
m1.predict(X_test)
m3.predict(X_test)
```

## Benchmark Multiple Methods

One call cross-validates every listed method and reports Wilcoxon tests:

```python
from sarb import benchmark

results = benchmark(X, y, methods=["sarb", "gbm", "xgboost", "rf"])
results.print_summary()
```

```
BENCHMARK RESULTS (5-fold CV, n=500, p=10)
==========================================================
  Method        RMSE      MAE        R²  Rank     Time
  ────────────────────────────────────────────────────
  sarb        1.9903   1.5234    0.842     1     2.1s ★
  xgboost     2.1542   1.6823    0.818     2     0.4s
  gbm         2.2529   1.7845    0.803     3     1.2s
  rf          2.6626   2.0923    0.726     4     0.8s

Statistical tests (vs sarb):
  vs xgboost    : p = 0.0234 *
  vs gbm        : p = 0.0043 **
  vs rf         : p = 0.0001 ***
```
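
The kind of comparison reported above can be approximated with scikit-learn and SciPy alone. The sketch below is an assumption about the general technique (a paired Wilcoxon signed-rank test on per-fold RMSE), not a description of how `benchmark` is implemented. It uses scikit-learn's gradient boosting and random forest as stand-ins, and the helper `fold_rmse` is hypothetical.

```python
from scipy.stats import wilcoxon
from sklearn.datasets import make_friedman1
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score

X, y = make_friedman1(n_samples=500, random_state=42)

# Fixed, shuffled folds so both models are scored on identical splits (paired data).
cv = KFold(n_splits=5, shuffle=True, random_state=42)


def fold_rmse(model):
    """Hypothetical helper: per-fold RMSE (scikit-learn returns it negated)."""
    scores = cross_val_score(
        model, X, y, cv=cv, scoring="neg_root_mean_squared_error"
    )
    return -scores


rmse_gbm = fold_rmse(GradientBoostingRegressor(n_estimators=50, random_state=42))
rmse_rf = fold_rmse(RandomForestRegressor(n_estimators=50, random_state=42))

# Paired, two-sided Wilcoxon signed-rank test on the fold-wise differences.
stat, p_value = wilcoxon(rmse_gbm, rmse_rf)
print(f"mean RMSE gbm={rmse_gbm.mean():.3f}, rf={rmse_rf.mean():.3f}, p={p_value:.4f}")
```

With five paired scores, an exact two-sided signed-rank test cannot return a p-value below 0.0625, so finer p-values need more pairs, for example from repeated cross-validation.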

## Reproduce the Paper

```python
from sarb import reproduce_paper, friedman1

df = friedman1(n_samples=500)
X, y = df.iloc[:, :10].values, df["y"].values

# Same settings as the paper's 907-dataset benchmark
result = reproduce_paper(X, y)
result.print_summary()
```

## Hyperparameter Tuning

```python
from sarb import tune_sarb, sensitivity

# Grid search with CV
best = tune_sarb(X, y, param_grid={
    "warmup_frac": [0.1, 0.25, 0.5],
    "n_anchors":   [1, 2, 3],
    "colsample":   [0.6, 0.8, 1.0],
})
print(best["best_params"])

# How sensitive is SARB to warmup_frac?
sens = sensitivity(X, y, param="warmup_frac")
```

## How SARB Works

**Phase 1 (Warmup)**: Standard gradient boosting with all features and
a fixed learning rate. Captures dominant main effects.

**Phase 2 (Explore)**: Each tree is fit on a feature subset anchored by
the predictors most correlated with the current residuals. The per-tree
step size is chosen by an out-of-bag line search. Uninformative trees
receive a step size of zero, which rejects them; typically 40-60% of
Phase 2 trees are rejected.
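
The two Phase 2 ingredients can be sketched in plain NumPy. This is an illustration under stated assumptions, not the package's implementation: the helper names `anchored_subset` and `oob_step` are hypothetical, "most correlated" is taken to mean largest absolute Pearson correlation, the non-anchor features are sampled uniformly at random, and the line search uses the closed-form squared-error minimizer floored at zero.

```python
import numpy as np


def anchored_subset(X, residuals, n_anchors, colsample, rng):
    """Hypothetical helper: top-|corr| anchor features plus random extras."""
    Xc = X - X.mean(axis=0)
    rc = residuals - residuals.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (rc ** 2).sum())
    # Constant columns (denom == 0) get correlation 0 instead of NaN.
    corr = np.abs(Xc.T @ rc) / np.where(denom > 0, denom, np.inf)
    anchors = np.argsort(corr)[::-1][:n_anchors]
    rest = np.setdiff1d(np.arange(X.shape[1]), anchors)
    # Assumed meaning of colsample: target fraction of all features per tree.
    n_extra = max(0, int(round(colsample * X.shape[1])) - n_anchors)
    extra = rng.choice(rest, size=min(n_extra, rest.size), replace=False)
    return np.sort(np.concatenate([anchors, extra]))


def oob_step(residuals_oob, tree_pred_oob):
    """Hypothetical helper: squared-error line search on OOB rows, floored at 0."""
    denom = float(tree_pred_oob @ tree_pred_oob)
    if denom == 0.0:
        return 0.0
    # argmin over s of ||r - s * h||^2 is <r, h> / <h, h>; negative means reject.
    return max(0.0, float(residuals_oob @ tree_pred_oob) / denom)


rng = np.random.default_rng(0)
X = rng.normal(size=(200, 6))
residuals = 3.0 * X[:, 2] + 0.1 * rng.normal(size=200)  # signal lives in column 2

subset = anchored_subset(X, residuals, n_anchors=1, colsample=0.5, rng=rng)
print(subset)  # contains column 2; the other two columns are random

step_undershoot = oob_step(residuals, 0.5 * residuals)  # tree predicts half the residual
step_wrong_way = oob_step(residuals, -residuals)        # tree points the wrong way
print(step_undershoot, step_wrong_way)
```

A tree whose out-of-bag predictions are negatively aligned with the residuals gets a step of zero, which matches the rejection behavior described above. A grid-based line search would be an equally plausible choice for losses without a closed form.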

## Citation

```
Yatawara, A. (2026). Signal-Adaptive Residual Boosting for Regression.
Computational Statistics & Data Analysis.
```

## License

MIT
