Metadata-Version: 2.4
Name: deepgboost
Version: 0.3.4
Summary: Distributed Gradient Boosting Forest — deep graph tree ensemble algorithm
Author-email: ThinBaker <iamthinbaker@gmail.com>
License: Apache-2.0
Project-URL: Homepage, https://github.com/iamthinbaker/deepgboost
Project-URL: Repository, https://github.com/iamthinbaker/deepgboost
Project-URL: Documentation, https://iamthinbaker.github.io/deepgboost/
Project-URL: Bug Tracker, https://github.com/iamthinbaker/deepgboost/issues
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.24
Requires-Dist: scipy>=1.10
Requires-Dist: scikit-learn>=1.3
Requires-Dist: pandas>=2.0
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: pytest-cov; extra == "dev"
Requires-Dist: nbmake>=1.5; extra == "dev"
Requires-Dist: matplotlib>=3.7; extra == "dev"
Requires-Dist: ruff>=0.15.8; extra == "dev"
Requires-Dist: pre-commit>=4; extra == "dev"
Requires-Dist: tqdm==4.67.3; extra == "dev"
Requires-Dist: xgboost>=2.0; extra == "dev"
Provides-Extra: plotting
Requires-Dist: matplotlib>=3.7; extra == "plotting"
Provides-Extra: docs
Requires-Dist: mkdocs>=1.6; extra == "docs"
Requires-Dist: mkdocs-material>=9.5; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.25; extra == "docs"
Requires-Dist: mkdocs-jupyter>=0.24; extra == "docs"
Requires-Dist: pillow>=10.0; extra == "docs"
Requires-Dist: cairosvg>=2.7; extra == "docs"
Requires-Dist: matplotlib>=3.7; extra == "docs"
Dynamic: license-file

# DeepGBoost

[![PyPI version](https://badge.fury.io/py/deepgboost.svg)](https://pypi.python.org/pypi/deepgboost/)
[![CI](https://github.com/iamthinbaker/deepgboost/actions/workflows/ci.yml/badge.svg)](https://github.com/iamthinbaker/deepgboost/actions/workflows/ci.yml)
[![Coverage](https://codecov.io/gh/iamthinbaker/deepgboost/branch/main/graph/badge.svg)](https://codecov.io/gh/iamthinbaker/deepgboost)
[![License](https://img.shields.io/badge/License-Apache%202.0-blue.svg)](LICENSE)
[![PythonVersion](https://img.shields.io/pypi/pyversions/deepgboost.svg)](https://pypi.org/project/deepgboost/)

DeepGBoost is a machine learning algorithm based on a gradient boosting forest: it combines tree ensembles with the layered architecture of neural networks.

<div align="center"><img src="./docs/img/icon.svg" width="80%"></div>

## ⚙️ Installation

```bash
pip install deepgboost
```

Optional plotting support:

```bash
pip install "deepgboost[plotting]"
```

To install from source with development dependencies:

```bash
git clone https://github.com/iamthinbaker/deepgboost.git
cd deepgboost
pip install -e ".[dev]"
```


## 🚀 Usage

### Quick Start

```python
from sklearn.datasets import load_diabetes
from sklearn.model_selection import train_test_split
from deepgboost import DeepGBoostRegressor

X, y = load_diabetes(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = DeepGBoostRegressor(
    n_trees=10,
    n_layers=15,
    max_depth=4,
    learning_rate=0.1,
).fit(X_train, y_train)

predictions = model.predict(X_test)
```

### 📓 Examples

Detailed usage examples are available in the [examples/](examples/) directory:

- [quickstart.ipynb](examples/quickstart.ipynb) — full tour of the API (regression, classification, callbacks, feature importances)
- [classifier.ipynb](examples/classifier.ipynb) — binary and multiclass classification walkthrough
- [regressor.ipynb](examples/regressor.ipynb) — regression walkthrough
- [serialization.ipynb](examples/serialization.ipynb) — saving and loading trained models with pickle

## 🧠 DeepGBoost

### Algorithm

DeepGBoost implements the **Distributed Gradient Boosting Forest (DGBF)**, a novel tree ensemble algorithm introduced in:

> Delgado-Panadero, Á., Benítez-Andrades, J. A., & García-Ordás, M. T. (2023). *A generalized decision tree ensemble based on the NeuralNetworks architecture: Distributed Gradient Boosting Forest (DGBF)*. Applied Intelligence, 53, 22991–23003. https://doi.org/10.1007/s10489-023-04735-w

Classical tree ensemble methods — RandomForest (*bagging*) and GradientBoosting (*boosting*) — are powerful for tabular data but cannot perform hierarchical representation learning as Neural Networks do. DGBF addresses this by mathematically combining both bagging and boosting into a unified formulation that defines a **graph-structured tree ensemble with distributed representation learning**, without requiring back-propagation or parametric models.

The core idea is to distribute the gradient descent of each boosting step across the individual trees of a RandomForest layer, so that each tree learns an independent gradient component:

<br>

$$F_i(x) = \sum_{l=1}^{L} RF_l(x) = \frac{1}{T} \sum_{l=1}^{L} \sum_{t=1}^{T} h_{l,t}(x)$$

<br>

where *L* is the number of boosting layers, *T* is the number of trees per layer, $RF_l$ is the RandomForest of layer *l*, and $h_{l,t}$ is the *t*-th tree of that layer. This structure is a direct analogue of a **Dense Neural Network**, where each RandomForest layer corresponds to a network layer, with distributed gradients replacing back-propagation.
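
As a rough illustration of the formula above, the sketch below builds *L* layers of *T* regression trees with scikit-learn and sums the per-layer averages. It is a simplified stand-in, not the package's implementation: every tree in a layer is fit on a bootstrap sample of the same squared-loss residual, whereas DGBF distributes the gradient across the individual trees. The function names and the `learning_rate` shrinkage factor are assumptions made for this example; with `learning_rate=1.0` the prediction matches the double sum term by term.

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor


def fit_dgbf_sketch(X, y, n_layers=5, n_trees=10, max_depth=3,
                    learning_rate=0.5, seed=0):
    """Fit L layers of T trees; each layer is a bagged forest on the residual.

    Hypothetical helper for illustration only (squared loss, shared residual).
    """
    X, y = np.asarray(X, dtype=float), np.asarray(y, dtype=float)
    rng = np.random.default_rng(seed)
    n_samples = X.shape[0]
    prediction = np.zeros(n_samples)  # F starts at zero, as in the formula
    layers = []
    for _ in range(n_layers):
        residual = y - prediction  # negative gradient of the squared loss
        trees = []
        for _ in range(n_trees):
            # Bootstrap sample so the trees of one layer differ from each other.
            idx = rng.integers(0, n_samples, size=n_samples)
            tree = DecisionTreeRegressor(
                max_depth=max_depth,
                random_state=int(rng.integers(0, 2**31 - 1)),
            )
            tree.fit(X[idx], residual[idx])
            trees.append(tree)
        layers.append(trees)
        # RF_l(x): average of the T trees of this layer.
        prediction += learning_rate * np.mean(
            [t.predict(X) for t in trees], axis=0)
    return {"layers": layers, "learning_rate": learning_rate}


def predict_dgbf_sketch(model, X):
    """Evaluate F(x) = sum over layers of the mean tree output."""
    X = np.asarray(X, dtype=float)
    out = np.zeros(X.shape[0])
    for trees in model["layers"]:
        out += model["learning_rate"] * np.mean(
            [t.predict(X) for t in trees], axis=0)
    return out
```

With `n_layers=1` the sketch reduces to a plain bagged forest (scaled by `learning_rate`), and with `n_trees=1` to a basic boosting loop in which each tree sees a bootstrap sample.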


<div align="center" style="width:80%; margin:auto;">
<img src="docs/img/fig1.png" width="80%">
<p><strong>Fig. 1</strong> — <strong>NeuralNetwork vs DGBF architecture</strong>: In a NN (left), each neuron's output feeds into the next layer and the weights are trained via back-propagation. In DGBF (right), the distributed gradients of all trees from each layer are forwarded to every tree of the following layer.</p>
</div>

Both RandomForest and GradientBoosting emerge naturally as special cases of DGBF: RandomForest is recovered with a single layer (*L* = 1) and GradientBoosting with a single tree per layer (*T* = 1).
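
Both special cases can be read off the ensemble formula by substitution. This is only a worked restatement of the expression above, using the same symbols. Setting *L* = 1 leaves a single average over the trees of one layer, which is the RandomForest predictor:

$$F_i(x) = \frac{1}{T} \sum_{t=1}^{T} h_{1,t}(x)$$

Setting *T* = 1 removes the averaging and leaves a sum of one tree per layer, each fit on the gradient left by the previous layers, which is the GradientBoosting predictor:

$$F_i(x) = \sum_{l=1}^{L} h_{l,1}(x)$$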


<div align="center" style="width:80%; margin:auto;">
<img src="docs/img/fig2.png" width="80%">
<p><strong>Fig. 2</strong> — <strong>RandomForest & GradientBoosting as DGBF special cases</strong>: RandomForest (left) and GradientBoosting (right) represented as particular graph architectures of DGBF.</p>
</div>

### 📊 Benchmark

DGBF was evaluated against RandomForest (RF) and GradientBoosting (GBDT) on 9 regression datasets from the UCI Machine Learning Repository (Parkinson, Wine, Concrete, Obesity, NavalVessel, Temperature, Cargo2000, BikeSales, Superconduct), using 200 randomized simulations per dataset with an 80/20 train-test split.

<div align="center"><img src="docs/img/benchmark.png" width="80%"></div>

> [!NOTE]
> 🏆 **Winner: DeepGBoost.** DGBF surpasses the mean R² score of both GradientBoosting and RandomForest in 7 out of 9 datasets.

To reproduce the benchmark, run the experiment script from the `benchmark/` directory:

```bash
cd benchmark
python run_experiments.py
```

The script reads its configuration from `benchmark/config.json`, where you can adjust the models, hyperparameters, datasets, and experiment settings (e.g. number of bootstrap runs). Results are saved to `benchmark/results/`.
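
The evaluation protocol described in this section (repeated random 80/20 splits, compared by mean R²) can be sketched in a few lines. The example below is an illustration only and does not reproduce `run_experiments.py` or its configuration: it uses a synthetic dataset instead of the UCI datasets, 5 runs instead of 200, and only the two scikit-learn baselines. The helper `mean_r2_over_splits` and the `candidates` dictionary are names invented for this sketch.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split


def mean_r2_over_splits(make_model, X, y, n_runs=5, test_size=0.2):
    """Average the test R² over repeated random train/test splits."""
    scores = []
    for run in range(n_runs):
        # A different seed per run gives a different 80/20 split each time.
        X_tr, X_te, y_tr, y_te = train_test_split(
            X, y, test_size=test_size, random_state=run)
        model = make_model(run).fit(X_tr, y_tr)
        scores.append(r2_score(y_te, model.predict(X_te)))
    return float(np.mean(scores))


# Synthetic stand-in for a benchmark dataset (assumption of this sketch).
X, y = make_regression(n_samples=300, n_features=8, noise=10.0, random_state=0)

# Each entry maps a label to a factory that builds a fresh, seeded model.
candidates = {
    "RF": lambda seed: RandomForestRegressor(n_estimators=30, random_state=seed),
    "GBDT": lambda seed: GradientBoostingRegressor(n_estimators=30, random_state=seed),
}

results = {name: mean_r2_over_splits(factory, X, y)
           for name, factory in candidates.items()}
for name, score in results.items():
    print(f"{name}: mean R2 = {score:.3f}")
```

A `DeepGBoostRegressor` factory, built with the parameters shown in the Quick Start, can be added to `candidates` in the same way to compare all three models on your own data.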

## 🤝 Contributing

Contributions are welcome. See [CONTRIBUTING.md](CONTRIBUTING.md) for development setup, code style, and pull request guidelines.

## 📄 Citation

If you use DeepGBoost in your research, please cite using the metadata in [CITATION.cff](CITATION.cff) or the BibTeX entry provided by GitHub ("Cite this repository" button).
