Compare ensemble classifiers using resampling#

Ensemble classifiers have been shown to improve classification performance compared to a single learner. However, they are still affected by class imbalance. This example shows the benefit of balancing the training set before fitting the learners, by comparing balanced ensemble methods with their non-balanced counterparts.

We make a comparison using skore.evaluate to obtain a structured report of the different classifiers.

# Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>
# License: MIT
print(__doc__)

Load an imbalanced dataset#

We load the UCI SatImage dataset, which has an imbalance ratio of 9.3:1 (number of majority samples per minority sample). The data are then split into training and testing sets.

from sklearn.model_selection import train_test_split
from imblearn.datasets import fetch_datasets

satimage = fetch_datasets()["satimage"]
X, y = satimage.data, satimage.target
X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=0)
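
The imbalance ratio quoted above is simply the majority count divided by the minority count. The sketch below shows that computation on toy labels; `imbalance_ratio` and `toy_y` are hypothetical names introduced for illustration, and the labels are not the SatImage targets.

```python
from collections import Counter


def imbalance_ratio(labels):
    """Return (majority count) / (minority count) for a sequence of labels."""
    counts = Counter(labels)
    return max(counts.values()) / min(counts.values())


# Toy labels (hypothetical, not the SatImage targets): 93 majority, 10 minority.
toy_y = [-1] * 93 + [1] * 10
ratio = imbalance_ratio(toy_y)
print(f"imbalance ratio: {ratio:.1f}:1")
```

The same function could be applied to `y`, `y_train`, or `y_test` to check that the stratified split keeps the ratio roughly unchanged.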

Classification using a single decision tree#

We train a decision tree classifier which will be used as a baseline for the rest of this example.

The results are reported using skore.evaluate which provides a structured report of the classifier performance.

import skore
from sklearn.tree import DecisionTreeClassifier

tree = DecisionTreeClassifier()
tree.fit(X_train, y_train)

report_tree = skore.evaluate(tree, X_test, y_test, splitter="prefit")
report_tree.metrics.summarize().frame()
Metric            Label / Average  DecisionTreeClassifier
Accuracy                           0.916718
Precision         -1               0.951989
Precision         1                0.576159
Recall            -1               0.955923
Recall            1                0.554140
ROC AUC                            0.755031
Log loss                           3.001771
Brier score                        0.083282
Fit time (s)                       NaN
Predict time (s)                   0.000975


report_tree.metrics.confusion_matrix().plot()
[Figure: confusion matrix of the decision tree. Decision threshold: 0.50. Data source: test set.]
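
Accuracy on its own says little on a dataset this skewed. As a rough check, and assuming the 9.3:1 ratio quoted earlier holds approximately in the stratified test set, the sketch below compares the accuracy of the tree with that of a hypothetical classifier that always predicts the majority class. The variable names are illustrative only.

```python
# Share of the majority class implied by a 9.3:1 imbalance ratio.
ratio = 9.3
majority_share = ratio / (ratio + 1)

# Accuracy of the decision tree, copied from the report above.
tree_accuracy = 0.916718

# A classifier that always predicts the majority class is right exactly
# as often as the majority class occurs.
baseline_accuracy = majority_share
gain = tree_accuracy - baseline_accuracy

print(f"majority-only baseline accuracy: {baseline_accuracy:.3f}")
print(f"gain of the tree over the baseline: {gain:.3f}")
```

The gain is small, which is why the per-class recall and precision in the report are more informative than the overall accuracy.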

Classification using bagging classifier with and without sampling#

Instead of using a single tree, we check whether an ensemble of decision trees can alleviate the issue induced by class imbalance. First, we use a bagging classifier and its counterpart, which internally uses random under-sampling to balance each bootstrap sample.

from sklearn.ensemble import BaggingClassifier

from imblearn.ensemble import BalancedBaggingClassifier

bagging = BaggingClassifier(n_estimators=50, random_state=0)
balanced_bagging = BalancedBaggingClassifier(n_estimators=50, random_state=0)

bagging.fit(X_train, y_train)
balanced_bagging.fit(X_train, y_train)
BalancedBaggingClassifier(n_estimators=50, random_state=0)
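
To make the idea of a balanced bootstrap sample concrete, here is a minimal sketch using only the standard library. It is not the implementation used by imbalanced-learn; `balanced_bootstrap` and `toy_y` are hypothetical names, and the labels are synthetic.

```python
import random
from collections import Counter


def balanced_bootstrap(labels, rng):
    """Return indices of a bootstrap sample, under-sampled to be balanced.

    Step 1 draws a plain bootstrap sample (with replacement).
    Step 2 keeps, for every class, only as many drawn indices as the
    rarest class has in that sample (random under-sampling).
    """
    n = len(labels)
    drawn = [rng.randrange(n) for _ in range(n)]

    # Group the drawn indices by class label.
    by_class = {}
    for idx in drawn:
        by_class.setdefault(labels[idx], []).append(idx)

    # Under-sample every class down to the size of the rarest one.
    n_min = min(len(indices) for indices in by_class.values())
    balanced = []
    for indices in by_class.values():
        balanced.extend(rng.sample(indices, n_min))
    return balanced


rng = random.Random(0)
toy_y = [-1] * 930 + [1] * 100  # hypothetical labels, roughly 9.3:1
sample = balanced_bootstrap(toy_y, rng)
print(Counter(toy_y[i] for i in sample))
```

Each base estimator of a balanced ensemble would then be fit on one such sample, so that no single tree sees the original class proportions.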


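Balancing each bootstrap sample significantly increases the balanced accuracy and the geometric mean. In the reports below, the recall on the minority class (label 1) rises from 0.48 to 0.80, while its precision drops from 0.73 to 0.49.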

report_bagging = skore.evaluate(bagging, X_test, y_test, splitter="prefit")
report_bagging.metrics.summarize().frame()
Metric            Label / Average  BaggingClassifier
Accuracy                           0.931635
Precision         -1               0.945551
Precision         1                0.728155
Recall            -1               0.980716
Recall            1                0.477707
ROC AUC                            0.934639
Log loss                           0.203957
Brier score                        0.050330
Fit time (s)                       NaN
Predict time (s)                   0.009813


report_balanced_bagging = skore.evaluate(
    balanced_bagging, X_test, y_test, splitter="prefit"
)
report_balanced_bagging.metrics.summarize().frame()
Metric            Label / Average  BalancedBaggingClassifier
Accuracy                           0.899938
Precision         -1               0.977088
Precision         1                0.492188
Recall            -1               0.910468
Recall            1                0.802548
ROC AUC                            0.948735
Log loss                           0.259689
Brier score                        0.075260
Fit time (s)                       NaN
Predict time (s)                   0.010129


report_bagging.metrics.confusion_matrix().plot()
[Figure: confusion matrix of the bagging classifier. Decision threshold: 0.50. Data source: test set.]
report_balanced_bagging.metrics.confusion_matrix().plot()
[Figure: confusion matrix of the balanced bagging classifier. Decision threshold: 0.50. Data source: test set.]
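
The summary reports do not list the balanced accuracy or the geometric mean directly, but both can be derived from the per-class recalls. The sketch below does so for the two bagging models, with the recalls copied from the reports above; the helper names are hypothetical.

```python
import math


def balanced_accuracy(recalls):
    """Arithmetic mean of the per-class recalls."""
    return sum(recalls) / len(recalls)


def geometric_mean(recalls):
    """Geometric mean of the per-class recalls."""
    return math.prod(recalls) ** (1 / len(recalls))


# Per-class recalls (class -1, class 1) copied from the two reports above.
recalls = {
    "BaggingClassifier": (0.980716, 0.477707),
    "BalancedBaggingClassifier": (0.910468, 0.802548),
}

for name, r in recalls.items():
    print(
        f"{name}: balanced accuracy={balanced_accuracy(r):.3f}, "
        f"geometric mean={geometric_mean(r):.3f}"
    )
```

With these figures, the balanced accuracy goes from roughly 0.73 to 0.86 and the geometric mean from roughly 0.68 to 0.85, even though the plain accuracy is lower for the balanced model.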

Classification using random forest classifier with and without sampling#

Random forest is another popular ensemble method, and it usually outperforms bagging. Here, we use a vanilla random forest and its balanced counterpart, in which each bootstrap sample is balanced.

from sklearn.ensemble import RandomForestClassifier

from imblearn.ensemble import BalancedRandomForestClassifier

rf = RandomForestClassifier(n_estimators=50, random_state=0)
brf = BalancedRandomForestClassifier(
    n_estimators=50,
    sampling_strategy="all",
    replacement=True,
    bootstrap=False,
    random_state=0,
)

rf.fit(X_train, y_train)
brf.fit(X_train, y_train)
BalancedRandomForestClassifier(n_estimators=50, random_state=0)


As in the previous experiment, the balanced classifier outperforms the classifier that learns from imbalanced bootstrap samples: the recall on the minority class rises from 0.46 to 0.83. In addition, each random forest reaches a higher ROC AUC than the corresponding bagging classifier.

report_rf = skore.evaluate(rf, X_test, y_test, splitter="prefit")
report_rf.metrics.summarize().frame()
Metric            Label / Average  RandomForestClassifier
Accuracy                           0.937228
Precision         -1               0.944700
Precision         1                0.811111
Recall            -1               0.988292
Recall            1                0.464968
ROC AUC                            0.946511
Log loss                           0.190004
Brier score                        0.044992
Fit time (s)                       NaN
Predict time (s)                   0.007035


report_brf = skore.evaluate(brf, X_test, y_test, splitter="prefit")
report_brf.metrics.summarize().frame()
Metric            Label / Average  BalancedRandomForestClassifier
Accuracy                           0.902424
Precision         -1               0.980698
Precision         1                0.500000
Recall            -1               0.909780
Recall            1                0.834395
ROC AUC                            0.951902
Log loss                           0.234209
Brier score                        0.072699
Fit time (s)                       NaN
Predict time (s)                   0.007424


report_rf.metrics.confusion_matrix().plot()
[Figure: confusion matrix of the random forest. Decision threshold: 0.50. Data source: test set.]
report_brf.metrics.confusion_matrix().plot()
[Figure: confusion matrix of the balanced random forest. Decision threshold: 0.50. Data source: test set.]
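
Balancing trades precision for recall on the minority class. One way to weigh the two together is the F1 score, which is not part of the reports above and is used here only as an illustrative summary. The sketch computes it from the class 1 precision and recall of the two forests; `f1_score` is a hypothetical local helper, not a library call.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# (precision, recall) for class 1, copied from the two reports above.
minority_scores = {
    "RandomForestClassifier": (0.811111, 0.464968),
    "BalancedRandomForestClassifier": (0.500000, 0.834395),
}

f1 = {name: f1_score(p, r) for name, (p, r) in minority_scores.items()}
for name, value in f1.items():
    print(f"{name}: minority-class F1 = {value:.3f}")
```

Under this measure the balanced forest still comes out ahead on the minority class, although the margin is narrower than for recall alone.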

Boosting classifier#

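In the same manner, the easy ensemble classifier is a bag of balanced AdaBoost classifiers. However, it is slower to train than a random forest and achieves worse performance (here, a lower accuracy and ROC AUC than the balanced random forest). We also fit a RUSBoostClassifier, which applies random under-sampling at each boosting iteration.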

from sklearn.ensemble import AdaBoostClassifier

from imblearn.ensemble import EasyEnsembleClassifier, RUSBoostClassifier

estimator = AdaBoostClassifier(n_estimators=10)
eec = EasyEnsembleClassifier(n_estimators=10, estimator=estimator)
eec.fit(X_train, y_train)

rusboost = RUSBoostClassifier(n_estimators=10, estimator=estimator)
rusboost.fit(X_train, y_train)
RUSBoostClassifier(estimator=AdaBoostClassifier(n_estimators=10),
                   n_estimators=10)


report_eec = skore.evaluate(eec, X_test, y_test, splitter="prefit")
report_eec.metrics.summarize().frame()
Metric            Label / Average  EasyEnsembleClassifier
Accuracy                           0.811063
Precision         -1               0.979933
Precision         1                0.322034
Recall            -1               0.807163
Recall            1                0.847134
ROC AUC                            0.914765
Log loss                           0.485174
Brier score                        0.154640
Fit time (s)                       NaN
Predict time (s)                   0.012756


report_rusboost = skore.evaluate(rusboost, X_test, y_test, splitter="prefit")
report_rusboost.metrics.summarize().frame()
Metric            Label / Average  RUSBoostClassifier
Accuracy                           0.821007
Precision         -1               0.980198
Precision         1                0.335013
Recall            -1               0.818182
Recall            1                0.847134
ROC AUC                            0.867595
Log loss                           0.361076
Brier score                        0.107209
Fit time (s)                       NaN
Predict time (s)                   0.002467


report_eec.metrics.confusion_matrix().plot()
[Figure: confusion matrix of the easy ensemble classifier. Decision threshold: 0.50. Data source: test set.]
import matplotlib.pyplot as plt

report_rusboost.metrics.confusion_matrix().plot()
plt.show()
[Figure: confusion matrix of the RUSBoost classifier. Decision threshold: 0.50. Data source: test set.]
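
To compare all seven models at a glance, the figures reported throughout this example can be collected and ranked. The sketch below copies the class 1 recall and the ROC AUC from the reports and sorts by ROC AUC; the dictionary and variable names are illustrative.

```python
# (recall of class 1, ROC AUC) copied from the seven reports in this example.
results = {
    "DecisionTreeClassifier": (0.554140, 0.755031),
    "BaggingClassifier": (0.477707, 0.934639),
    "BalancedBaggingClassifier": (0.802548, 0.948735),
    "RandomForestClassifier": (0.464968, 0.946511),
    "BalancedRandomForestClassifier": (0.834395, 0.951902),
    "EasyEnsembleClassifier": (0.847134, 0.914765),
    "RUSBoostClassifier": (0.847134, 0.867595),
}

# Sort from highest to lowest ROC AUC.
ranking = sorted(results.items(), key=lambda item: item[1][1], reverse=True)

print(f"{'model':<32}{'recall (1)':>12}{'ROC AUC':>10}")
for name, (recall_minority, roc_auc) in ranking:
    print(f"{name:<32}{recall_minority:>12.3f}{roc_auc:>10.3f}")
```

On this run, the balanced random forest has the highest ROC AUC, while the two boosting-based models reach the highest minority-class recall at a lower ROC AUC.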

Total running time of the script: (1 minutes 31.709 seconds)

Estimated memory usage: 525 MB
