
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/applications/plot_multi_class_under_sampling.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_applications_plot_multi_class_under_sampling.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_applications_plot_multi_class_under_sampling.py:


=============================================
Multiclass classification with under-sampling
=============================================

Some balancing methods can balance datasets with multiple classes.
This example illustrates the use of those methods, which does not
differ from the binary case.
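
To make the "same as the binary case" point concrete, here is a minimal
sketch using only the standard library. It assumes plain random selection
rather than the distance-based selection of ``NearMiss``, and
``undersample_to_minority`` is a hypothetical helper written for this
illustration, not part of imbalanced-learn. It mirrors the default
``sampling_strategy="auto"`` of the under-samplers, which resamples every
class except the minority class.

.. code-block:: Python

    import random
    from collections import Counter


    def undersample_to_minority(labels, seed=0):
        """Return sorted indices keeping, per class, as many samples as the rarest class has.

        Hypothetical helper: random selection only, unlike NearMiss.
        """
        rng = random.Random(seed)
        counts = Counter(labels)
        n_keep = min(counts.values())
        kept = []
        for cls in counts:
            # Indices of all samples that belong to the current class
            idx = [i for i, label in enumerate(labels) if label == cls]
            kept.extend(rng.sample(idx, n_keep))
        return sorted(kept)


    # Toy label lists (illustrative only): one binary, one with three classes
    binary = [0] * 25 + [1] * 50
    multiclass = [0] * 25 + [1] * 50 + [2] * 50

    # The call is identical in both cases
    binary_counts = Counter(binary[i] for i in undersample_to_minority(binary))
    multiclass_counts = Counter(
        multiclass[i] for i in undersample_to_minority(multiclass)
    )
    print(binary_counts)
    print(multiclass_counts)

Nothing in the helper depends on the number of classes: each class is cut
down to the size of the rarest one, whether there are two classes or more.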

.. GENERATED FROM PYTHON SOURCE LINES 11-52




.. rst-class:: sphx-glr-script-out

 .. code-block:: none


    Training target statistics: Counter({np.int64(1): 38, np.int64(2): 38, np.int64(0): 17})
    Testing target statistics: Counter({np.int64(1): 12, np.int64(2): 12, np.int64(0): 8})




.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th></th>
          <th>LogisticRegression</th>
        </tr>
        <tr>
          <th>Metric</th>
          <th>Label / Average</th>
          <th></th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>Accuracy</th>
          <th></th>
          <td>0.812500</td>
        </tr>
        <tr>
          <th rowspan="3" valign="top">Precision</th>
          <th>0</th>
          <td>1.000000</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.875000</td>
        </tr>
        <tr>
          <th>2</th>
          <td>0.687500</td>
        </tr>
        <tr>
          <th rowspan="3" valign="top">Recall</th>
          <th>0</th>
          <td>1.000000</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.583333</td>
        </tr>
        <tr>
          <th>2</th>
          <td>0.916667</td>
        </tr>
        <tr>
          <th rowspan="3" valign="top">ROC AUC</th>
          <th>0</th>
          <td>1.000000</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.958333</td>
        </tr>
        <tr>
          <th>2</th>
          <td>0.966667</td>
        </tr>
        <tr>
          <th>Log loss</th>
          <th></th>
          <td>0.374343</td>
        </tr>
        <tr>
          <th>Fit time (s)</th>
          <th></th>
          <td>NaN</td>
        </tr>
        <tr>
          <th>Predict time (s)</th>
          <th></th>
          <td>0.000601</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />



|
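
The per-label precision and recall rows in the table above are one-vs-rest
quantities. As a worked check, the sketch below recomputes them from a
confusion matrix. The matrix is an assumption: it is inferred from the
reported test counts (8, 12 and 12 samples) and the reported metrics, and is
not printed by the script itself.

.. code-block:: Python

    # Confusion matrix inferred from the reported metrics (an assumption).
    # Rows are true labels, columns are predicted labels.
    confusion = [
        [8, 0, 0],
        [0, 7, 5],
        [0, 1, 11],
    ]

    n_classes = len(confusion)
    total = sum(sum(row) for row in confusion)

    # Accuracy: correctly classified samples over all samples
    accuracy = sum(confusion[k][k] for k in range(n_classes)) / total

    precision = {}
    recall = {}
    for k in range(n_classes):
        predicted_k = sum(confusion[i][k] for i in range(n_classes))  # column sum
        true_k = sum(confusion[k])  # row sum
        precision[k] = confusion[k][k] / predicted_k
        recall[k] = confusion[k][k] / true_k

    print(f"accuracy = {accuracy:.6f}")
    for k in range(n_classes):
        print(f"class {k}: precision = {precision[k]:.6f}, recall = {recall[k]:.6f}")

Under this assumed matrix, the errors are confined to classes 1 and 2: five
samples of class 1 are predicted as class 2, and one sample of class 2 is
predicted as class 1.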

.. code-block:: Python


    # Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>
    # License: MIT

    from collections import Counter

    import skore
    from sklearn.datasets import load_iris
    from sklearn.linear_model import LogisticRegression
    from sklearn.model_selection import train_test_split
    from sklearn.preprocessing import StandardScaler

    from imblearn.datasets import make_imbalance
    from imblearn.pipeline import make_pipeline
    from imblearn.under_sampling import NearMiss

    print(__doc__)

    RANDOM_STATE = 42

    # Load the iris dataset and make it imbalanced
    iris = load_iris()
    X, y = make_imbalance(
        iris.data,
        iris.target,
        sampling_strategy={0: 25, 1: 50, 2: 50},
        random_state=RANDOM_STATE,
    )

    X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=RANDOM_STATE)

    print(f"Training target statistics: {Counter(y_train)}")
    print(f"Testing target statistics: {Counter(y_test)}")

    # Create a pipeline: under-sample with NearMiss, scale, then classify
    pipeline = make_pipeline(NearMiss(version=2), StandardScaler(), LogisticRegression())
    pipeline.fit(X_train, y_train)

    # Classify and report the results
    report = skore.evaluate(pipeline, X_test, y_test, splitter="prefit")
    report.metrics.summarize().frame()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 4.532 seconds)

**Estimated memory usage:**  313 MB


.. _sphx_glr_download_auto_examples_applications_plot_multi_class_under_sampling.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_multi_class_under_sampling.ipynb <plot_multi_class_under_sampling.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_multi_class_under_sampling.py <plot_multi_class_under_sampling.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_multi_class_under_sampling.zip <plot_multi_class_under_sampling.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
