
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/pipeline/plot_pipeline_classification.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_pipeline_plot_pipeline_classification.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_pipeline_plot_pipeline_classification.py:


====================================
Usage of pipeline embedding samplers
====================================

An example of the :class:`~imblearn.pipeline.Pipeline` object (or the
:func:`~imblearn.pipeline.make_pipeline` helper function) working with
transformers and resamplers.

.. GENERATED FROM PYTHON SOURCE LINES 10-15

.. code-block:: Python


    # Authors: Christos Aridas
    #          Guillaume Lemaitre <g.lemaitre58@gmail.com>
    # License: MIT








.. GENERATED FROM PYTHON SOURCE LINES 16-18

.. code-block:: Python

    print(__doc__)








.. GENERATED FROM PYTHON SOURCE LINES 19-20

Let's first create an imbalanced dataset and split it into training and test sets.

.. GENERATED FROM PYTHON SOURCE LINES 22-40

.. code-block:: Python

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    X, y = make_classification(
        n_classes=2,
        class_sep=1.25,
        weights=[0.3, 0.7],
        n_informative=3,
        n_redundant=1,
        flip_y=0,
        n_features=5,
        n_clusters_per_class=1,
        n_samples=5000,
        random_state=10,
    )

    X_train, X_test, y_train, y_test = train_test_split(X, y, stratify=y, random_state=42)

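To check that the dataset is imbalanced as intended, one can count the samples
per class before and after the split. The following is a minimal sketch that
rebuilds the dataset with the same parameters; the use of
``collections.Counter`` and the names ``full_counts``, ``train_counts`` and
``test_counts`` are illustrative choices, not part of the original example.

.. code-block:: Python

    from collections import Counter

    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split

    # Same dataset parameters as in the example.
    X, y = make_classification(
        n_classes=2,
        class_sep=1.25,
        weights=[0.3, 0.7],
        n_informative=3,
        n_redundant=1,
        flip_y=0,
        n_features=5,
        n_clusters_per_class=1,
        n_samples=5000,
        random_state=10,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=42
    )

    # Count the samples of each class in the full, training, and test sets.
    full_counts = Counter(y)
    train_counts = Counter(y_train)
    test_counts = Counter(y_test)

    print("full: ", sorted(full_counts.items()))
    print("train:", sorted(train_counts.items()))
    print("test: ", sorted(test_counts.items()))

Because ``stratify=y`` is passed, the training and test sets should keep
roughly the same class ratio as the full dataset, with class 0 as the minority.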







.. GENERATED FROM PYTHON SOURCE LINES 41-42

Now, we create each individual step that we will later combine in a pipeline.

.. GENERATED FROM PYTHON SOURCE LINES 44-55

.. code-block:: Python

    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import EditedNearestNeighbours

    pca = PCA(n_components=2)
    enn = EditedNearestNeighbours()
    smote = SMOTE(random_state=0)
    knn = KNeighborsClassifier(n_neighbors=1)

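To get a feel for what the two samplers do on their own, the sketch below
applies each of them to a small toy dataset through ``fit_resample`` and
compares the class counts. The dataset parameters and the ``*_demo`` variable
names are assumptions made for illustration only; they are not the data used
in this example.

.. code-block:: Python

    from collections import Counter

    from sklearn.datasets import make_classification

    from imblearn.over_sampling import SMOTE
    from imblearn.under_sampling import EditedNearestNeighbours

    # Small illustrative dataset (hypothetical parameters).
    X_demo, y_demo = make_classification(
        n_samples=1000,
        n_features=5,
        n_informative=3,
        n_redundant=1,
        weights=[0.3, 0.7],
        flip_y=0,
        random_state=0,
    )
    counts_demo = Counter(y_demo)

    # With default settings, ENN only removes samples from the majority class.
    _, y_enn = EditedNearestNeighbours().fit_resample(X_demo, y_demo)
    counts_enn = Counter(y_enn)

    # With default settings, SMOTE adds synthetic minority samples until
    # both classes have the same size.
    _, y_smote = SMOTE(random_state=0).fit_resample(X_demo, y_demo)
    counts_smote = Counter(y_smote)

    print("original:", sorted(counts_demo.items()))
    print("ENN:     ", sorted(counts_enn.items()))
    print("SMOTE:   ", sorted(counts_smote.items()))

Chaining the two, as done later in the pipeline, first cleans the majority
class and then balances the classes by over-sampling the minority class.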







.. GENERATED FROM PYTHON SOURCE LINES 56-59

Now, we can finally create a pipeline that specifies the order in which the
transformers and samplers are executed before the data is passed to the final
classifier.

.. GENERATED FROM PYTHON SOURCE LINES 61-65

.. code-block:: Python

    from imblearn.pipeline import make_pipeline

    model = make_pipeline(pca, enn, smote, knn)

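``make_pipeline`` names each step automatically after the lowercased class
name of the estimator. As a sketch of the alternative, the same sequence can
be written with :class:`~imblearn.pipeline.Pipeline` and explicit step names;
the names ``"enn"`` and ``"knn"`` below are arbitrary labels chosen for
illustration.

.. code-block:: Python

    from sklearn.decomposition import PCA
    from sklearn.neighbors import KNeighborsClassifier

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import Pipeline, make_pipeline
    from imblearn.under_sampling import EditedNearestNeighbours

    # Automatic step names derived from the estimator classes.
    model = make_pipeline(
        PCA(n_components=2),
        EditedNearestNeighbours(),
        SMOTE(random_state=0),
        KNeighborsClassifier(n_neighbors=1),
    )
    step_names = [name for name, _ in model.steps]
    print(step_names)

    # Equivalent pipeline with explicit (illustrative) step names.
    explicit_model = Pipeline(
        steps=[
            ("pca", PCA(n_components=2)),
            ("enn", EditedNearestNeighbours()),
            ("smote", SMOTE(random_state=0)),
            ("knn", KNeighborsClassifier(n_neighbors=1)),
        ]
    )
    print(list(explicit_model.named_steps))

Explicit names are mostly useful when the steps are referenced later, for
instance in a parameter grid with keys such as ``smote__k_neighbors``.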







.. GENERATED FROM PYTHON SOURCE LINES 66-69

We can now use the resulting pipeline as a normal classifier: resampling
happens when calling ``fit`` and is disabled when calling ``decision_function``,
``predict_proba``, or ``predict``.

.. GENERATED FROM PYTHON SOURCE LINES 71-77

.. code-block:: Python

    import skore

    model.fit(X_train, y_train)

    report = skore.evaluate(model, X_test, y_test, splitter="prefit")
    report.metrics.summarize().frame()





.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th></th>
          <th>KNeighborsClassifier</th>
        </tr>
        <tr>
          <th>Metric</th>
          <th>Label / Average</th>
          <th></th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>Accuracy</th>
          <th></th>
          <td>0.994400</td>
        </tr>
        <tr>
          <th rowspan="2" valign="top">Precision</th>
          <th>0</th>
          <td>0.991979</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.995434</td>
        </tr>
        <tr>
          <th rowspan="2" valign="top">Recall</th>
          <th>0</th>
          <td>0.989333</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.996571</td>
        </tr>
        <tr>
          <th>ROC AUC</th>
          <th></th>
          <td>0.992952</td>
        </tr>
        <tr>
          <th>Log loss</th>
          <th></th>
          <td>0.201844</td>
        </tr>
        <tr>
          <th>Brier score</th>
          <th></th>
          <td>0.005600</td>
        </tr>
        <tr>
          <th>Fit time (s)</th>
          <th></th>
          <td>NaN</td>
        </tr>
        <tr>
          <th>Predict time (s)</th>
          <th></th>
          <td>0.002275</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

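To make the difference between ``fit`` and ``predict`` concrete, the sketch
below reproduces the pipeline by hand: the samplers are applied to the
training data only, while the test data only goes through the PCA projection
before reaching the classifier. It uses ``accuracy_score`` from scikit-learn
instead of the report above, and the ``*_manual`` and ``*_res`` names are
illustrative. The two sets of predictions are expected to agree, up to
possible floating-point effects in the PCA projection.

.. code-block:: Python

    from sklearn.datasets import make_classification
    from sklearn.decomposition import PCA
    from sklearn.metrics import accuracy_score
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import KNeighborsClassifier

    from imblearn.over_sampling import SMOTE
    from imblearn.pipeline import make_pipeline
    from imblearn.under_sampling import EditedNearestNeighbours

    X, y = make_classification(
        n_classes=2,
        class_sep=1.25,
        weights=[0.3, 0.7],
        n_informative=3,
        n_redundant=1,
        flip_y=0,
        n_features=5,
        n_clusters_per_class=1,
        n_samples=5000,
        random_state=10,
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, stratify=y, random_state=42
    )

    # Pipeline version: resampling happens inside ``fit`` only.
    model = make_pipeline(
        PCA(n_components=2),
        EditedNearestNeighbours(),
        SMOTE(random_state=0),
        KNeighborsClassifier(n_neighbors=1),
    )
    model.fit(X_train, y_train)
    y_pred = model.predict(X_test)

    # Manual version: resample the projected training data, never the test data.
    pca_manual = PCA(n_components=2).fit(X_train)
    X_res, y_res = EditedNearestNeighbours().fit_resample(
        pca_manual.transform(X_train), y_train
    )
    X_res, y_res = SMOTE(random_state=0).fit_resample(X_res, y_res)
    knn_manual = KNeighborsClassifier(n_neighbors=1).fit(X_res, y_res)
    y_pred_manual = knn_manual.predict(pca_manual.transform(X_test))

    accuracy = accuracy_score(y_test, y_pred)
    agreement = (y_pred == y_pred_manual).mean()
    print(f"test samples: {len(y_test)}, predictions: {len(y_pred)}")
    print(f"pipeline accuracy: {accuracy:.4f}")
    print(f"agreement with manual version: {agreement:.4f}")

The number of predictions always matches the number of test samples, which
would not be the case if the samplers were also applied at prediction time.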

.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 17.138 seconds)

**Estimated memory usage:**  286 MB


.. _sphx_glr_download_auto_examples_pipeline_plot_pipeline_classification.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_pipeline_classification.ipynb <plot_pipeline_classification.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_pipeline_classification.py <plot_pipeline_classification.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_pipeline_classification.zip <plot_pipeline_classification.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
