
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/applications/plot_topic_classication.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        :ref:`Go to the end <sphx_glr_download_auto_examples_applications_plot_topic_classication.py>`
        to download the full example code.

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_applications_plot_topic_classication.py:


=================================================
Example of topic classification in text documents
=================================================

This example shows how to balance text data before training a classifier.

Note that the data in this example are only slightly imbalanced; in other
data sets, the imbalance ratio can be much more pronounced.

.. GENERATED FROM PYTHON SOURCE LINES 11-15

.. code-block:: Python


    # Authors: Guillaume Lemaitre <g.lemaitre58@gmail.com>
    # License: MIT








.. GENERATED FROM PYTHON SOURCE LINES 16-18

.. code-block:: Python

    print(__doc__)








.. GENERATED FROM PYTHON SOURCE LINES 19-27

Setting the data set
--------------------

We use a subset of the 20 newsgroups data set, loading 4 topics. The
scikit-learn loader provides the data already split into a training and a
testing set.

Note that class \#3 is the minority class and has roughly two thirds as many
samples as the majority class.

.. GENERATED FROM PYTHON SOURCE LINES 29-48

.. code-block:: Python

    from sklearn.datasets import fetch_20newsgroups

    categories = [
        "alt.atheism",
        "talk.religion.misc",
        "comp.graphics",
        "sci.space",
    ]
    newsgroups_train = fetch_20newsgroups(subset="train", categories=categories)
    newsgroups_test = fetch_20newsgroups(subset="test", categories=categories)

    import numpy as np

    X_train = np.array(newsgroups_train.data)
    X_test = np.array(newsgroups_test.data)

    y_train = newsgroups_train.target
    y_test = newsgroups_test.target








.. GENERATED FROM PYTHON SOURCE LINES 49-54

.. code-block:: Python

    from collections import Counter

    print(f"Training class distributions summary: {Counter(y_train)}")
    print(f"Test class distributions summary: {Counter(y_test)}")





.. rst-class:: sphx-glr-script-out

 .. code-block:: none

    Training class distributions summary: Counter({np.int64(2): 593, np.int64(1): 584, np.int64(0): 480, np.int64(3): 377})
    Test class distributions summary: Counter({np.int64(2): 394, np.int64(1): 389, np.int64(0): 319, np.int64(3): 251})
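
To put a number on the imbalance, the counts printed above can be turned into a
minority-to-majority ratio. The snippet below is a minimal sketch that hard-codes
the training counts from that output; within the example itself,
``Counter(y_train)`` could be used directly instead.

.. code-block:: Python

    from collections import Counter

    # Training counts copied from the output above.
    train_counts = Counter({2: 593, 1: 584, 0: 480, 3: 377})

    # Pick the least and most frequent classes by sample count.
    minority_label, minority_count = min(train_counts.items(), key=lambda kv: kv[1])
    majority_label, majority_count = max(train_counts.items(), key=lambda kv: kv[1])
    ratio = minority_count / majority_count

    print(f"Minority class {minority_label}: {minority_count} samples")
    print(f"Majority class {majority_label}: {majority_count} samples")
    print(f"Minority/majority ratio: {ratio:.2f}")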




.. GENERATED FROM PYTHON SOURCE LINES 55-64

The usual scikit-learn pipeline
-------------------------------

A typical scikit-learn pipeline combines a TF-IDF vectorizer with a
multinomial naive Bayes classifier. A classification report summarizes the
results on the testing set.

As expected, the recall of class \#3 is low, mainly due to the class
imbalance.
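
For reference, the recall of a class is the fraction of that class's true
samples that the classifier labels correctly. The snippet below is a minimal
pure-Python sketch of that definition; ``per_class_recall`` and the toy labels
are hypothetical illustrations, not part of this example or of its results.

.. code-block:: Python

    from collections import Counter


    def per_class_recall(y_true, y_pred):
        """Return {label: recall}, with recall = true positives / true samples."""
        support = Counter(y_true)
        true_positives = Counter(t for t, p in zip(y_true, y_pred) if t == p)
        return {label: true_positives[label] / support[label] for label in support}


    # Hypothetical toy labels: one of the two class-3 samples is predicted as 0.
    toy_true = [0, 0, 0, 0, 3, 3]
    toy_pred = [0, 0, 0, 0, 0, 3]
    recalls = per_class_recall(toy_true, toy_pred)
    print(recalls)

A classifier that favours the majority class keeps a high recall on that class
while missing samples of the minority class, which is the pattern described
above.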

.. GENERATED FROM PYTHON SOURCE LINES 66-74

.. code-block:: Python

    import skore
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.naive_bayes import MultinomialNB
    from sklearn.pipeline import make_pipeline

    model = make_pipeline(TfidfVectorizer(), MultinomialNB())
    model.fit(X_train, y_train)






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <style>.sk-global {
      /* Definition of color scheme common for light and dark mode */
      --sklearn-color-text: #000;
      --sklearn-color-text-muted: #666;
      --sklearn-color-line: gray;
      /* Definition of color scheme for unfitted estimators */
      --sklearn-color-unfitted-level-0: #fff5e6;
      --sklearn-color-unfitted-level-1: #f6e4d2;
      --sklearn-color-unfitted-level-2: #ffe0b3;
      --sklearn-color-unfitted-level-3: chocolate;
      /* Definition of color scheme for fitted estimators */
      --sklearn-color-fitted-level-0: #f0f8ff;
      --sklearn-color-fitted-level-1: #d4ebff;
      --sklearn-color-fitted-level-2: #b3dbfd;
      --sklearn-color-fitted-level-3: cornflowerblue;
    }

    .sk-global.light {
      /* Specific color for light theme */
      --sklearn-color-text-on-default-background: black;
      --sklearn-color-background: white;
      --sklearn-color-border-box: black;
      --sklearn-color-icon: #696969;
    }

    .sk-global.dark {
      --sklearn-color-text-on-default-background: white;
      --sklearn-color-background: #111;
      --sklearn-color-border-box: white;
      --sklearn-color-icon: #878787;
    }

    .sk-global {
      color: var(--sklearn-color-text);
    }

    .sk-global pre {
      padding: 0;
    }

    .sk-global input.sk-hidden--visually {
      border: 0;
      clip: rect(1px 1px 1px 1px);
      clip: rect(1px, 1px, 1px, 1px);
      height: 1px;
      margin: -1px;
      overflow: hidden;
      padding: 0;
      position: absolute;
      width: 1px;
    }

    .sk-global div.sk-dashed-wrapped {
      border: 1px dashed var(--sklearn-color-line);
      margin: 0 0.4em 0.5em 0.4em;
      box-sizing: border-box;
      padding-bottom: 0.4em;
      background-color: var(--sklearn-color-background);
    }

    .sk-global div.sk-container {
      /* jupyter's `normalize.less` sets `[hidden] { display: none; }`
         but bootstrap.min.css set `[hidden] { display: none !important; }`
         so we also need the `!important` here to be able to override the
         default hidden behavior on the sphinx rendered scikit-learn.org.
         See: https://github.com/scikit-learn/scikit-learn/issues/21755 */
      display: inline-block !important;
      position: relative;
    }

    .sk-global div.sk-text-repr-fallback {
      display: none;
    }

    div.sk-parallel-item,
    div.sk-serial,
    div.sk-item {
      /* draw centered vertical line to link estimators */
      background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));
      background-size: 2px 100%;
      background-repeat: no-repeat;
      background-position: center center;
    }

    /* Parallel-specific style estimator block */

    .sk-global div.sk-parallel-item::after {
      content: "";
      width: 100%;
      border-bottom: 2px solid var(--sklearn-color-text-on-default-background);
      flex-grow: 1;
    }

    .sk-global div.sk-parallel {
      display: flex;
      align-items: stretch;
      justify-content: center;
      background-color: var(--sklearn-color-background);
      position: relative;
    }

    .sk-global div.sk-parallel-item {
      display: flex;
      flex-direction: column;
    }

    .sk-global div.sk-parallel-item:first-child::after {
      align-self: flex-end;
      width: 50%;
    }

    .sk-global div.sk-parallel-item:last-child::after {
      align-self: flex-start;
      width: 50%;
    }

    .sk-global div.sk-parallel-item:only-child::after {
      width: 0;
    }

    /* Serial-specific style estimator block */

    .sk-global div.sk-serial {
      display: flex;
      flex-direction: column;
      align-items: center;
      background-color: var(--sklearn-color-background);
      padding-right: 1em;
      padding-left: 1em;
    }


    /* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is
    clickable and can be expanded/collapsed.
    - Pipeline and ColumnTransformer use this feature and define the default style
    - Estimators will overwrite some part of the style using the `sk-estimator` class
    */

    /* Pipeline and ColumnTransformer style (default) */

    .sk-global div.sk-toggleable {
      /* Default theme specific background. It is overwritten whether we have a
      specific estimator or a Pipeline/ColumnTransformer */
      background-color: var(--sklearn-color-background);
    }

    /* Toggleable label */
    .sk-global label.sk-toggleable__label {
      cursor: pointer;
      display: flex;
      width: 100%;
      margin-bottom: 0;
      padding: 0.5em;
      box-sizing: border-box;
      text-align: center;
      align-items: center;
      justify-content: center;
      gap: 0.5em;
    }

    .sk-global label.sk-toggleable__label .caption {
      font-size: 0.6rem;
      font-weight: lighter;
      color: var(--sklearn-color-text-muted);
    }

    .sk-global label.sk-toggleable__label-arrow:before {
      /* Arrow on the left of the label */
      content: "▸";
      float: left;
      margin-right: 0.25em;
      color: var(--sklearn-color-icon);
    }

    .sk-global label.sk-toggleable__label-arrow:hover:before {
      color: var(--sklearn-color-text);
    }

    /* Toggleable content - dropdown */

    .sk-global div.sk-toggleable__content {
      display: none;
      text-align: left;
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    .sk-global div.sk-toggleable__content.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    .sk-global div.sk-toggleable__content pre {
      margin: 0.2em;
      border-radius: 0.25em;
      color: var(--sklearn-color-text);
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    .sk-global div.sk-toggleable__content.fitted pre {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    .sk-global input.sk-toggleable__control:checked~div.sk-toggleable__content {
      /* Expand drop-down */
      display: block;
      width: 100%;
      overflow: visible;
    }

    .sk-global input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {
      content: "▾";
    }

    /* Pipeline/ColumnTransformer-specific style */

    .sk-global div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    .sk-global div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Estimator-specific style */

    /* Colorize estimator box */
    .sk-global div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    .sk-global div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-2);
    }

    .sk-global div.sk-label label.sk-toggleable__label,
    .sk-global div.sk-label label {
      /* The background is the default theme color */
      color: var(--sklearn-color-text-on-default-background);
    }

    /* On hover, darken the color of the background */
    .sk-global div.sk-label:hover label.sk-toggleable__label {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    /* Label box, darken color on hover, fitted */
    .sk-global div.sk-label.fitted:hover label.sk-toggleable__label.fitted {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Estimator label */

    .sk-global div.sk-label label {
      font-family: monospace;
      font-weight: bold;
      line-height: 1.2em;
    }

    .sk-global div.sk-label-container {
      text-align: center;
    }

    /* Estimator-specific */
    .sk-global div.sk-estimator {
      font-family: monospace;
      border: 1px dotted var(--sklearn-color-border-box);
      border-radius: 0.25em;
      box-sizing: border-box;
      margin-bottom: 0.5em;
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    .sk-global div.sk-estimator.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    /* on hover */
    .sk-global div.sk-estimator:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    .sk-global div.sk-estimator.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Specification for estimator info (e.g. "i" and "?") */

    /* Common style for "i" and "?" */

    .sk-estimator-doc-link,
    a:link.sk-estimator-doc-link,
    a:visited.sk-estimator-doc-link {
      float: right;
      font-size: smaller;
      line-height: 1em;
      font-family: monospace;
      background-color: var(--sklearn-color-unfitted-level-0);
      border-radius: 1em;
      height: 1em;
      width: 1em;
      text-decoration: none !important;
      margin-left: 0.5em;
      text-align: center;
      /* unfitted */
      border: var(--sklearn-color-unfitted-level-3) 1pt solid;
      color: var(--sklearn-color-unfitted-level-3);
    }

    .sk-estimator-doc-link.fitted,
    a:link.sk-estimator-doc-link.fitted,
    a:visited.sk-estimator-doc-link.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
      border: var(--sklearn-color-fitted-level-3) 1pt solid;
      color: var(--sklearn-color-fitted-level-3);
    }

    /* On hover */
    div.sk-estimator:hover .sk-estimator-doc-link:hover,
    .sk-estimator-doc-link:hover,
    div.sk-label-container:hover .sk-estimator-doc-link:hover,
    .sk-estimator-doc-link:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-3);
      border: var(--sklearn-color-fitted-level-0) 1pt solid;
      color: var(--sklearn-color-unfitted-level-0);
      text-decoration: none;
    }

    div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,
    .sk-estimator-doc-link.fitted:hover,
    div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,
    .sk-estimator-doc-link.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-3);
      border: var(--sklearn-color-fitted-level-0) 1pt solid;
      color: var(--sklearn-color-fitted-level-0);
      text-decoration: none;
    }

    /* Span, style for the box shown on hovering the info icon */
    .sk-estimator-doc-link span {
      display: none;
      z-index: 9999;
      position: relative;
      font-weight: normal;
      right: .2ex;
      padding: .5ex;
      margin: .5ex;
      width: min-content;
      min-width: 20ex;
      max-width: 50ex;
      color: var(--sklearn-color-text);
      box-shadow: 2pt 2pt 4pt #999;
      /* unfitted */
      background: var(--sklearn-color-unfitted-level-0);
      border: .5pt solid var(--sklearn-color-unfitted-level-3);
    }

    .sk-estimator-doc-link.fitted span {
      /* fitted */
      background: var(--sklearn-color-fitted-level-0);
      border: var(--sklearn-color-fitted-level-3);
    }

    .sk-estimator-doc-link:hover span {
      display: block;
    }

    /* "?"-specific style due to the `<a>` HTML tag */

    .sk-global a.estimator_doc_link {
      float: right;
      font-size: 1rem;
      line-height: 1em;
      font-family: monospace;
      background-color: var(--sklearn-color-unfitted-level-0);
      border-radius: 1rem;
      height: 1rem;
      width: 1rem;
      text-decoration: none;
      /* unfitted */
      color: var(--sklearn-color-unfitted-level-1);
      border: var(--sklearn-color-unfitted-level-1) 1pt solid;
    }

    .sk-global a.estimator_doc_link.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
      border: var(--sklearn-color-fitted-level-1) 1pt solid;
      color: var(--sklearn-color-fitted-level-1);
    }

    /* On hover */
    .sk-global a.estimator_doc_link:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-3);
      color: var(--sklearn-color-background);
      text-decoration: none;
    }

    .sk-global a.estimator_doc_link.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-3);
    }

    .estimator-table {
        font-family: monospace;
    }

    .estimator-table summary {
        padding: .5rem;
        cursor: pointer;
    }

    .estimator-table summary::marker {
        font-size: 0.7rem;
    }

    .estimator-table details[open] {
        padding-left: 0.1rem;
        padding-right: 0.1rem;
        padding-bottom: 0.3rem;
    }

    .estimator-table .parameters-table {
        margin-left: auto !important;
        margin-right: auto !important;
        margin-top: 0;
    }

    .estimator-table .parameters-table tr:nth-child(odd) {
        background-color: #fff;
    }

    .estimator-table .parameters-table tr:nth-child(even) {
        background-color: #f6f6f6;
    }

    .estimator-table .parameters-table tr:hover td {
        background-color: #e0e0e0;
    }

    .estimator-table table :is(td, th) {
        border: 1px solid rgba(106, 105, 104, 0.232);
    }

    /*
        `table td` is set in notebook with right text-align.
        We need to overwrite it.
    */
    .estimator-table table td.param {
        text-align: left;
        position: relative;
        padding: 0;
    }

    .user-set td {
        color:rgb(255, 94, 0);
        text-align: left !important;
    }

    .user-set td.value {
        color:rgb(255, 94, 0);
        background-color: transparent;
    }

    .default td, .estimator-table th {
        color: black;
        text-align: left !important;
    }

    .user-set td i,
    .default td i {
        color: black;
    }

    td.fitted-att-type {
        white-space: preserve nowrap;
    }

    /*
        Styles for parameter documentation links
        We need styling for visited so jupyter doesn't overwrite it
    */
    a.param-doc-link,
    a.param-doc-link:link,
    a.param-doc-link:visited {
        text-decoration: underline dashed;
        text-underline-offset: .3em;
        color: inherit;
        display: block;
        padding: .5em;
    }

    @supports(anchor-name: --doc-link) {
        a.param-doc-link,
        a.param-doc-link:link,
        a.param-doc-link:visited {
        anchor-name: --doc-link;
        }
    }

    /* "hack" to make the entire area of the cell containing the link clickable */
    a.param-doc-link::before {
        position: absolute;
        content: "";
        inset: 0;
    }

    .param-doc-description {
        display: none;
        position: absolute;
        z-index: 9999;
        left: 0;
        padding: .5ex;
        margin-left: 1.5em;
        color: var(--sklearn-color-text);
        box-shadow: .3em .3em .4em #999;
        width: max-content;
        text-align: left;
        max-height: 10em;
        overflow-y: auto;

        /* unfitted */
        background: var(--sklearn-color-unfitted-level-0);
        border: thin solid var(--sklearn-color-unfitted-level-3);
    }

    @supports(position-area: center right) {
        .param-doc-description {
        position-area: center right;
        position: fixed;
        margin-left: 0;
        }
    }

    /* Fitted state for parameter tooltips */
    .fitted .param-doc-description {
        /* fitted */
        background: var(--sklearn-color-fitted-level-0);
        border: thin solid var(--sklearn-color-fitted-level-3);
    }

    .param-doc-link:hover .param-doc-description {
        display: block;
    }

    .copy-paste-icon {
        background-image: url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA0NDggNTEyIj48IS0tIUZvbnQgQXdlc29tZSBGcmVlIDYuNy4yIGJ5IEBmb250YXdlc29tZSAtIGh0dHBzOi8vZm9udGF3ZXNvbWUuY29tIExpY2Vuc2UgLSBodHRwczovL2ZvbnRhd2Vzb21lLmNvbS9saWNlbnNlL2ZyZWUgQ29weXJpZ2h0IDIwMjUgRm9udGljb25zLCBJbmMuLS0+PHBhdGggZD0iTTIwOCAwTDMzMi4xIDBjMTIuNyAwIDI0LjkgNS4xIDMzLjkgMTQuMWw2Ny45IDY3LjljOSA5IDE0LjEgMjEuMiAxNC4xIDMzLjlMNDQ4IDMzNmMwIDI2LjUtMjEuNSA0OC00OCA0OGwtMTkyIDBjLTI2LjUgMC00OC0yMS41LTQ4LTQ4bDAtMjg4YzAtMjYuNSAyMS41LTQ4IDQ4LTQ4ek00OCAxMjhsODAgMCAwIDY0LTY0IDAgMCAyNTYgMTkyIDAgMC0zMiA2NCAwIDAgNDhjMCAyNi41LTIxLjUgNDgtNDggNDhMNDggNTEyYy0yNi41IDAtNDgtMjEuNS00OC00OEwwIDE3NmMwLTI2LjUgMjEuNS00OCA0OC00OHoiLz48L3N2Zz4=);
        background-repeat: no-repeat;
        background-size: 14px 14px;
        background-position: 0;
        display: inline-block;
        width: 14px;
        height: 14px;
        cursor: pointer;
    }
    </style><body><div id="sk-container-id-1" class="sk-top-container sk-global"><div class="sk-text-repr-fallback"><pre>Pipeline(steps=[(&#x27;tfidfvectorizer&#x27;, TfidfVectorizer()),
                    (&#x27;multinomialnb&#x27;, MultinomialNB())])</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label fitted sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually sk-global" id="sk-estimator-id-1" type="checkbox" ><label for="sk-estimator-id-1" class="sk-toggleable__label fitted sk-toggleable__label-arrow"><div><div>Pipeline</div></div><div><a class="sk-estimator-doc-link fitted" rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.pipeline.Pipeline.html">?<span>Documentation for Pipeline</span></a><span class="sk-estimator-doc-link fitted">i<span>Fitted</span></span></div></label><div class="sk-toggleable__content fitted" data-param-prefix="">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('steps',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-steps;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.pipeline.Pipeline.html#:~:text=steps,-list%20of%20tuples">
                steps
                <span class="param-doc-description"
                style="position-anchor: --doc-link-steps;">
                steps: list of tuples<br><br>List of (name of step, estimator) tuples that are to be chained in<br>sequential order. To be compatible with the scikit-learn API, all steps<br>must define `fit`. All non-last steps must also define `transform`. See<br>:ref:`Combining Estimators &lt;combining_estimators&gt;` for more details.</span>
            </a>
        </td>
                <td class="value">[(&#x27;tfidfvectorizer&#x27;, ...), (&#x27;multinomialnb&#x27;, ...)]</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('transform_input',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-transform_input;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.pipeline.Pipeline.html#:~:text=transform_input,-list%20of%20str%2C%20default%3DNone">
                transform_input
                <span class="param-doc-description"
                style="position-anchor: --doc-link-transform_input;">
                transform_input: list of str, default=None<br><br>The names of the :term:`metadata` parameters that should be transformed by the<br>pipeline before passing it to the step consuming it.<br><br>This enables transforming some input arguments to ``fit`` (other than ``X``)<br>to be transformed by the steps of the pipeline up to the step which requires<br>them. Requirement is defined via :ref:`metadata routing &lt;metadata_routing&gt;`.<br>For instance, this can be used to pass a validation set through the pipeline.<br><br>You can only set this if metadata routing is enabled, which you<br>can enable using ``sklearn.set_config(enable_metadata_routing=True)``.<br><br>.. versionadded:: 1.6</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('memory',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-memory;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.pipeline.Pipeline.html#:~:text=memory,-str%20or%20object%20with%20the%20joblib.Memory%20interface%2C%20default%3DNone">
                memory
                <span class="param-doc-description"
                style="position-anchor: --doc-link-memory;">
                memory: str or object with the joblib.Memory interface, default=None<br><br>Used to cache the fitted transformers of the pipeline. The last step<br>will never be cached, even if it is a transformer. By default, no<br>caching is performed. If a string is given, it is the path to the<br>caching directory. Enabling caching triggers a clone of the transformers<br>before fitting. Therefore, the transformer instance given to the<br>pipeline cannot be inspected directly. Use the attribute ``named_steps``<br>or ``steps`` to inspect estimators within the pipeline. Caching the<br>transformers is advantageous when fitting is time consuming. See<br>:ref:`sphx_glr_auto_examples_neighbors_plot_caching_nearest_neighbors.py`<br>for an example on how to enable caching.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('verbose',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-verbose;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.pipeline.Pipeline.html#:~:text=verbose,-bool%2C%20default%3DFalse">
                verbose
                <span class="param-doc-description"
                style="position-anchor: --doc-link-verbose;">
                verbose: bool, default=False<br><br>If True, the time elapsed while fitting each step will be printed as it<br>is completed.</span>
            </a>
        </td>
                <td class="value">False</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
    
            <div class="estimator-table">
                <details>
                    <summary>Fitted attributes</summary>
                    <table class="parameters-table">
                        <tbody>
                            <tr>
                            <th>Name</th>
                            <th>Type</th>
                            <th>Value</th>
                            </tr>
                        
           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-classes_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.pipeline.Pipeline.html#:~:text=classes_,-ndarray%20of%20shape%20%28n_classes%2C%29">
                classes_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-classes_;">
                classes_: ndarray of shape (n_classes,)<br><br>The classes labels. Only exist if the last step of the pipeline is a<br>classifier.</span>
            </a>
        </td>
               <td class="fitted-att-type">ndarray[int64](4,)</td>
               <td>[0,1,2,3]</td>


           </tr>
    
                        </tbody>
                    </table>
                </details>
            </div>
        </div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator fitted sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually sk-global" id="sk-estimator-id-2" type="checkbox" ><label for="sk-estimator-id-2" class="sk-toggleable__label fitted sk-toggleable__label-arrow"><div><div>TfidfVectorizer</div></div><div><a class="sk-estimator-doc-link fitted" rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html">?<span>Documentation for TfidfVectorizer</span></a></div></label><div class="sk-toggleable__content fitted" data-param-prefix="tfidfvectorizer__">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('input',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-input;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=input,-%7B%27filename%27%2C%20%27file%27%2C%20%27content%27%7D%2C%20default%3D%27content%27">
                input
                <span class="param-doc-description"
                style="position-anchor: --doc-link-input;">
                input: {&#x27;filename&#x27;, &#x27;file&#x27;, &#x27;content&#x27;}, default=&#x27;content&#x27;<br><br>- If `&#x27;filename&#x27;`, the sequence passed as an argument to fit is<br>  expected to be a list of filenames that need reading to fetch<br>  the raw content to analyze.<br><br>- If `&#x27;file&#x27;`, the sequence items must have a &#x27;read&#x27; method (file-like<br>  object) that is called to fetch the bytes in memory.<br><br>- If `&#x27;content&#x27;`, the input is expected to be a sequence of items that<br>  can be of type string or byte.</span>
            </a>
        </td>
                <td class="value">&#x27;content&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('encoding',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-encoding;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=encoding,-str%2C%20default%3D%27utf-8%27">
                encoding
                <span class="param-doc-description"
                style="position-anchor: --doc-link-encoding;">
                encoding: str, default=&#x27;utf-8&#x27;<br><br>If bytes or files are given to analyze, this encoding is used to<br>decode.</span>
            </a>
        </td>
                <td class="value">&#x27;utf-8&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('decode_error',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-decode_error;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=decode_error,-%7B%27strict%27%2C%20%27ignore%27%2C%20%27replace%27%7D%2C%20default%3D%27strict%27">
                decode_error
                <span class="param-doc-description"
                style="position-anchor: --doc-link-decode_error;">
                decode_error: {&#x27;strict&#x27;, &#x27;ignore&#x27;, &#x27;replace&#x27;}, default=&#x27;strict&#x27;<br><br>Instruction on what to do if a byte sequence is given to analyze that<br>contains characters not of the given `encoding`. By default, it is<br>&#x27;strict&#x27;, meaning that a UnicodeDecodeError will be raised. Other<br>values are &#x27;ignore&#x27; and &#x27;replace&#x27;.</span>
            </a>
        </td>
                <td class="value">&#x27;strict&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('strip_accents',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-strip_accents;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=strip_accents,-%7B%27ascii%27%2C%20%27unicode%27%7D%20or%20callable%2C%20default%3DNone">
                strip_accents
                <span class="param-doc-description"
                style="position-anchor: --doc-link-strip_accents;">
                strip_accents: {&#x27;ascii&#x27;, &#x27;unicode&#x27;} or callable, default=None<br><br>Remove accents and perform other character normalization<br>during the preprocessing step.<br>&#x27;ascii&#x27; is a fast method that only works on characters that have<br>a direct ASCII mapping.<br>&#x27;unicode&#x27; is a slightly slower method that works on any characters.<br>None (default) means no character normalization is performed.<br><br>Both &#x27;ascii&#x27; and &#x27;unicode&#x27; use NFKD normalization from<br>:func:`unicodedata.normalize`.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('lowercase',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-lowercase;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=lowercase,-bool%2C%20default%3DTrue">
                lowercase
                <span class="param-doc-description"
                style="position-anchor: --doc-link-lowercase;">
                lowercase: bool, default=True<br><br>Convert all characters to lowercase before tokenizing.</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('preprocessor',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-preprocessor;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=preprocessor,-callable%2C%20default%3DNone">
                preprocessor
                <span class="param-doc-description"
                style="position-anchor: --doc-link-preprocessor;">
                preprocessor: callable, default=None<br><br>Override the preprocessing (string transformation) stage while<br>preserving the tokenizing and n-grams generation steps.<br>Only applies if ``analyzer`` is not callable.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('tokenizer',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-tokenizer;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=tokenizer,-callable%2C%20default%3DNone">
                tokenizer
                <span class="param-doc-description"
                style="position-anchor: --doc-link-tokenizer;">
                tokenizer: callable, default=None<br><br>Override the string tokenization step while preserving the<br>preprocessing and n-grams generation steps.<br>Only applies if ``analyzer == &#x27;word&#x27;``.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('analyzer',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-analyzer;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=analyzer,-%7B%27word%27%2C%20%27char%27%2C%20%27char_wb%27%7D%20or%20callable%2C%20default%3D%27word%27">
                analyzer
                <span class="param-doc-description"
                style="position-anchor: --doc-link-analyzer;">
                analyzer: {&#x27;word&#x27;, &#x27;char&#x27;, &#x27;char_wb&#x27;} or callable, default=&#x27;word&#x27;<br><br>Whether the feature should be made of word or character n-grams.<br>Option &#x27;char_wb&#x27; creates character n-grams only from text inside<br>word boundaries; n-grams at the edges of words are padded with space.<br><br>If a callable is passed it is used to extract the sequence of features<br>out of the raw, unprocessed input.<br><br>.. versionchanged:: 0.21<br>    Since v0.21, if ``input`` is ``&#x27;filename&#x27;`` or ``&#x27;file&#x27;``, the data<br>    is first read from the file and then passed to the given callable<br>    analyzer.</span>
            </a>
        </td>
                <td class="value">&#x27;word&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('stop_words',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-stop_words;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=stop_words,-%7B%27english%27%7D%2C%20list%2C%20default%3DNone">
                stop_words
                <span class="param-doc-description"
                style="position-anchor: --doc-link-stop_words;">
                stop_words: {&#x27;english&#x27;}, list, default=None<br><br>If a string, it is passed to _check_stop_list and the appropriate stop<br>list is returned. &#x27;english&#x27; is currently the only supported string<br>value.<br>There are several known issues with &#x27;english&#x27; and you should<br>consider an alternative (see :ref:`stop_words`).<br><br>If a list, that list is assumed to contain stop words, all of which<br>will be removed from the resulting tokens.<br>Only applies if ``analyzer == &#x27;word&#x27;``.<br><br>If None, no stop words will be used. In this case, setting `max_df`<br>to a higher value, such as in the range (0.7, 1.0), can automatically detect<br>and filter stop words based on intra corpus document frequency of terms.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('token_pattern',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-token_pattern;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=token_pattern,-str%2C%20default%3Dr%22%28%3Fu%29%5C%5Cb%5C%5Cw%5C%5Cw%2B%5C%5Cb%22">
                token_pattern
                <span class="param-doc-description"
                style="position-anchor: --doc-link-token_pattern;">
                token_pattern: str, default=r&quot;(?u)\\b\\w\\w+\\b&quot;<br><br>Regular expression denoting what constitutes a &quot;token&quot;, only used<br>if ``analyzer == &#x27;word&#x27;``. The default regexp selects tokens of 2<br>or more alphanumeric characters (punctuation is completely ignored<br>and always treated as a token separator).<br><br>If there is a capturing group in token_pattern then the<br>captured group content, not the entire match, becomes the token.<br>At most one capturing group is permitted.</span>
            </a>
        </td>
                <td class="value">&#x27;(?u)\\b\\w\\w+\\b&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('ngram_range',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-ngram_range;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=ngram_range,-tuple%20%28min_n%2C%20max_n%29%2C%20default%3D%281%2C%201%29">
                ngram_range
                <span class="param-doc-description"
                style="position-anchor: --doc-link-ngram_range;">
                ngram_range: tuple (min_n, max_n), default=(1, 1)<br><br>The lower and upper boundary of the range of n-values for different<br>n-grams to be extracted. All values of n such that min_n &lt;= n &lt;= max_n<br>will be used. For example an ``ngram_range`` of ``(1, 1)`` means only<br>unigrams, ``(1, 2)`` means unigrams and bigrams, and ``(2, 2)`` means<br>only bigrams.<br>Only applies if ``analyzer`` is not callable.</span>
            </a>
        </td>
                <td class="value">(1, ...)</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('max_df',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-max_df;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=max_df,-float%20or%20int%2C%20default%3D1.0">
                max_df
                <span class="param-doc-description"
                style="position-anchor: --doc-link-max_df;">
                max_df: float or int, default=1.0<br><br>When building the vocabulary ignore terms that have a document<br>frequency strictly higher than the given threshold (corpus-specific<br>stop words).<br>If float in range [0.0, 1.0], the parameter represents a proportion of<br>documents; if integer, absolute counts.<br>This parameter is ignored if vocabulary is not None.</span>
            </a>
        </td>
                <td class="value">1.0</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('min_df',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-min_df;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=min_df,-float%20or%20int%2C%20default%3D1">
                min_df
                <span class="param-doc-description"
                style="position-anchor: --doc-link-min_df;">
                min_df: float or int, default=1<br><br>When building the vocabulary ignore terms that have a document<br>frequency strictly lower than the given threshold. This value is also<br>called cut-off in the literature.<br>If float in range of [0.0, 1.0], the parameter represents a proportion<br>of documents; if integer, absolute counts.<br>This parameter is ignored if vocabulary is not None.</span>
            </a>
        </td>
                <td class="value">1</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('max_features',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-max_features;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=max_features,-int%2C%20default%3DNone">
                max_features
                <span class="param-doc-description"
                style="position-anchor: --doc-link-max_features;">
                max_features: int, default=None<br><br>If not None, build a vocabulary that only considers the top<br>`max_features` ordered by term frequency across the corpus.<br>Otherwise, all features are used.<br><br>This parameter is ignored if vocabulary is not None.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('vocabulary',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-vocabulary;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=vocabulary,-Mapping%20or%20iterable%2C%20default%3DNone">
                vocabulary
                <span class="param-doc-description"
                style="position-anchor: --doc-link-vocabulary;">
                vocabulary: Mapping or iterable, default=None<br><br>Either a Mapping (e.g., a dict) where keys are terms and values are<br>indices in the feature matrix, or an iterable over terms. If not<br>given, a vocabulary is determined from the input documents.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('binary',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-binary;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=binary,-bool%2C%20default%3DFalse">
                binary
                <span class="param-doc-description"
                style="position-anchor: --doc-link-binary;">
                binary: bool, default=False<br><br>If True, all non-zero term counts are set to 1. This does not mean<br>outputs will have only 0/1 values, only that the tf term in tf-idf<br>is binary. (Set `binary` to True, `use_idf` to False and<br>`norm` to None to get 0/1 outputs).</span>
            </a>
        </td>
                <td class="value">False</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('dtype',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-dtype;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=dtype,-dtype%2C%20default%3Dfloat64">
                dtype
                <span class="param-doc-description"
                style="position-anchor: --doc-link-dtype;">
                dtype: dtype, default=float64<br><br>Type of the matrix returned by fit_transform() or transform().</span>
            </a>
        </td>
                <td class="value">&lt;class &#x27;numpy.float64&#x27;&gt;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('norm',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-norm;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=norm,-%7B%27l1%27%2C%20%27l2%27%7D%20or%20None%2C%20default%3D%27l2%27">
                norm
                <span class="param-doc-description"
                style="position-anchor: --doc-link-norm;">
                norm: {&#x27;l1&#x27;, &#x27;l2&#x27;} or None, default=&#x27;l2&#x27;<br><br>Each output row will have unit norm, either:<br><br>- &#x27;l2&#x27;: Sum of squares of vector elements is 1. The cosine<br>  similarity between two vectors is their dot product when l2 norm has<br>  been applied.<br>- &#x27;l1&#x27;: Sum of absolute values of vector elements is 1.<br>  See :func:`~sklearn.preprocessing.normalize`.<br>- None: No normalization.</span>
            </a>
        </td>
                <td class="value">&#x27;l2&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('use_idf',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-use_idf;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=use_idf,-bool%2C%20default%3DTrue">
                use_idf
                <span class="param-doc-description"
                style="position-anchor: --doc-link-use_idf;">
                use_idf: bool, default=True<br><br>Enable inverse-document-frequency reweighting. If False, idf(t) = 1.</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('smooth_idf',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-smooth_idf;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=smooth_idf,-bool%2C%20default%3DTrue">
                smooth_idf
                <span class="param-doc-description"
                style="position-anchor: --doc-link-smooth_idf;">
                smooth_idf: bool, default=True<br><br>Smooth idf weights by adding one to document frequencies, as if an<br>extra document was seen containing every term in the collection<br>exactly once. Prevents zero divisions.</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('sublinear_tf',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-sublinear_tf;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=sublinear_tf,-bool%2C%20default%3DFalse">
                sublinear_tf
                <span class="param-doc-description"
                style="position-anchor: --doc-link-sublinear_tf;">
                sublinear_tf: bool, default=False<br><br>Apply sublinear tf scaling, i.e. replace tf with 1 + log(tf).</span>
            </a>
        </td>
                <td class="value">False</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
    
            <div class="estimator-table">
                <details>
                    <summary>Fitted attributes</summary>
                    <table class="parameters-table">
                        <tbody>
                            <tr>
                            <th>Name</th>
                            <th>Type</th>
                            <th>Value</th>
                            </tr>
                        
           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-fixed_vocabulary_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=fixed_vocabulary_,-bool">
                fixed_vocabulary_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-fixed_vocabulary_;">
                fixed_vocabulary_: bool<br><br>True if a fixed vocabulary of term to indices mapping<br>is provided by the user.</span>
            </a>
        </td>
               <td class="fitted-att-type">bool</td>
               <td>False</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-idf_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=idf_,-array%20of%20shape%20%28n_features%2C%29">
                idf_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-idf_;">
                idf_: array of shape (n_features,)<br><br>The inverse document frequency (IDF) vector; only defined<br>if ``use_idf`` is True.</span>
            </a>
        </td>
               <td class="fitted-att-type">ndarray[float64](34118,)</td>
               <td>[4.73,4.36,7.52,...,7.93,7.93,7.93]</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-vocabulary_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=vocabulary_,-dict">
                vocabulary_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-vocabulary_;">
                vocabulary_: dict<br><br>A mapping of terms to feature indices.</span>
            </a>
        </td>
               <td class="fitted-att-type">dict</td>
               <td>{&#x27;00&#x27;: 0, &#x27;000&#x27;: 1, &#x27;0000&#x27;: 2, &#x27;00000&#x27;: 3, ...}</td>


           </tr>
    
                        </tbody>
                    </table>
                </details>
            </div>
        </div></div></div><div class="sk-item"><div class="sk-estimator fitted sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually sk-global" id="sk-estimator-id-3" type="checkbox" ><label for="sk-estimator-id-3" class="sk-toggleable__label fitted sk-toggleable__label-arrow"><div><div>MultinomialNB</div></div><div><a class="sk-estimator-doc-link fitted" rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html">?<span>Documentation for MultinomialNB</span></a></div></label><div class="sk-toggleable__content fitted" data-param-prefix="multinomialnb__">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('alpha',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-alpha;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=alpha,-float%20or%20array-like%20of%20shape%20%28n_features%2C%29%2C%20default%3D1.0">
                alpha
                <span class="param-doc-description"
                style="position-anchor: --doc-link-alpha;">
                alpha: float or array-like of shape (n_features,), default=1.0<br><br>Additive (Laplace/Lidstone) smoothing parameter<br>(set alpha=0 and force_alpha=True, for no smoothing).</span>
            </a>
        </td>
                <td class="value">1.0</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('force_alpha',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-force_alpha;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=force_alpha,-bool%2C%20default%3DTrue">
                force_alpha
                <span class="param-doc-description"
                style="position-anchor: --doc-link-force_alpha;">
                force_alpha: bool, default=True<br><br>If False and alpha is less than 1e-10, it will set alpha to<br>1e-10. If True, alpha will remain unchanged. This may cause<br>numerical errors if alpha is too close to 0.<br><br>.. versionadded:: 1.2<br>.. versionchanged:: 1.4<br>   The default value of `force_alpha` changed to `True`.</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('fit_prior',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-fit_prior;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=fit_prior,-bool%2C%20default%3DTrue">
                fit_prior
                <span class="param-doc-description"
                style="position-anchor: --doc-link-fit_prior;">
                fit_prior: bool, default=True<br><br>Whether to learn class prior probabilities or not.<br>If false, a uniform prior will be used.</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('class_prior',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-class_prior;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=class_prior,-array-like%20of%20shape%20%28n_classes%2C%29%2C%20default%3DNone">
                class_prior
                <span class="param-doc-description"
                style="position-anchor: --doc-link-class_prior;">
                class_prior: array-like of shape (n_classes,), default=None<br><br>Prior probabilities of the classes. If specified, the priors are not<br>adjusted according to the data.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
    
            <div class="estimator-table">
                <details>
                    <summary>Fitted attributes</summary>
                    <table class="parameters-table">
                        <tbody>
                            <tr>
                            <th>Name</th>
                            <th>Type</th>
                            <th>Value</th>
                            </tr>
                        
           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-class_count_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=class_count_,-ndarray%20of%20shape%20%28n_classes%2C%29">
                class_count_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-class_count_;">
                class_count_: ndarray of shape (n_classes,)<br><br>Number of samples encountered for each class during fitting. This<br>value is weighted by the sample weight when provided.</span>
            </a>
        </td>
               <td class="fitted-att-type">ndarray[float64](4,)</td>
               <td>[480.,584.,593.,377.]</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-class_log_prior_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=class_log_prior_,-ndarray%20of%20shape%20%28n_classes%2C%29">
                class_log_prior_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-class_log_prior_;">
                class_log_prior_: ndarray of shape (n_classes,)<br><br>Smoothed empirical log probability for each class.</span>
            </a>
        </td>
               <td class="fitted-att-type">ndarray[float64](4,)</td>
               <td>[-1.44,-1.25,-1.23,-1.69]</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-classes_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=classes_,-ndarray%20of%20shape%20%28n_classes%2C%29">
                classes_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-classes_;">
                classes_: ndarray of shape (n_classes,)<br><br>Class labels known to the classifier</span>
            </a>
        </td>
               <td class="fitted-att-type">ndarray[int64](4,)</td>
               <td>[0,1,2,3]</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-feature_count_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=feature_count_,-ndarray%20of%20shape%20%28n_classes%2C%20n_features%29">
                feature_count_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-feature_count_;">
                feature_count_: ndarray of shape (n_classes, n_features)<br><br>Number of samples encountered for each (class, feature)<br>during fitting. This value is weighted by the sample weight when<br>provided.</span>
            </a>
        </td>
               <td class="fitted-att-type">ndarray[float64](4, 34118)</td>
               <td>[[0.12,1.53,0.  ,...,0.  ,0.  ,0.  ],
     [1.42,0.59,0.  ,...,0.1 ,0.  ,0.  ],
     [1.34,2.78,0.16,...,0.  ,0.  ,0.  ],
     [0.21,0.51,0.  ,...,0.  ,0.12,0.12]]</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-feature_log_prob_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=feature_log_prob_,-ndarray%20of%20shape%20%28n_classes%2C%20n_features%29">
                feature_log_prob_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-feature_log_prob_;">
                feature_log_prob_: ndarray of shape (n_classes, n_features)<br><br>Empirical log probability of features<br>given a class, ``P(x_i|y)``.</span>
            </a>
        </td>
               <td class="fitted-att-type">ndarray[float64](4, 34118)</td>
               <td>[[-10.45, -9.64,-10.57,...,-10.57,-10.57,-10.57],
     [ -9.68,-10.1 ,-10.56,...,-10.46,-10.56,-10.56],
     [ -9.74, -9.26,-10.45,...,-10.59,-10.59,-10.59],
     [-10.35,-10.13,-10.54,...,-10.54,-10.43,-10.43]]</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-n_features_in_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=n_features_in_,-int">
                n_features_in_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-n_features_in_;">
                n_features_in_: int<br><br>Number of features seen during :term:`fit`.<br><br>.. versionadded:: 0.24</span>
            </a>
        </td>
               <td class="fitted-att-type">int</td>
               <td>34118</td>


           </tr>
    
                        </tbody>
                    </table>
                </details>
            </div>
        </div></div></div></div></div></div></div><script>function copyToClipboard(text, element) {
        // Get the parameter prefix from the closest toggleable content
        const toggleableContent = element.closest('.sk-toggleable__content');
        const paramPrefix = toggleableContent ? toggleableContent.dataset.paramPrefix : '';
        const fullParamName = paramPrefix ? `${paramPrefix}${text}` : text;

        const originalStyle = element.style;
        const computedStyle = window.getComputedStyle(element);
        const originalWidth = computedStyle.width;
        const originalHTML = element.innerHTML.replace('Copied!', '');

        navigator.clipboard.writeText(fullParamName)
            .then(() => {
                element.style.width = originalWidth;
                element.style.color = 'green';
                element.innerHTML = "Copied!";

                setTimeout(() => {
                    element.innerHTML = originalHTML;
                    element.style = originalStyle;
                }, 2000);
            })
            .catch(err => {
                console.error('Failed to copy:', err);
                element.style.color = 'red';
                element.innerHTML = "Failed!";
                setTimeout(() => {
                    element.innerHTML = originalHTML;
                    element.style = originalStyle;
                }, 2000);
            });
        return false;
    }

    document.querySelectorAll('.copy-paste-icon').forEach(function(element) {
        const toggleableContent = element.closest('.sk-toggleable__content');
        const paramPrefix = toggleableContent ? toggleableContent.dataset.paramPrefix : '';
        const paramName = element.parentElement.nextElementSibling
            .textContent.trim().split(' ')[0];
        const fullParamName = paramPrefix ? `${paramPrefix}${paramName}` : paramName;

        element.setAttribute('title', fullParamName);
    });


    /**
     * Adapted from Skrub
     * https://github.com/skrub-data/skrub/blob/403466d1d5d4dc76a7ef569b3f8228db59a31dc3/skrub/_reporting/_data/templates/report.js#L789
     * @returns "light" or "dark"
     */
    function detectTheme(element) {
        const body = document.querySelector('body');

        // Check VSCode theme
        const themeKindAttr = body.getAttribute('data-vscode-theme-kind');
        const themeNameAttr = body.getAttribute('data-vscode-theme-name');

        if (themeKindAttr && themeNameAttr) {
            const themeKind = themeKindAttr.toLowerCase();
            const themeName = themeNameAttr.toLowerCase();

            if (themeKind.includes("dark") || themeName.includes("dark")) {
                return "dark";
            }
            if (themeKind.includes("light") || themeName.includes("light")) {
                return "light";
            }
        }

        // Check Jupyter theme
        if (body.getAttribute('data-jp-theme-light') === 'false') {
            return 'dark';
        } else if (body.getAttribute('data-jp-theme-light') === 'true') {
            return 'light';
        }

        // Guess based on a parent element's color
        const color = window.getComputedStyle(element.parentNode, null).getPropertyValue('color');
        const match = color.match(/^rgb\s*\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)\s*$/i);
        if (match) {
            const [r, g, b] = [
                parseFloat(match[1]),
                parseFloat(match[2]),
                parseFloat(match[3])
            ];

            // https://en.wikipedia.org/wiki/HSL_and_HSV#Lightness
            const luma = 0.299 * r + 0.587 * g + 0.114 * b;

            if (luma > 180) {
                // If the text is very bright we have a dark theme
                return 'dark';
            }
            if (luma < 75) {
                // If the text is very dark we have a light theme
                return 'light';
            }
            // Otherwise fall back to the next heuristic.
        }

        // Fallback to system preference
        return window.matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light';
    }


    function forceTheme(elementId) {
        const estimatorElement = document.querySelector(`#${elementId}`);
        if (estimatorElement === null) {
            console.error(`Element with id ${elementId} not found.`);
        } else {
            const theme = detectTheme(estimatorElement);
            estimatorElement.classList.add(theme);
        }
    }

    forceTheme('sk-container-id-1');</script></body>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 75-78

.. code-block:: Python

    report = skore.evaluate(model, X_test, y_test, splitter="prefit")
    report.metrics.summarize().frame()






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th></th>
          <th>MultinomialNB</th>
        </tr>
        <tr>
          <th>Metric</th>
          <th>Label / Average</th>
          <th></th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>Accuracy</th>
          <th></th>
          <td>0.837398</td>
        </tr>
        <tr>
          <th rowspan="4" valign="top">Precision</th>
          <th>0</th>
          <td>0.674888</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.964865</td>
        </tr>
        <tr>
          <th>2</th>
          <td>0.867117</td>
        </tr>
        <tr>
          <th>3</th>
          <td>0.967742</td>
        </tr>
        <tr>
          <th rowspan="4" valign="top">Recall</th>
          <th>0</th>
          <td>0.943574</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.917738</td>
        </tr>
        <tr>
          <th>2</th>
          <td>0.977157</td>
        </tr>
        <tr>
          <th>3</th>
          <td>0.358566</td>
        </tr>
        <tr>
          <th rowspan="4" valign="top">ROC AUC</th>
          <th>0</th>
          <td>0.960087</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.992411</td>
        </tr>
        <tr>
          <th>2</th>
          <td>0.993738</td>
        </tr>
        <tr>
          <th>3</th>
          <td>0.944064</td>
        </tr>
        <tr>
          <th>Log loss</th>
          <th></th>
          <td>0.536984</td>
        </tr>
        <tr>
          <th>Fit time (s)</th>
          <th></th>
          <td>NaN</td>
        </tr>
        <tr>
          <th>Predict time (s)</th>
          <th></th>
          <td>0.288406</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 79-90

Balancing the classes before classification
-------------------------------------------

In the report above, class \#3 reaches a recall of only about 0.36. To improve
the predictions for this class, we can balance the training set before fitting
the naive Bayes classifier. We use a
:class:`~imblearn.under_sampling.RandomUnderSampler` to equalize the number of
samples across classes before training.
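
As a rough illustration of what random under-sampling does, the sketch below
uses only the standard library rather than imbalanced-learn. The helper
``random_under_sample`` is a hypothetical name invented for this illustration,
and the toy labels reproduce the class counts reported in the ``class_count_``
attribute displayed above. Each class is reduced to a random subset the size of
the minority class:

.. code-block:: Python

    import random
    from collections import Counter


    def random_under_sample(X, y, seed=0):
        """Hypothetical helper: keep as many samples per class as the minority class."""
        rng = random.Random(seed)
        counts = Counter(y)
        n_min = min(counts.values())
        kept = []
        for label in sorted(counts):
            # Indices of all samples belonging to the current class.
            indices = [i for i, target in enumerate(y) if target == label]
            # Draw n_min of them without replacement.
            kept.extend(rng.sample(indices, n_min))
        kept.sort()
        return [X[i] for i in kept], [y[i] for i in kept]


    # Toy labels reproducing the training class counts shown above.
    y = [0] * 480 + [1] * 584 + [2] * 593 + [3] * 377
    X = [f"doc-{i}" for i in range(len(y))]

    X_res, y_res = random_under_sample(X, y)
    print(sorted(Counter(y_res).items()))
    # [(0, 377), (1, 377), (2, 377), (3, 377)]

The price of balancing this way is that samples from the larger classes are
discarded, so the classifier is trained on less data.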

Note that we use the :func:`~imblearn.pipeline.make_pipeline` function from
imbalanced-learn rather than the scikit-learn one, so that the sampler is
handled properly: it resamples the data during ``fit`` only.
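
Why a sampler needs a dedicated pipeline can be sketched with a toy example.
This is not imbalanced-learn's actual implementation: ``DropHalfSampler``,
``MajorityClassifier`` and ``ToyPipeline`` are hypothetical classes invented for
this illustration. A sampler changes the number of samples, so it should run
while fitting but be skipped when predicting, where every input needs a
prediction:

.. code-block:: Python

    class DropHalfSampler:
        """Hypothetical sampler: keeps every other sample."""

        def fit_resample(self, X, y):
            return X[::2], y[::2]


    class MajorityClassifier:
        """Hypothetical classifier: predicts the most frequent training label."""

        def fit(self, X, y):
            self.n_seen_ = len(X)
            self.label_ = max(set(y), key=y.count)
            return self

        def predict(self, X):
            return [self.label_ for _ in X]


    class ToyPipeline:
        """Hypothetical pipeline: resample during fit, never during predict."""

        def __init__(self, sampler, classifier):
            self.sampler = sampler
            self.classifier = classifier

        def fit(self, X, y):
            X_res, y_res = self.sampler.fit_resample(X, y)
            self.classifier.fit(X_res, y_res)
            return self

        def predict(self, X):
            # The sampler is skipped: one prediction per input sample.
            return self.classifier.predict(X)


    X = ["a", "b", "c", "d", "e", "f"]
    y = [1, 1, 1, 0, 1, 0]
    pipe = ToyPipeline(DropHalfSampler(), MajorityClassifier()).fit(X, y)
    print(pipe.classifier.n_seen_)  # samples the classifier was fitted on: 3
    print(len(pipe.predict(X)))  # predictions returned: 6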

.. GENERATED FROM PYTHON SOURCE LINES 90-93

.. code-block:: Python


    from imblearn.pipeline import make_pipeline as make_pipeline_imb








.. GENERATED FROM PYTHON SOURCE LINES 94-100

.. code-block:: Python

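    from imblearn.under_sampling import RandomUnderSampler

    # The sampler sits between the vectorizer and the classifier; it only
    # resamples the data during ``fit``.
    model = make_pipeline_imb(TfidfVectorizer(), RandomUnderSampler(), MultinomialNB())

    model.fit(X_train, y_train)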






.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <style>.sk-global {
      /* Definition of color scheme common for light and dark mode */
      --sklearn-color-text: #000;
      --sklearn-color-text-muted: #666;
      --sklearn-color-line: gray;
      /* Definition of color scheme for unfitted estimators */
      --sklearn-color-unfitted-level-0: #fff5e6;
      --sklearn-color-unfitted-level-1: #f6e4d2;
      --sklearn-color-unfitted-level-2: #ffe0b3;
      --sklearn-color-unfitted-level-3: chocolate;
      /* Definition of color scheme for fitted estimators */
      --sklearn-color-fitted-level-0: #f0f8ff;
      --sklearn-color-fitted-level-1: #d4ebff;
      --sklearn-color-fitted-level-2: #b3dbfd;
      --sklearn-color-fitted-level-3: cornflowerblue;
    }

    .sk-global.light {
      /* Specific color for light theme */
      --sklearn-color-text-on-default-background: black;
      --sklearn-color-background: white;
      --sklearn-color-border-box: black;
      --sklearn-color-icon: #696969;
    }

    .sk-global.dark {
      --sklearn-color-text-on-default-background: white;
      --sklearn-color-background: #111;
      --sklearn-color-border-box: white;
      --sklearn-color-icon: #878787;
    }

    .sk-global {
      color: var(--sklearn-color-text);
    }

    .sk-global pre {
      padding: 0;
    }

    .sk-global input.sk-hidden--visually {
      border: 0;
      clip: rect(1px 1px 1px 1px);
      clip: rect(1px, 1px, 1px, 1px);
      height: 1px;
      margin: -1px;
      overflow: hidden;
      padding: 0;
      position: absolute;
      width: 1px;
    }

    .sk-global div.sk-dashed-wrapped {
      border: 1px dashed var(--sklearn-color-line);
      margin: 0 0.4em 0.5em 0.4em;
      box-sizing: border-box;
      padding-bottom: 0.4em;
      background-color: var(--sklearn-color-background);
    }

    .sk-global div.sk-container {
      /* jupyter's `normalize.less` sets `[hidden] { display: none; }`
         but bootstrap.min.css set `[hidden] { display: none !important; }`
         so we also need the `!important` here to be able to override the
         default hidden behavior on the sphinx rendered scikit-learn.org.
         See: https://github.com/scikit-learn/scikit-learn/issues/21755 */
      display: inline-block !important;
      position: relative;
    }

    .sk-global div.sk-text-repr-fallback {
      display: none;
    }

    div.sk-parallel-item,
    div.sk-serial,
    div.sk-item {
      /* draw centered vertical line to link estimators */
      background-image: linear-gradient(var(--sklearn-color-text-on-default-background), var(--sklearn-color-text-on-default-background));
      background-size: 2px 100%;
      background-repeat: no-repeat;
      background-position: center center;
    }

    /* Parallel-specific style estimator block */

    .sk-global div.sk-parallel-item::after {
      content: "";
      width: 100%;
      border-bottom: 2px solid var(--sklearn-color-text-on-default-background);
      flex-grow: 1;
    }

    .sk-global div.sk-parallel {
      display: flex;
      align-items: stretch;
      justify-content: center;
      background-color: var(--sklearn-color-background);
      position: relative;
    }

    .sk-global div.sk-parallel-item {
      display: flex;
      flex-direction: column;
    }

    .sk-global div.sk-parallel-item:first-child::after {
      align-self: flex-end;
      width: 50%;
    }

    .sk-global div.sk-parallel-item:last-child::after {
      align-self: flex-start;
      width: 50%;
    }

    .sk-global div.sk-parallel-item:only-child::after {
      width: 0;
    }

    /* Serial-specific style estimator block */

    .sk-global div.sk-serial {
      display: flex;
      flex-direction: column;
      align-items: center;
      background-color: var(--sklearn-color-background);
      padding-right: 1em;
      padding-left: 1em;
    }


    /* Toggleable style: style used for estimator/Pipeline/ColumnTransformer box that is
    clickable and can be expanded/collapsed.
    - Pipeline and ColumnTransformer use this feature and define the default style
    - Estimators will overwrite some part of the style using the `sk-estimator` class
    */

    /* Pipeline and ColumnTransformer style (default) */

    .sk-global div.sk-toggleable {
      /* Default theme specific background. It is overwritten whether we have a
      specific estimator or a Pipeline/ColumnTransformer */
      background-color: var(--sklearn-color-background);
    }

    /* Toggleable label */
    .sk-global label.sk-toggleable__label {
      cursor: pointer;
      display: flex;
      width: 100%;
      margin-bottom: 0;
      padding: 0.5em;
      box-sizing: border-box;
      text-align: center;
      align-items: center;
      justify-content: center;
      gap: 0.5em;
    }

    .sk-global label.sk-toggleable__label .caption {
      font-size: 0.6rem;
      font-weight: lighter;
      color: var(--sklearn-color-text-muted);
    }

    .sk-global label.sk-toggleable__label-arrow:before {
      /* Arrow on the left of the label */
      content: "▸";
      float: left;
      margin-right: 0.25em;
      color: var(--sklearn-color-icon);
    }

    .sk-global label.sk-toggleable__label-arrow:hover:before {
      color: var(--sklearn-color-text);
    }

    /* Toggleable content - dropdown */

    .sk-global div.sk-toggleable__content {
      display: none;
      text-align: left;
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    .sk-global div.sk-toggleable__content.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    .sk-global div.sk-toggleable__content pre {
      margin: 0.2em;
      border-radius: 0.25em;
      color: var(--sklearn-color-text);
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    .sk-global div.sk-toggleable__content.fitted pre {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    .sk-global input.sk-toggleable__control:checked~div.sk-toggleable__content {
      /* Expand drop-down */
      display: block;
      width: 100%;
      overflow: visible;
    }

    .sk-global input.sk-toggleable__control:checked~label.sk-toggleable__label-arrow:before {
      content: "▾";
    }

    /* Pipeline/ColumnTransformer-specific style */

    .sk-global div.sk-label input.sk-toggleable__control:checked~label.sk-toggleable__label {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    .sk-global div.sk-label.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Estimator-specific style */

    /* Colorize estimator box */
    .sk-global div.sk-estimator input.sk-toggleable__control:checked~label.sk-toggleable__label {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    .sk-global div.sk-estimator.fitted input.sk-toggleable__control:checked~label.sk-toggleable__label {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-2);
    }

    .sk-global div.sk-label label.sk-toggleable__label,
    .sk-global div.sk-label label {
      /* The background is the default theme color */
      color: var(--sklearn-color-text-on-default-background);
    }

    /* On hover, darken the color of the background */
    .sk-global div.sk-label:hover label.sk-toggleable__label {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    /* Label box, darken color on hover, fitted */
    .sk-global div.sk-label.fitted:hover label.sk-toggleable__label.fitted {
      color: var(--sklearn-color-text);
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Estimator label */

    .sk-global div.sk-label label {
      font-family: monospace;
      font-weight: bold;
      line-height: 1.2em;
    }

    .sk-global div.sk-label-container {
      text-align: center;
    }

    /* Estimator-specific */
    .sk-global div.sk-estimator {
      font-family: monospace;
      border: 1px dotted var(--sklearn-color-border-box);
      border-radius: 0.25em;
      box-sizing: border-box;
      margin-bottom: 0.5em;
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-0);
    }

    .sk-global div.sk-estimator.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
    }

    /* on hover */
    .sk-global div.sk-estimator:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-2);
    }

    .sk-global div.sk-estimator.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-2);
    }

    /* Specification for estimator info (e.g. "i" and "?") */

    /* Common style for "i" and "?" */

    .sk-estimator-doc-link,
    a:link.sk-estimator-doc-link,
    a:visited.sk-estimator-doc-link {
      float: right;
      font-size: smaller;
      line-height: 1em;
      font-family: monospace;
      background-color: var(--sklearn-color-unfitted-level-0);
      border-radius: 1em;
      height: 1em;
      width: 1em;
      text-decoration: none !important;
      margin-left: 0.5em;
      text-align: center;
      /* unfitted */
      border: var(--sklearn-color-unfitted-level-3) 1pt solid;
      color: var(--sklearn-color-unfitted-level-3);
    }

    .sk-estimator-doc-link.fitted,
    a:link.sk-estimator-doc-link.fitted,
    a:visited.sk-estimator-doc-link.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
      border: var(--sklearn-color-fitted-level-3) 1pt solid;
      color: var(--sklearn-color-fitted-level-3);
    }

    /* On hover */
    div.sk-estimator:hover .sk-estimator-doc-link:hover,
    .sk-estimator-doc-link:hover,
    div.sk-label-container:hover .sk-estimator-doc-link:hover,
    .sk-estimator-doc-link:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-3);
      border: var(--sklearn-color-fitted-level-0) 1pt solid;
      color: var(--sklearn-color-unfitted-level-0);
      text-decoration: none;
    }

    div.sk-estimator.fitted:hover .sk-estimator-doc-link.fitted:hover,
    .sk-estimator-doc-link.fitted:hover,
    div.sk-label-container:hover .sk-estimator-doc-link.fitted:hover,
    .sk-estimator-doc-link.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-3);
      border: var(--sklearn-color-fitted-level-0) 1pt solid;
      color: var(--sklearn-color-fitted-level-0);
      text-decoration: none;
    }

    /* Span, style for the box shown on hovering the info icon */
    .sk-estimator-doc-link span {
      display: none;
      z-index: 9999;
      position: relative;
      font-weight: normal;
      right: .2ex;
      padding: .5ex;
      margin: .5ex;
      width: min-content;
      min-width: 20ex;
      max-width: 50ex;
      color: var(--sklearn-color-text);
      box-shadow: 2pt 2pt 4pt #999;
      /* unfitted */
      background: var(--sklearn-color-unfitted-level-0);
      border: .5pt solid var(--sklearn-color-unfitted-level-3);
    }

    .sk-estimator-doc-link.fitted span {
      /* fitted */
      background: var(--sklearn-color-fitted-level-0);
      border: var(--sklearn-color-fitted-level-3);
    }

    .sk-estimator-doc-link:hover span {
      display: block;
    }

    /* "?"-specific style due to the `<a>` HTML tag */

    .sk-global a.estimator_doc_link {
      float: right;
      font-size: 1rem;
      line-height: 1em;
      font-family: monospace;
      background-color: var(--sklearn-color-unfitted-level-0);
      border-radius: 1rem;
      height: 1rem;
      width: 1rem;
      text-decoration: none;
      /* unfitted */
      color: var(--sklearn-color-unfitted-level-1);
      border: var(--sklearn-color-unfitted-level-1) 1pt solid;
    }

    .sk-global a.estimator_doc_link.fitted {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-0);
      border: var(--sklearn-color-fitted-level-1) 1pt solid;
      color: var(--sklearn-color-fitted-level-1);
    }

    /* On hover */
    .sk-global a.estimator_doc_link:hover {
      /* unfitted */
      background-color: var(--sklearn-color-unfitted-level-3);
      color: var(--sklearn-color-background);
      text-decoration: none;
    }

    .sk-global a.estimator_doc_link.fitted:hover {
      /* fitted */
      background-color: var(--sklearn-color-fitted-level-3);
    }

    .estimator-table {
        font-family: monospace;
    }

    .estimator-table summary {
        padding: .5rem;
        cursor: pointer;
    }

    .estimator-table summary::marker {
        font-size: 0.7rem;
    }

    .estimator-table details[open] {
        padding-left: 0.1rem;
        padding-right: 0.1rem;
        padding-bottom: 0.3rem;
    }

    .estimator-table .parameters-table {
        margin-left: auto !important;
        margin-right: auto !important;
        margin-top: 0;
    }

    .estimator-table .parameters-table tr:nth-child(odd) {
        background-color: #fff;
    }

    .estimator-table .parameters-table tr:nth-child(even) {
        background-color: #f6f6f6;
    }

    .estimator-table .parameters-table tr:hover td {
        background-color: #e0e0e0;
    }

    .estimator-table table :is(td, th) {
        border: 1px solid rgba(106, 105, 104, 0.232);
    }

    /*
        `table td`is set in notebook with right text-align.
        We need to overwrite it.
    */
    .estimator-table table td.param {
        text-align: left;
        position: relative;
        padding: 0;
    }

    .user-set td {
        color:rgb(255, 94, 0);
        text-align: left !important;
    }

    .user-set td.value {
        color:rgb(255, 94, 0);
        background-color: transparent;
    }

    .default td, .estimator-table th {
        color: black;
        text-align: left !important;
    }

    .user-set td i,
    .default td i {
        color: black;
    }

    td.fitted-att-type {
        white-space: preserve nowrap;
    }

    /*
        Styles for parameter documentation links
        We need styling for visited so jupyter doesn't overwrite it
    */
    a.param-doc-link,
    a.param-doc-link:link,
    a.param-doc-link:visited {
        text-decoration: underline dashed;
        text-underline-offset: .3em;
        color: inherit;
        display: block;
        padding: .5em;
    }

    @supports(anchor-name: --doc-link) {
        a.param-doc-link,
        a.param-doc-link:link,
        a.param-doc-link:visited {
        anchor-name: --doc-link;
        }
    }

    /* "hack" to make the entire area of the cell containing the link clickable */
    a.param-doc-link::before {
        position: absolute;
        content: "";
        inset: 0;
    }

    .param-doc-description {
        display: none;
        position: absolute;
        z-index: 9999;
        left: 0;
        padding: .5ex;
        margin-left: 1.5em;
        color: var(--sklearn-color-text);
        box-shadow: .3em .3em .4em #999;
        width: max-content;
        text-align: left;
        max-height: 10em;
        overflow-y: auto;

        /* unfitted */
        background: var(--sklearn-color-unfitted-level-0);
        border: thin solid var(--sklearn-color-unfitted-level-3);
    }

    @supports(position-area: center right) {
        .param-doc-description {
        position-area: center right;
        position: fixed;
        margin-left: 0;
        }
    }

    /* Fitted state for parameter tooltips */
    .fitted .param-doc-description {
        /* fitted */
        background: var(--sklearn-color-fitted-level-0);
        border: thin solid var(--sklearn-color-fitted-level-3);
    }

    .param-doc-link:hover .param-doc-description {
        display: block;
    }

    .copy-paste-icon {
        background-image: url(data:image/svg+xml;base64,PHN2ZyB4bWxucz0iaHR0cDovL3d3dy53My5vcmcvMjAwMC9zdmciIHZpZXdCb3g9IjAgMCA0NDggNTEyIj48IS0tIUZvbnQgQXdlc29tZSBGcmVlIDYuNy4yIGJ5IEBmb250YXdlc29tZSAtIGh0dHBzOi8vZm9udGF3ZXNvbWUuY29tIExpY2Vuc2UgLSBodHRwczovL2ZvbnRhd2Vzb21lLmNvbS9saWNlbnNlL2ZyZWUgQ29weXJpZ2h0IDIwMjUgRm9udGljb25zLCBJbmMuLS0+PHBhdGggZD0iTTIwOCAwTDMzMi4xIDBjMTIuNyAwIDI0LjkgNS4xIDMzLjkgMTQuMWw2Ny45IDY3LjljOSA5IDE0LjEgMjEuMiAxNC4xIDMzLjlMNDQ4IDMzNmMwIDI2LjUtMjEuNSA0OC00OCA0OGwtMTkyIDBjLTI2LjUgMC00OC0yMS41LTQ4LTQ4bDAtMjg4YzAtMjYuNSAyMS41LTQ4IDQ4LTQ4ek00OCAxMjhsODAgMCAwIDY0LTY0IDAgMCAyNTYgMTkyIDAgMC0zMiA2NCAwIDAgNDhjMCAyNi41LTIxLjUgNDgtNDggNDhMNDggNTEyYy0yNi41IDAtNDgtMjEuNS00OC00OEwwIDE3NmMwLTI2LjUgMjEuNS00OCA0OC00OHoiLz48L3N2Zz4=);
        background-repeat: no-repeat;
        background-size: 14px 14px;
        background-position: 0;
        display: inline-block;
        width: 14px;
        height: 14px;
        cursor: pointer;
    }
    </style><body><div id="sk-container-id-2" class="sk-top-container sk-global"><div class="sk-text-repr-fallback"><pre>Pipeline(steps=[(&#x27;tfidfvectorizer&#x27;, TfidfVectorizer()),
                    (&#x27;randomundersampler&#x27;, RandomUnderSampler()),
                    (&#x27;multinomialnb&#x27;, MultinomialNB())])</pre><b>In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. <br />On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org.</b></div><div class="sk-container" hidden><div class="sk-item sk-dashed-wrapped"><div class="sk-label-container"><div class="sk-label fitted sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually sk-global" id="sk-estimator-id-4" type="checkbox" ><label for="sk-estimator-id-4" class="sk-toggleable__label fitted sk-toggleable__label-arrow"><div><div>Pipeline</div></div><div><span class="sk-estimator-doc-link fitted">i<span>Fitted</span></span></div></label><div class="sk-toggleable__content fitted" data-param-prefix="">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="user-set">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('steps',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">steps</td>
                <td class="value">[(&#x27;tfidfvectorizer&#x27;, ...), (&#x27;randomundersampler&#x27;, ...), ...]</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('transform_input',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">transform_input</td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('memory',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">memory</td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('verbose',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">verbose</td>
                <td class="value">False</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
    
            <div class="estimator-table">
                <details>
                    <summary>Fitted attributes</summary>
                    <table class="parameters-table">
                        <tbody>
                            <tr>
                            <th>Name</th>
                            <th>Type</th>
                            <th>Value</th>
                            </tr>
                        
           <tr class="default">
               <td class="param"><a class="param-doc-link" style="text-decoration:none;">classes_</a></td>
               <td class="fitted-att-type">ndarray[int64](4,)</td>
               <td>[0,1,2,3]</td>


           </tr>
    
                        </tbody>
                    </table>
                </details>
            </div>
        </div></div></div><div class="sk-serial"><div class="sk-item"><div class="sk-estimator fitted sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually sk-global" id="sk-estimator-id-5" type="checkbox" ><label for="sk-estimator-id-5" class="sk-toggleable__label fitted sk-toggleable__label-arrow"><div><div>TfidfVectorizer</div></div><div><a class="sk-estimator-doc-link fitted" rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html">?<span>Documentation for TfidfVectorizer</span></a></div></label><div class="sk-toggleable__content fitted" data-param-prefix="tfidfvectorizer__">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('input',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-input;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=input,-%7B%27filename%27%2C%20%27file%27%2C%20%27content%27%7D%2C%20default%3D%27content%27">
                input
                <span class="param-doc-description"
                style="position-anchor: --doc-link-input;">
                input: {&#x27;filename&#x27;, &#x27;file&#x27;, &#x27;content&#x27;}, default=&#x27;content&#x27;<br><br>- If `&#x27;filename&#x27;`, the sequence passed as an argument to fit is<br>  expected to be a list of filenames that need reading to fetch<br>  the raw content to analyze.<br><br>- If `&#x27;file&#x27;`, the sequence items must have a &#x27;read&#x27; method (file-like<br>  object) that is called to fetch the bytes in memory.<br><br>- If `&#x27;content&#x27;`, the input is expected to be a sequence of items that<br>  can be of type string or byte.</span>
            </a>
        </td>
                <td class="value">&#x27;content&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('encoding',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-encoding;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=encoding,-str%2C%20default%3D%27utf-8%27">
                encoding
                <span class="param-doc-description"
                style="position-anchor: --doc-link-encoding;">
                encoding: str, default=&#x27;utf-8&#x27;<br><br>If bytes or files are given to analyze, this encoding is used to<br>decode.</span>
            </a>
        </td>
                <td class="value">&#x27;utf-8&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('decode_error',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-decode_error;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=decode_error,-%7B%27strict%27%2C%20%27ignore%27%2C%20%27replace%27%7D%2C%20default%3D%27strict%27">
                decode_error
                <span class="param-doc-description"
                style="position-anchor: --doc-link-decode_error;">
                decode_error: {&#x27;strict&#x27;, &#x27;ignore&#x27;, &#x27;replace&#x27;}, default=&#x27;strict&#x27;<br><br>Instruction on what to do if a byte sequence is given to analyze that<br>contains characters not of the given `encoding`. By default, it is<br>&#x27;strict&#x27;, meaning that a UnicodeDecodeError will be raised. Other<br>values are &#x27;ignore&#x27; and &#x27;replace&#x27;.</span>
            </a>
        </td>
                <td class="value">&#x27;strict&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('strip_accents',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-strip_accents;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=strip_accents,-%7B%27ascii%27%2C%20%27unicode%27%7D%20or%20callable%2C%20default%3DNone">
                strip_accents
                <span class="param-doc-description"
                style="position-anchor: --doc-link-strip_accents;">
                strip_accents: {&#x27;ascii&#x27;, &#x27;unicode&#x27;} or callable, default=None<br><br>Remove accents and perform other character normalization<br>during the preprocessing step.<br>&#x27;ascii&#x27; is a fast method that only works on characters that have<br>a direct ASCII mapping.<br>&#x27;unicode&#x27; is a slightly slower method that works on any characters.<br>None (default) means no character normalization is performed.<br><br>Both &#x27;ascii&#x27; and &#x27;unicode&#x27; use NFKD normalization from<br>:func:`unicodedata.normalize`.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('lowercase',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-lowercase;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=lowercase,-bool%2C%20default%3DTrue">
                lowercase
                <span class="param-doc-description"
                style="position-anchor: --doc-link-lowercase;">
                lowercase: bool, default=True<br><br>Convert all characters to lowercase before tokenizing.</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('preprocessor',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-preprocessor;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=preprocessor,-callable%2C%20default%3DNone">
                preprocessor
                <span class="param-doc-description"
                style="position-anchor: --doc-link-preprocessor;">
                preprocessor: callable, default=None<br><br>Override the preprocessing (string transformation) stage while<br>preserving the tokenizing and n-grams generation steps.<br>Only applies if ``analyzer`` is not callable.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('tokenizer',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-tokenizer;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=tokenizer,-callable%2C%20default%3DNone">
                tokenizer
                <span class="param-doc-description"
                style="position-anchor: --doc-link-tokenizer;">
                tokenizer: callable, default=None<br><br>Override the string tokenization step while preserving the<br>preprocessing and n-grams generation steps.<br>Only applies if ``analyzer == &#x27;word&#x27;``.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('analyzer',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-analyzer;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=analyzer,-%7B%27word%27%2C%20%27char%27%2C%20%27char_wb%27%7D%20or%20callable%2C%20default%3D%27word%27">
                analyzer
                <span class="param-doc-description"
                style="position-anchor: --doc-link-analyzer;">
                analyzer: {&#x27;word&#x27;, &#x27;char&#x27;, &#x27;char_wb&#x27;} or callable, default=&#x27;word&#x27;<br><br>Whether the feature should be made of word or character n-grams.<br>Option &#x27;char_wb&#x27; creates character n-grams only from text inside<br>word boundaries; n-grams at the edges of words are padded with space.<br><br>If a callable is passed it is used to extract the sequence of features<br>out of the raw, unprocessed input.<br><br>.. versionchanged:: 0.21<br>    Since v0.21, if ``input`` is ``&#x27;filename&#x27;`` or ``&#x27;file&#x27;``, the data<br>    is first read from the file and then passed to the given callable<br>    analyzer.</span>
            </a>
        </td>
                <td class="value">&#x27;word&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('stop_words',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-stop_words;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=stop_words,-%7B%27english%27%7D%2C%20list%2C%20default%3DNone">
                stop_words
                <span class="param-doc-description"
                style="position-anchor: --doc-link-stop_words;">
                stop_words: {&#x27;english&#x27;}, list, default=None<br><br>If a string, it is passed to _check_stop_list and the appropriate stop<br>list is returned. &#x27;english&#x27; is currently the only supported string<br>value.<br>There are several known issues with &#x27;english&#x27; and you should<br>consider an alternative (see :ref:`stop_words`).<br><br>If a list, that list is assumed to contain stop words, all of which<br>will be removed from the resulting tokens.<br>Only applies if ``analyzer == &#x27;word&#x27;``.<br><br>If None, no stop words will be used. In this case, setting `max_df`<br>to a higher value, such as in the range (0.7, 1.0), can automatically detect<br>and filter stop words based on intra corpus document frequency of terms.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('token_pattern',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-token_pattern;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=token_pattern,-str%2C%20default%3Dr%22%28%3Fu%29%5C%5Cb%5C%5Cw%5C%5Cw%2B%5C%5Cb%22">
                token_pattern
                <span class="param-doc-description"
                style="position-anchor: --doc-link-token_pattern;">
                token_pattern: str, default=r&quot;(?u)\\b\\w\\w+\\b&quot;<br><br>Regular expression denoting what constitutes a &quot;token&quot;, only used<br>if ``analyzer == &#x27;word&#x27;``. The default regexp selects tokens of 2<br>or more alphanumeric characters (punctuation is completely ignored<br>and always treated as a token separator).<br><br>If there is a capturing group in token_pattern then the<br>captured group content, not the entire match, becomes the token.<br>At most one capturing group is permitted.</span>
            </a>
        </td>
                <td class="value">&#x27;(?u)\\b\\w\\w+\\b&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('ngram_range',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-ngram_range;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=ngram_range,-tuple%20%28min_n%2C%20max_n%29%2C%20default%3D%281%2C%201%29">
                ngram_range
                <span class="param-doc-description"
                style="position-anchor: --doc-link-ngram_range;">
                ngram_range: tuple (min_n, max_n), default=(1, 1)<br><br>The lower and upper boundary of the range of n-values for different<br>n-grams to be extracted. All values of n such that min_n &lt;= n &lt;= max_n<br>will be used. For example, an ``ngram_range`` of ``(1, 1)`` means only<br>unigrams, ``(1, 2)`` means unigrams and bigrams, and ``(2, 2)`` means<br>only bigrams.<br>Only applies if ``analyzer`` is not callable.</span>
            </a>
        </td>
                <td class="value">(1, ...)</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('max_df',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-max_df;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=max_df,-float%20or%20int%2C%20default%3D1.0">
                max_df
                <span class="param-doc-description"
                style="position-anchor: --doc-link-max_df;">
                max_df: float or int, default=1.0<br><br>When building the vocabulary, ignore terms that have a document<br>frequency strictly higher than the given threshold (corpus-specific<br>stop words).<br>If a float in the range [0.0, 1.0], the parameter represents a proportion of<br>documents; if an integer, an absolute count.<br>This parameter is ignored if vocabulary is not None.</span>
            </a>
        </td>
                <td class="value">1.0</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('min_df',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-min_df;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=min_df,-float%20or%20int%2C%20default%3D1">
                min_df
                <span class="param-doc-description"
                style="position-anchor: --doc-link-min_df;">
                min_df: float or int, default=1<br><br>When building the vocabulary, ignore terms that have a document<br>frequency strictly lower than the given threshold. This value is also<br>called cut-off in the literature.<br>If a float in the range [0.0, 1.0], the parameter represents a proportion<br>of documents; if an integer, an absolute count.<br>This parameter is ignored if vocabulary is not None.</span>
            </a>
        </td>
                <td class="value">1</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('max_features',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-max_features;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=max_features,-int%2C%20default%3DNone">
                max_features
                <span class="param-doc-description"
                style="position-anchor: --doc-link-max_features;">
                max_features: int, default=None<br><br>If not None, build a vocabulary that only considers the top<br>`max_features` ordered by term frequency across the corpus.<br>Otherwise, all features are used.<br><br>This parameter is ignored if vocabulary is not None.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('vocabulary',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-vocabulary;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=vocabulary,-Mapping%20or%20iterable%2C%20default%3DNone">
                vocabulary
                <span class="param-doc-description"
                style="position-anchor: --doc-link-vocabulary;">
                vocabulary: Mapping or iterable, default=None<br><br>Either a Mapping (e.g., a dict) where keys are terms and values are<br>indices in the feature matrix, or an iterable over terms. If not<br>given, a vocabulary is determined from the input documents.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('binary',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-binary;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=binary,-bool%2C%20default%3DFalse">
                binary
                <span class="param-doc-description"
                style="position-anchor: --doc-link-binary;">
                binary: bool, default=False<br><br>If True, all non-zero term counts are set to 1. This does not mean<br>outputs will have only 0/1 values, only that the tf term in tf-idf<br>is binary. (Set `binary` to True, `use_idf` to False and<br>`norm` to None to get 0/1 outputs).</span>
            </a>
        </td>
                <td class="value">False</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('dtype',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-dtype;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=dtype,-dtype%2C%20default%3Dfloat64">
                dtype
                <span class="param-doc-description"
                style="position-anchor: --doc-link-dtype;">
                dtype: dtype, default=float64<br><br>Type of the matrix returned by fit_transform() or transform().</span>
            </a>
        </td>
                <td class="value">&lt;class &#x27;numpy.float64&#x27;&gt;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('norm',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-norm;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=norm,-%7B%27l1%27%2C%20%27l2%27%7D%20or%20None%2C%20default%3D%27l2%27">
                norm
                <span class="param-doc-description"
                style="position-anchor: --doc-link-norm;">
                norm: {&#x27;l1&#x27;, &#x27;l2&#x27;} or None, default=&#x27;l2&#x27;<br><br>Each output row will have unit norm, either:<br><br>- &#x27;l2&#x27;: Sum of squares of vector elements is 1. The cosine<br>  similarity between two vectors is their dot product when l2 norm has<br>  been applied.<br>- &#x27;l1&#x27;: Sum of absolute values of vector elements is 1.<br>  See :func:`~sklearn.preprocessing.normalize`.<br>- None: No normalization.</span>
            </a>
        </td>
                <td class="value">&#x27;l2&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('use_idf',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-use_idf;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=use_idf,-bool%2C%20default%3DTrue">
                use_idf
                <span class="param-doc-description"
                style="position-anchor: --doc-link-use_idf;">
                use_idf: bool, default=True<br><br>Enable inverse-document-frequency reweighting. If False, idf(t) = 1.</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('smooth_idf',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-smooth_idf;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=smooth_idf,-bool%2C%20default%3DTrue">
                smooth_idf
                <span class="param-doc-description"
                style="position-anchor: --doc-link-smooth_idf;">
                smooth_idf: bool, default=True<br><br>Smooth idf weights by adding one to document frequencies, as if an<br>extra document was seen containing every term in the collection<br>exactly once. Prevents zero divisions.</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('sublinear_tf',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-sublinear_tf;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=sublinear_tf,-bool%2C%20default%3DFalse">
                sublinear_tf
                <span class="param-doc-description"
                style="position-anchor: --doc-link-sublinear_tf;">
                sublinear_tf: bool, default=False<br><br>Apply sublinear tf scaling, i.e. replace tf with 1 + log(tf).</span>
            </a>
        </td>
                <td class="value">False</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
    
            <div class="estimator-table">
                <details>
                    <summary>Fitted attributes</summary>
                    <table class="parameters-table">
                        <tbody>
                            <tr>
                            <th>Name</th>
                            <th>Type</th>
                            <th>Value</th>
                            </tr>
                        
           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-fixed_vocabulary_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=fixed_vocabulary_,-bool">
                fixed_vocabulary_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-fixed_vocabulary_;">
                fixed_vocabulary_: bool<br><br>True if a fixed vocabulary of term to indices mapping<br>is provided by the user.</span>
            </a>
        </td>
               <td class="fitted-att-type">bool</td>
               <td>False</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-idf_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=idf_,-array%20of%20shape%20%28n_features%2C%29">
                idf_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-idf_;">
                idf_: array of shape (n_features,)<br><br>The inverse document frequency (IDF) vector; only defined<br>if ``use_idf`` is True.</span>
            </a>
        </td>
               <td class="fitted-att-type">ndarray[float64](34118,)</td>
               <td>[4.73,4.36,7.52,...,7.93,7.93,7.93]</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-vocabulary_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.feature_extraction.text.TfidfVectorizer.html#:~:text=vocabulary_,-dict">
                vocabulary_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-vocabulary_;">
                vocabulary_: dict<br><br>A mapping of terms to feature indices.</span>
            </a>
        </td>
               <td class="fitted-att-type">dict</td>
               <td>{&#x27;00&#x27;: 0, &#x27;000&#x27;: 1, &#x27;0000&#x27;: 2, &#x27;00000&#x27;: 3, ...}</td>


           </tr>
    
                        </tbody>
                    </table>
                </details>
            </div>
        </div></div></div><div class="sk-item"><div class="sk-estimator fitted sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually sk-global" id="sk-estimator-id-6" type="checkbox" ><label for="sk-estimator-id-6" class="sk-toggleable__label fitted sk-toggleable__label-arrow"><div><div>RandomUnderSampler</div></div></label><div class="sk-toggleable__content fitted" data-param-prefix="randomundersampler__">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('sampling_strategy',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">sampling_strategy</td>
                <td class="value">&#x27;auto&#x27;</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('random_state',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">random_state</td>
                <td class="value">None</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('replacement',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">replacement</td>
                <td class="value">False</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
    
            <div class="estimator-table">
                <details>
                    <summary>Fitted attributes</summary>
                    <table class="parameters-table">
                        <tbody>
                            <tr>
                            <th>Name</th>
                            <th>Type</th>
                            <th>Value</th>
                            </tr>
                        
           <tr class="default">
               <td class="param"><a class="param-doc-link" style="text-decoration:none;">n_features_in_</a></td>
               <td class="fitted-att-type">int</td>
               <td>34118</td>


           </tr>
    

           <tr class="default">
               <td class="param"><a class="param-doc-link" style="text-decoration:none;">sample_indices_</a></td>
               <td class="fitted-att-type">ndarray[int64](1508,)</td>
               <td>[1360,1543, 493,...,2011,2012,2014]</td>


           </tr>
    

           <tr class="default">
               <td class="param"><a class="param-doc-link" style="text-decoration:none;">sampling_strategy_</a></td>
               <td class="fitted-att-type">OrderedDict</td>
               <td>OrderedDict({...p.int64(377)})</td>


           </tr>
    
                        </tbody>
                    </table>
                </details>
            </div>
        </div></div></div><div class="sk-item"><div class="sk-estimator fitted sk-toggleable"><input class="sk-toggleable__control sk-hidden--visually sk-global" id="sk-estimator-id-7" type="checkbox" ><label for="sk-estimator-id-7" class="sk-toggleable__label fitted sk-toggleable__label-arrow"><div><div>MultinomialNB</div></div><div><a class="sk-estimator-doc-link fitted" rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html">?<span>Documentation for MultinomialNB</span></a></div></label><div class="sk-toggleable__content fitted" data-param-prefix="multinomialnb__">
            <div class="estimator-table">
                <details>
                    <summary>Parameters</summary>
                    <table class="parameters-table">
                      <tbody>
                    
            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('alpha',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-alpha;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=alpha,-float%20or%20array-like%20of%20shape%20%28n_features%2C%29%2C%20default%3D1.0">
                alpha
                <span class="param-doc-description"
                style="position-anchor: --doc-link-alpha;">
                alpha: float or array-like of shape (n_features,), default=1.0<br><br>Additive (Laplace/Lidstone) smoothing parameter<br>(set alpha=0 and force_alpha=True, for no smoothing).</span>
            </a>
        </td>
                <td class="value">1.0</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('force_alpha',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-force_alpha;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=force_alpha,-bool%2C%20default%3DTrue">
                force_alpha
                <span class="param-doc-description"
                style="position-anchor: --doc-link-force_alpha;">
                force_alpha: bool, default=True<br><br>If False and alpha is less than 1e-10, it will set alpha to<br>1e-10. If True, alpha will remain unchanged. This may cause<br>numerical errors if alpha is too close to 0.<br><br>.. versionadded:: 1.2<br>.. versionchanged:: 1.4<br>   The default value of `force_alpha` changed to `True`.</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('fit_prior',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-fit_prior;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=fit_prior,-bool%2C%20default%3DTrue">
                fit_prior
                <span class="param-doc-description"
                style="position-anchor: --doc-link-fit_prior;">
                fit_prior: bool, default=True<br><br>Whether to learn class prior probabilities or not.<br>If false, a uniform prior will be used.</span>
            </a>
        </td>
                <td class="value">True</td>
            </tr>
    

            <tr class="default">
                <td><i class="copy-paste-icon"
                     onclick="copyToClipboard('class_prior',
                              this.parentElement.nextElementSibling)"
                ></i></td>
                <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-class_prior;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=class_prior,-array-like%20of%20shape%20%28n_classes%2C%29%2C%20default%3DNone">
                class_prior
                <span class="param-doc-description"
                style="position-anchor: --doc-link-class_prior;">
                class_prior: array-like of shape (n_classes,), default=None<br><br>Prior probabilities of the classes. If specified, the priors are not<br>adjusted according to the data.</span>
            </a>
        </td>
                <td class="value">None</td>
            </tr>
    
                      </tbody>
                    </table>
                </details>
            </div>
    
            <div class="estimator-table">
                <details>
                    <summary>Fitted attributes</summary>
                    <table class="parameters-table">
                        <tbody>
                            <tr>
                            <th>Name</th>
                            <th>Type</th>
                            <th>Value</th>
                            </tr>
                        
           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-class_count_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=class_count_,-ndarray%20of%20shape%20%28n_classes%2C%29">
                class_count_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-class_count_;">
                class_count_: ndarray of shape (n_classes,)<br><br>Number of samples encountered for each class during fitting. This<br>value is weighted by the sample weight when provided.</span>
            </a>
        </td>
               <td class="fitted-att-type">ndarray[float64](4,)</td>
               <td>[377.,377.,377.,377.]</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-class_log_prior_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=class_log_prior_,-ndarray%20of%20shape%20%28n_classes%2C%29">
                class_log_prior_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-class_log_prior_;">
                class_log_prior_: ndarray of shape (n_classes,)<br><br>Smoothed empirical log probability for each class.</span>
            </a>
        </td>
               <td class="fitted-att-type">ndarray[float64](4,)</td>
               <td>[-1.39,-1.39,-1.39,-1.39]</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-classes_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=classes_,-ndarray%20of%20shape%20%28n_classes%2C%29">
                classes_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-classes_;">
                classes_: ndarray of shape (n_classes,)<br><br>Class labels known to the classifier</span>
            </a>
        </td>
               <td class="fitted-att-type">ndarray[int64](4,)</td>
               <td>[0,1,2,3]</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-feature_count_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=feature_count_,-ndarray%20of%20shape%20%28n_classes%2C%20n_features%29">
                feature_count_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-feature_count_;">
                feature_count_: ndarray of shape (n_classes, n_features)<br><br>Number of samples encountered for each (class, feature)<br>during fitting. This value is weighted by the sample weight when<br>provided.</span>
            </a>
        </td>
               <td class="fitted-att-type">ndarray[float64](4, 34118)</td>
               <td>[[0.12,0.92,0.  ,...,0.  ,0.  ,0.  ],
     [0.99,0.29,0.  ,...,0.1 ,0.  ,0.  ],
     [0.89,1.75,0.08,...,0.  ,0.  ,0.  ],
     [0.21,0.51,0.  ,...,0.  ,0.12,0.12]]</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-feature_log_prob_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=feature_log_prob_,-ndarray%20of%20shape%20%28n_classes%2C%20n_features%29">
                feature_log_prob_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-feature_log_prob_;">
                feature_log_prob_: ndarray of shape (n_classes, n_features)<br><br>Empirical log probability of features<br>given a class, ``P(x_i|y)``.</span>
            </a>
        </td>
               <td class="fitted-att-type">ndarray[float64](4, 34118)</td>
               <td>[[-10.43, -9.89,-10.54,...,-10.54,-10.54,-10.54],
     [ -9.83,-10.26,-10.52,...,-10.42,-10.52,-10.52],
     [ -9.9 , -9.53,-10.46,...,-10.54,-10.54,-10.54],
     [-10.35,-10.13,-10.54,...,-10.54,-10.43,-10.43]]</td>


           </tr>
    

           <tr class="default">
               <td class="param">
            <a class="param-doc-link"
                style="anchor-name: --doc-link-n_features_in_;"
                rel="noreferrer" target="_blank" href="https://scikit-learn.org/dev/modules/generated/sklearn.naive_bayes.MultinomialNB.html#:~:text=n_features_in_,-int">
                n_features_in_
                <span class="param-doc-description"
                style="position-anchor: --doc-link-n_features_in_;">
                n_features_in_: int<br><br>Number of features seen during :term:`fit`.<br><br>.. versionadded:: 0.24</span>
            </a>
        </td>
               <td class="fitted-att-type">int</td>
               <td>34118</td>


           </tr>
    
                        </tbody>
                    </table>
                </details>
            </div>
        </div></div></div></div></div></div></div><script>function copyToClipboard(text, element) {
        // Get the parameter prefix from the closest toggleable content
        const toggleableContent = element.closest('.sk-toggleable__content');
        const paramPrefix = toggleableContent ? toggleableContent.dataset.paramPrefix : '';
        const fullParamName = paramPrefix ? `${paramPrefix}${text}` : text;

        const originalStyle = element.style;
        const computedStyle = window.getComputedStyle(element);
        const originalWidth = computedStyle.width;
        const originalHTML = element.innerHTML.replace('Copied!', '');

        navigator.clipboard.writeText(fullParamName)
            .then(() => {
                element.style.width = originalWidth;
                element.style.color = 'green';
                element.innerHTML = "Copied!";

                setTimeout(() => {
                    element.innerHTML = originalHTML;
                    element.style = originalStyle;
                }, 2000);
            })
            .catch(err => {
                console.error('Failed to copy:', err);
                element.style.color = 'red';
                element.innerHTML = "Failed!";
                setTimeout(() => {
                    element.innerHTML = originalHTML;
                    element.style = originalStyle;
                }, 2000);
            });
        return false;
    }

    document.querySelectorAll('.copy-paste-icon').forEach(function(element) {
        const toggleableContent = element.closest('.sk-toggleable__content');
        const paramPrefix = toggleableContent ? toggleableContent.dataset.paramPrefix : '';
        const paramName = element.parentElement.nextElementSibling
            .textContent.trim().split(' ')[0];
        const fullParamName = paramPrefix ? `${paramPrefix}${paramName}` : paramName;

        element.setAttribute('title', fullParamName);
    });


    /**
     * Adapted from Skrub
     * https://github.com/skrub-data/skrub/blob/403466d1d5d4dc76a7ef569b3f8228db59a31dc3/skrub/_reporting/_data/templates/report.js#L789
     * @returns "light" or "dark"
     */
    function detectTheme(element) {
        const body = document.querySelector('body');

        // Check VSCode theme
        const themeKindAttr = body.getAttribute('data-vscode-theme-kind');
        const themeNameAttr = body.getAttribute('data-vscode-theme-name');

        if (themeKindAttr && themeNameAttr) {
            const themeKind = themeKindAttr.toLowerCase();
            const themeName = themeNameAttr.toLowerCase();

            if (themeKind.includes("dark") || themeName.includes("dark")) {
                return "dark";
            }
            if (themeKind.includes("light") || themeName.includes("light")) {
                return "light";
            }
        }

        // Check Jupyter theme
        if (body.getAttribute('data-jp-theme-light') === 'false') {
            return 'dark';
        } else if (body.getAttribute('data-jp-theme-light') === 'true') {
            return 'light';
        }

        // Guess based on a parent element's color
        const color = window.getComputedStyle(element.parentNode, null).getPropertyValue('color');
        const match = color.match(/^rgb\s*\(\s*(\d+)\s*,\s*(\d+)\s*,\s*(\d+)\s*\)\s*$/i);
        if (match) {
            const [r, g, b] = [
                parseFloat(match[1]),
                parseFloat(match[2]),
                parseFloat(match[3])
            ];

            // https://en.wikipedia.org/wiki/HSL_and_HSV#Lightness
            const luma = 0.299 * r + 0.587 * g + 0.114 * b;

            if (luma > 180) {
                // If the text is very bright we have a dark theme
                return 'dark';
            }
            if (luma < 75) {
                // If the text is very dark we have a light theme
                return 'light';
            }
            // Otherwise fall back to the next heuristic.
        }

        // Fallback to system preference
        return window.matchMedia('(prefers-color-scheme: dark)').matches ? 'dark' : 'light';
    }


    function forceTheme(elementId) {
        const estimatorElement = document.querySelector(`#${elementId}`);
        if (estimatorElement === null) {
            console.error(`Element with id ${elementId} not found.`);
        } else {
            const theme = detectTheme(estimatorElement);
            estimatorElement.classList.add(theme);
        }
    }

    forceTheme('sk-container-id-2');</script></body>
    </div>
    <br />
    <br />

.. GENERATED FROM PYTHON SOURCE LINES 101-105

Although the two sets of results are almost identical, resampling corrects the
poor recall of class \#3 at the cost of lower metrics for the other classes.
Overall, however, the results are slightly better.
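
The trade-off described above can be illustrated with a minimal, pure-Python
sketch of per-class recall. The helper ``per_class_recall`` and the toy label
lists are hypothetical and are not taken from this example's data; they only
mimic the pattern in which the minority class gains recall while another class
loses some.

.. code-block:: Python

    from collections import Counter


    def per_class_recall(y_true, y_pred):
        """Return {label: recall}, with recall = TP / (TP + FN) per class."""
        support = Counter(y_true)  # number of true samples per class
        hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # true positives
        return {label: hits[label] / support[label] for label in sorted(support)}


    # Hypothetical toy labels: class 3 plays the role of the minority class.
    y_true = [0, 0, 0, 0, 1, 1, 1, 1, 3, 3]
    # Assumed predictions of a model trained without resampling.
    y_pred_plain = [0, 0, 0, 0, 1, 1, 1, 1, 0, 3]
    # Assumed predictions of a model trained after balancing the classes.
    y_pred_balanced = [0, 0, 0, 3, 1, 1, 1, 1, 3, 3]

    recall_plain = per_class_recall(y_true, y_pred_plain)
    recall_balanced = per_class_recall(y_true, y_pred_balanced)

    # Macro-averaged recall weights every class equally, whatever its size.
    macro_plain = sum(recall_plain.values()) / len(recall_plain)
    macro_balanced = sum(recall_balanced.values()) / len(recall_balanced)

    print(recall_plain, macro_plain)
    print(recall_balanced, macro_balanced)

In this toy setting, the recall of class 3 rises from 0.5 to 1.0 while the
recall of class 0 drops from 1.0 to 0.75, and the macro-averaged recall still
improves.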

.. GENERATED FROM PYTHON SOURCE LINES 107-109

.. code-block:: Python

    report = skore.evaluate(model, X_test, y_test, splitter="prefit")
    report.metrics.summarize().frame()





.. raw:: html

    <div class="output_subarea output_html rendered_html output_result">
    <div>
    <style scoped>
        .dataframe tbody tr th:only-of-type {
            vertical-align: middle;
        }

        .dataframe tbody tr th {
            vertical-align: top;
        }

        .dataframe thead th {
            text-align: right;
        }
    </style>
    <table border="1" class="dataframe">
      <thead>
        <tr style="text-align: right;">
          <th></th>
          <th></th>
          <th>MultinomialNB</th>
        </tr>
        <tr>
          <th>Metric</th>
          <th>Label / Average</th>
          <th></th>
        </tr>
      </thead>
      <tbody>
        <tr>
          <th>Accuracy</th>
          <th></th>
          <td>0.850702</td>
        </tr>
        <tr>
          <th rowspan="4" valign="top">Precision</th>
          <th>0</th>
          <td>0.697561</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.979351</td>
        </tr>
        <tr>
          <th>2</th>
          <td>0.945946</td>
        </tr>
        <tr>
          <th>3</th>
          <td>0.782051</td>
        </tr>
        <tr>
          <th rowspan="4" valign="top">Recall</th>
          <th>0</th>
          <td>0.896552</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.853470</td>
        </tr>
        <tr>
          <th>2</th>
          <td>0.888325</td>
        </tr>
        <tr>
          <th>3</th>
          <td>0.729084</td>
        </tr>
        <tr>
          <th rowspan="4" valign="top">ROC AUC</th>
          <th>0</th>
          <td>0.962858</td>
        </tr>
        <tr>
          <th>1</th>
          <td>0.987576</td>
        </tr>
        <tr>
          <th>2</th>
          <td>0.989366</td>
        </tr>
        <tr>
          <th>3</th>
          <td>0.938616</td>
        </tr>
        <tr>
          <th>Log loss</th>
          <th></th>
          <td>0.629846</td>
        </tr>
        <tr>
          <th>Fit time (s)</th>
          <th></th>
          <td>NaN</td>
        </tr>
        <tr>
          <th>Predict time (s)</th>
          <th></th>
          <td>0.273553</td>
        </tr>
      </tbody>
    </table>
    </div>
    </div>
    <br />
    <br />


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** (0 minutes 24.071 seconds)

**Estimated memory usage:**  1033 MB


.. _sphx_glr_download_auto_examples_applications_plot_topic_classication.py:

.. only:: html

  .. container:: sphx-glr-footer sphx-glr-footer-example

    .. container:: sphx-glr-download sphx-glr-download-jupyter

      :download:`Download Jupyter notebook: plot_topic_classication.ipynb <plot_topic_classication.ipynb>`

    .. container:: sphx-glr-download sphx-glr-download-python

      :download:`Download Python source code: plot_topic_classication.py <plot_topic_classication.py>`

    .. container:: sphx-glr-download sphx-glr-download-zip

      :download:`Download zipped: plot_topic_classication.zip <plot_topic_classication.zip>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
