Package statkit

Statistics for machine learning.

Brings traditional (frequentist) statistical concepts to your scikit-learn models.

Examples

  • Hypothesis testing of model scores with p-values (see, e.g., unpaired_permutation_test()).
  • Estimate 95 % confidence intervals around test scores (see, e.g., bootstrap_score()).
  • Decision curve analysis to compare models in terms of the consequences of actions (see, e.g., NetBenefitDisplay).
  • Downsample a dataset while matching/stratifying on continuous/discrete variables to balance the groups (see, e.g., balanced_downsample()).
  • Univariate feature selection with multiple hypothesis testing correction (see, e.g., StatisticalTestFilter).
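The permutation test behind the first bullet can be sketched with the standard library alone. This is a minimal illustration under stated assumptions, not statkit's `unpaired_permutation_test`: the helper `permutation_p_value` and its parameters are hypothetical, and the score is assumed to be the mean of per-sample values (e.g., per-sample correctness, whose mean is accuracy). Under the null hypothesis the two independent test sets are exchangeable, so the samples are pooled, shuffled and re-split many times to build the null distribution of the score difference.

```python
import random
from statistics import mean
from typing import Callable, Sequence


def permutation_p_value(
    group_a: Sequence[float],
    group_b: Sequence[float],
    statistic: Callable[[Sequence[float]], float] = mean,
    n_permutations: int = 10_000,
    seed: int = 0,
) -> float:
    """Two-sided permutation p-value for statistic(a) - statistic(b).

    Hypothetical helper for illustration; not part of statkit.
    """
    rng = random.Random(seed)
    observed = abs(statistic(group_a) - statistic(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    n_extreme = 0
    for _ in range(n_permutations):
        # Under H0 the group labels are exchangeable: shuffle, then re-split.
        rng.shuffle(pooled)
        diff = abs(statistic(pooled[:n_a]) - statistic(pooled[n_a:]))
        if diff >= observed:
            n_extreme += 1
    # The +1 terms count the observed split itself, so p is never exactly 0.
    return (n_extreme + 1) / (n_permutations + 1)


# Per-sample correctness (1 = correct) on two independent test sets.
correct_a = [1] * 90 + [0] * 10  # accuracy 0.90
correct_b = [1] * 70 + [0] * 30  # accuracy 0.70
p_value = permutation_p_value(correct_a, correct_b, n_permutations=2_000)
print(f"p = {p_value:.4f}")
```

For metrics that are not per-sample averages, such as ROC AUC, one would instead permute (label, score) pairs and recompute the metric on each split.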
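For the confidence intervals in the second bullet, a percentile bootstrap is one common construction. The sketch below assumes that method and a plain accuracy metric; `bootstrap_ci` and `accuracy` are hypothetical helpers, not statkit's `bootstrap_score`, which may build its interval differently. Test examples are resampled with replacement, the metric is recomputed on each resample, and the scores near the 2.5th and 97.5th percentiles bound the 95 % interval.

```python
import random
from typing import Callable, Sequence, Tuple


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def bootstrap_ci(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    metric: Callable[[Sequence[int], Sequence[int]], float] = accuracy,
    n_iterations: int = 2_000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Point estimate and percentile-bootstrap (1 - alpha) interval.

    Hypothetical helper for illustration; not statkit's bootstrap_score.
    """
    rng = random.Random(seed)
    n = len(y_true)
    scores = []
    for _ in range(n_iterations):
        # Resample test examples (label and prediction together) with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        scores.append(metric([y_true[i] for i in idx], [y_pred[i] for i in idx]))
    scores.sort()
    # Drop roughly alpha/2 of the bootstrap scores in each tail.
    k = round(alpha / 2 * n_iterations)
    return metric(y_true, y_pred), scores[k], scores[n_iterations - 1 - k]


y_true = [1] * 50 + [0] * 50
y_pred = [1] * 40 + [0] * 10 + [0] * 40 + [1] * 10  # 80 of 100 correct
estimate, lower, upper = bootstrap_ci(y_true, y_pred)
print(f"accuracy = {estimate:.2f}, 95 % CI = [{lower:.2f}, {upper:.2f}]")
```

Resampling whole test examples keeps each label paired with its prediction, which is what makes the interval describe uncertainty due to the finite test set.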
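Decision curve analysis (third bullet) plots a model's net benefit against the threshold probability p_t at which one would act. The standard definition is NB(p_t) = TP/n - FP/n * p_t / (1 - p_t), compared against the reference strategies "treat all" and "treat none" (whose net benefit is 0). The helpers below are hypothetical: they only compute single points of such curves and do not reproduce statkit's `NetBenefitDisplay`.

```python
from typing import Sequence


def net_benefit(y_true: Sequence[int], y_prob: Sequence[float], threshold: float) -> float:
    """Net benefit of acting on every case whose predicted risk is >= threshold.

    Hypothetical helper for illustration; not statkit's API.
    """
    n = len(y_true)
    tp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 1)
    fp = sum(1 for t, p in zip(y_true, y_prob) if p >= threshold and t == 0)
    # Each false positive is weighted by the odds at the threshold, i.e.,
    # its harm relative to the benefit of one true positive.
    odds = threshold / (1 - threshold)
    return tp / n - fp / n * odds


def net_benefit_treat_all(y_true: Sequence[int], threshold: float) -> float:
    """Reference strategy: act on everyone, regardless of the model."""
    prevalence = sum(y_true) / len(y_true)
    return prevalence - (1 - prevalence) * threshold / (1 - threshold)


y_true = [1, 1, 0, 0]
y_prob = [0.9, 0.4, 0.6, 0.1]
# One point on each decision curve at p_t = 0.2.
print(net_benefit(y_true, y_prob, 0.2), net_benefit_treat_all(y_true, 0.2))  # prints 0.4375 0.375
```

Sweeping the threshold over a grid and plotting both functions (plus the zero line for "treat none") yields the usual decision curves; a model is useful at thresholds where its curve lies above both references.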

Installation

You can install statkit via pip from PyPI:

pip3 install statkit
Source code
r"""Statistics for machine learning.

Brings traditional (frequentist) statistical concepts to your scikit-learn models.
Examples:
    - Hypothesis testing of model scores with \(p\)-values (see, e.g.,
        `statkit.non_parametric.unpaired_permutation_test`).
    - Estimate 95 % confidence intervals around test scores (see, e.g.,
        `statkit.non_parametric.bootstrap_score`).
    - Decision curve analysis to compare models in terms of the consequences of
        actions (see, e.g., `statkit.decision.NetBenefitDisplay`).
    - Downsample a dataset while matching/stratifying on continuous/discrete
        variables to balance the groups (see, e.g.,
        `statkit.dataset.balanced_downsample`).
    - Univariate feature selection with multiple hypothesis testing correction
        (see, e.g., `statkit.feature_selection.StatisticalTestFilter`).

Installation:
  You can install `statkit` via pip from [PyPI](https://pypi.org/project/statkit/):
  ```bash
  pip3 install statkit
  ```
"""

__version__ = "1.0.0"

Sub-modules

statkit.dataset

Various methods for partitioning the dataset, such as downsampling and splitting.

statkit.decision

Evaluate models using decision curve analysis.

statkit.feature_selection

Select features using statistical hypothesis testing.

statkit.metrics

Classification metrics that are not part of scikit-learn.

statkit.non_parametric

Confidence intervals and p-values of a model's (test) score …

statkit.power

Estimate the sample size needed to reject the null hypothesis for a given metric.

statkit.types
statkit.views
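The `statkit.feature_selection` entry above combines univariate tests with a multiple-testing correction. This overview does not say which correction is used, so the sketch below assumes the Benjamini-Hochberg procedure, which controls the false discovery rate; `benjamini_hochberg` is a hypothetical helper, not statkit's `StatisticalTestFilter`, and the p-values are illustrative.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], fdr: float = 0.05) -> List[bool]:
    """Mask of features whose null hypothesis is rejected at the given FDR.

    Hypothetical helper for illustration; not statkit's StatisticalTestFilter.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) with p_(k) <= k / m * fdr.
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank / m * fdr:
            k_max = rank
    # Step-up rule: reject every hypothesis ranked at or below k_max.
    selected = [False] * m
    for i in order[:k_max]:
        selected[i] = True
    return selected


# One p-value per feature, e.g. from a univariate test of feature vs. label.
p_values = [0.035, 0.01, 0.5, 0.03]
print(benjamini_hochberg(p_values))  # prints [True, True, False, True]
```

The feature with p = 0.03 is kept even though it misses its own cut-off (2/4 * 0.05 = 0.025), because a larger p-value (0.035) passes at a higher rank. A Bonferroni correction (cut-off 0.05 / 4 = 0.0125) would keep only the feature with p = 0.01.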
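The `statkit.dataset` entry above mentions downsampling. As a hedged sketch of stratified balancing on a discrete variable (not statkit's `balanced_downsample`, whose algorithm is not described here), the hypothetical helper below keeps, within every stratum, the same number of rows from each group, namely the size of the smallest group in that stratum.

```python
import random
from collections import defaultdict
from typing import Hashable, List, Sequence


def stratified_balanced_indices(
    groups: Sequence[Hashable], strata: Sequence[Hashable], seed: int = 0
) -> List[int]:
    """Row indices of a subsample with equal group counts in every stratum.

    Hypothetical helper for illustration; not statkit's balanced_downsample.
    Only discrete strata are handled; group labels must be sortable.
    """
    rng = random.Random(seed)
    buckets = defaultdict(lambda: defaultdict(list))
    for i, (group, stratum) in enumerate(zip(groups, strata)):
        buckets[stratum][group].append(i)
    labels = sorted(set(groups))  # fixed order keeps the result reproducible
    keep: List[int] = []
    for by_group in buckets.values():
        # Each group keeps as many rows as the smallest group in this stratum;
        # a stratum missing any group contributes nothing.
        n_keep = min(len(by_group.get(label, [])) for label in labels)
        if n_keep == 0:
            continue
        for label in labels:
            keep.extend(rng.sample(by_group[label], n_keep))
    return sorted(keep)


groups = ["case"] * 4 + ["control"] * 8
strata = ["F", "F", "M", "M"] + ["F"] * 5 + ["M"] * 2 + ["X"]
kept = stratified_balanced_indices(groups, strata)
print([(groups[i], strata[i]) for i in kept])
```

Matching on a continuous variable would typically require binning it or using nearest-neighbour matching first; that step is not shown.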
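The `statkit.power` entry above estimates how large a dataset must be to reject a null hypothesis for a given metric. As a hedged illustration for one metric only, accuracy, the sketch below uses the textbook normal-approximation formula for a two-sided one-sample test of a proportion, n = [z_(1-alpha/2) * sqrt(p0 (1 - p0)) + z_(1-beta) * sqrt(p1 (1 - p1))]^2 / (p1 - p0)^2. The name `sample_size_for_accuracy` is hypothetical, and statkit may estimate power differently (for instance by simulation).

```python
import math
from statistics import NormalDist


def sample_size_for_accuracy(
    p0: float, p1: float, alpha: float = 0.05, power: float = 0.8
) -> int:
    """Test-set size to detect accuracy p1 against a null accuracy p0.

    Hypothetical helper; normal approximation to a two-sided one-sample
    test of a proportion.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value under H0
    z_beta = z.inv_cdf(power)  # standard-normal quantile for the desired power
    spread = z_alpha * math.sqrt(p0 * (1 - p0)) + z_beta * math.sqrt(p1 * (1 - p1))
    return math.ceil((spread / (p1 - p0)) ** 2)


# How many test samples are needed to show that 60 % accuracy beats chance (50 %)?
print(sample_size_for_accuracy(0.5, 0.6))  # prints 194
```

Because the required size scales with 1 / (p1 - p0)^2, halving the effect to be detected roughly quadruples the number of test samples.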