Metadata-Version: 2.4
Name: utilsds-models
Version: 0.0.5
Summary: Solution for specific models
Author-email: DS Team <ds@sts.pl>
License: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.12
Description-Content-Type: text/markdown
Requires-Dist: pandas>=2.2.2
Requires-Dist: numpy>=1.26.0
Requires-Dist: scikit-learn>=1.5.0
Requires-Dist: matplotlib>=3.9.0
Provides-Extra: dev
Requires-Dist: pre-commit>=3.5.0; extra == "dev"

# utilsds-models

A library of classes and functions used in DS Team modeling projects.
It extends the [utilsds](https://pypi.org/project/utilsds/) package with components specific to selected models
(data processing, NGR metrics, EVIP metrics, and a custom LightGBM objective).

Requires Python >= 3.12.

## Installation

```bash
uv sync
source .venv/bin/activate
```

Or install from PyPI (once the package is published):

```bash
pip install utilsds-models
```

## Modules

### `data_processing`

Scikit-learn-compatible transformer classes, plus a helper function for combining test results.

- **`ColumnCopyImputer`**: fills missing values by copying from other columns.
- **`NullImputerWithFlags`**: imputes nulls using the `mean`, `median`, or `mode` strategy, optionally adding flags (`_isnull_flag`).
- **`MaxMultiplierImputer`**: imputes using `max * multiplier` from training data, with optional flags.
- **`SportsHybridEncoder`**: encodes sports as binary features for the top N disciplines plus aggregations for the rest.
- **`LabelEncoderTransformer`**: label encoding for categorical columns (e.g. for LightGBM).
- **`DerivedFeatureCreator`**: creates derived features (e.g. division by 7 or 30).
- **`combine_test_data`**: combines test features, target, predictions, and metadata into a single DataFrame.
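
To illustrate the "impute with flags" idea behind the transformers listed above, here is a minimal sketch. It is not the package's implementation: the class name `MedianImputerWithFlags`, its parameters, the `deposit` column, and the exact flag column naming are all assumptions made for this example.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class MedianImputerWithFlags(BaseEstimator, TransformerMixin):
    """Hypothetical stand-in, NOT the package's NullImputerWithFlags."""

    def __init__(self, columns, add_flags=True):
        self.columns = columns
        self.add_flags = add_flags

    def fit(self, X, y=None):
        # Learn fill values from the training data only.
        self.fill_values_ = {col: X[col].median() for col in self.columns}
        return self

    def transform(self, X):
        X = X.copy()
        for col in self.columns:
            if self.add_flags:
                # Flag naming (column + "_isnull_flag") is an assumption.
                X[f"{col}_isnull_flag"] = X[col].isna().astype(int)
            X[col] = X[col].fillna(self.fill_values_[col])
        return X


# Hypothetical data: the median is learned on train and reused on test.
train = pd.DataFrame({"deposit": [10.0, None, 30.0]})
test = pd.DataFrame({"deposit": [None, 50.0]})

imputer = MedianImputerWithFlags(columns=["deposit"]).fit(train)
result = imputer.transform(test)
```

Fitting on the training frame and reusing the learned value at transform time keeps test data from leaking into the fill values, which is the usual reason for wrapping imputation in a scikit-learn transformer.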

### `custom_metrics`

Evaluation metrics with time-based weights (`days_since_ftd`) and a custom LightGBM objective.

- **`EvalMetric`**: cohort-level (aggregated by `days_since_ftd`) and sample-level metrics with time-decay weighting:
  - `cohort_weighted_mae`, `cohort_weighted_mse`, `cohort_weighted_mape`
  - `sample_weighted_mae`, `sample_weighted_mse`, `sample_weighted_mape`
  - `create_lgb_metric`: factory for LightGBM metrics with time weights
- **`DaysWeightedObjective`**: custom LightGBM objective with time weights (modes: `mae`, `mse`, `mape`).
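
The sketch below shows one way a time-weighted sample metric and the gradient/Hessian pair of a weighted MSE objective could be computed. It is an illustration only: the half-life weighting scheme, the function names, and the parameter `half_life` are assumptions, and the package's actual weighting may differ.

```python
import numpy as np


def time_decay_weights(days_since_ftd, half_life=30.0):
    # Assumed scheme: a sample's weight halves every `half_life` days.
    days = np.asarray(days_since_ftd, dtype=float)
    return 0.5 ** (days / half_life)


def weighted_mae(y_true, y_pred, days_since_ftd):
    # Weighted mean of absolute errors, normalised by the sum of weights.
    w = time_decay_weights(days_since_ftd)
    err = np.abs(np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float))
    return float(np.sum(w * err) / np.sum(w))


def weighted_mse_grad_hess(y_true, y_pred, days_since_ftd):
    # Per-sample gradient and Hessian of 0.5 * w * (y_pred - y_true)**2
    # with respect to y_pred; a gradient-boosting objective supplies this pair.
    w = time_decay_weights(days_since_ftd)
    diff = np.asarray(y_pred, dtype=float) - np.asarray(y_true, dtype=float)
    return w * diff, w


# Hypothetical values: the second sample is 30 days old, so its weight is 0.5.
y_true = [100.0, 100.0]
y_pred = [110.0, 130.0]
days = [0, 30]

mae = weighted_mae(y_true, y_pred, days)
grad, hess = weighted_mse_grad_hess(y_true, y_pred, days)
```

With these inputs the older sample's error of 30 counts half as much as the fresh sample's error of 10, so the weighted MAE is (10 + 15) / 1.5.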

### `metrics`

- **`calculate_ngr_metrics`**: computes NGR error metrics (MAE, MAPE, ME, MPE) in both standard
  and business-optimal variants with weights based on `days_since_ftd`. Optionally applies late-stage
  prediction correction over the customer lifecycle.
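
For reference, the four standard error metrics named above can be sketched as follows. This is not the package's `calculate_ngr_metrics`: the function name `ngr_error_metrics`, the sign convention (prediction minus actual), and the optional `weights` argument are assumptions, and the business-optimal variants and late-stage correction are not covered.

```python
import numpy as np


def ngr_error_metrics(y_true, y_pred, weights=None):
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)

    # Assumed sign convention: positive error means over-prediction.
    err = y_pred - y_true
    # Percentage errors relative to |actual|; assumes no zero actuals.
    pct = err / np.abs(y_true)

    return {
        "MAE": float(np.average(np.abs(err), weights=weights)),
        "MAPE": float(np.average(np.abs(pct), weights=weights) * 100),
        "ME": float(np.average(err, weights=weights)),
        "MPE": float(np.average(pct, weights=weights) * 100),
    }


# Hypothetical values: one over-prediction and one under-prediction.
metrics = ngr_error_metrics([100.0, 200.0], [110.0, 180.0])
```

ME and MPE keep the sign of the error, so over- and under-predictions can cancel out (here MPE is 0 while MAPE is 10), which is why they are reported alongside MAE and MAPE rather than instead of them.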

### `visualization`

- **`calculate_ngr_metrics`**: NGR evaluation function, equivalent to `calculate_ngr_metrics` in the `metrics` module.

### `evip_dynamic`

Classification metrics with a false-positive (FP) budget constraint.

- **`recall_with_fp_cap`**: recall with a penalty for exceeding the FP budget in binary classification.
- **`weighted_premium_recall_with_fp_cap`**: weighted recall for premium classes (1 and 2) with a penalty
  for exceeding the FPR budget among class 0 samples.
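
A budget-constrained recall of this kind could look roughly like the sketch below. The penalty form is an assumption (linear in the number of false positives over budget, scaled by the number of negatives), as are the function name `recall_with_fp_penalty` and its parameters; the package's `recall_with_fp_cap` may compute the penalty differently.

```python
import numpy as np


def recall_with_fp_penalty(y_true, y_pred, fp_budget, penalty=1.0):
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)

    tp = int(np.sum((y_true == 1) & (y_pred == 1)))
    fn = int(np.sum((y_true == 1) & (y_pred == 0)))
    fp = int(np.sum((y_true == 0) & (y_pred == 1)))
    n_neg = int(np.sum(y_true == 0))

    recall = tp / (tp + fn) if (tp + fn) else 0.0

    # Assumed penalty: only false positives beyond the budget are punished,
    # expressed as a share of all negative samples.
    excess = max(0, fp - fp_budget)
    return recall - penalty * excess / max(n_neg, 1)


# Hypothetical labels: 2 true positives, 1 false negative, 2 false positives.
y_true = [1, 1, 1, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 1, 0, 0]

within_budget = recall_with_fp_penalty(y_true, y_pred, fp_budget=2)
over_budget = recall_with_fp_penalty(y_true, y_pred, fp_budget=1)
```

When the false positives fit the budget the score equals plain recall; once the budget is exceeded the score drops, so a model cannot raise recall simply by predicting the positive class more often.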

## Dependencies

- `pandas>=2.2.2`
- `numpy>=1.26.0`
- `scikit-learn>=1.5.0`
- `matplotlib>=3.9.0`

## Publishing to PyPI

```bash
uv pip install build twine
uv run python -m build
twine upload --skip-existing dist/*
```

Before publishing, bump the version in `pyproject.toml` (section `[project]`).
