Metadata-Version: 2.4
Name: slicemap-cli
Version: 0.2.0
Summary: Find the data slices where a new model regressed against an old one.
Project-URL: Homepage, https://github.com/jmweb-org/slicemap
Project-URL: Repository, https://github.com/jmweb-org/slicemap
Project-URL: Issues, https://github.com/jmweb-org/slicemap/issues
Author: José del Río
License: MIT License
        
        Copyright (c) 2026 José del Río
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: cli,error-analysis,mlops,model-evaluation,regression,slices
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Utilities
Requires-Python: >=3.10
Requires-Dist: numpy>=1.24
Requires-Dist: polars>=1.0
Requires-Dist: rich>=13.0
Requires-Dist: typer>=0.12
Description-Content-Type: text/markdown

# slicemap

[![CI](https://github.com/jmweb-org/slicemap/actions/workflows/ci.yml/badge.svg)](https://github.com/jmweb-org/slicemap/actions/workflows/ci.yml)
[![PyPI](https://img.shields.io/pypi/v/slicemap-cli.svg)](https://pypi.org/project/slicemap-cli/)
[![Python](https://img.shields.io/badge/python-3.10%2B-blue.svg)](https://www.python.org)
[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](LICENSE)

Find the data slices where a new model regressed against an old one. Headless,
file in, table out.

A new model can lift the overall metric while quietly getting worse on a
segment that matters: one country, one age band, one product category. An
aggregate number hides it. `slicemap` takes a predictions file with both
models' outputs and the features, scores every slice, and lists the ones where
the new model lost ground, ranked by how many rows are affected.

```console
$ slicemap compare preds.parquet --true label --old pred_v1 --new pred_v2
slicemap: accuracy overall 0.910 -> 0.918
feature   slice        size   old     new    regression
country   BR            842   0.904   0.731   -0.173
device    tablet        311   0.880   0.795   -0.085
age       [55, 70)      540   0.901   0.860   -0.041
```

## Install

```console
$ pip install slicemap-cli                 # from PyPI, once released
$ pip install git+https://github.com/jmweb-org/slicemap   # latest, available now
```

`slicemap` reads a single CSV, Parquet, or JSON Lines file that contains the
truth column, both prediction columns, and the feature columns.

## Usage

```console
$ slicemap compare preds.parquet --true y --old pred_a --new pred_b
$ slicemap compare preds.csv --true y --old a --new b --features country,age
$ slicemap compare preds.csv --true y --old a --new b --metric error
$ slicemap compare preds.csv --true y --old a --new b --min-slice 50
$ slicemap compare preds.csv --true y --old a --new b --json
$ slicemap compare preds.csv --true y --old a --new b --check
```

If `--features` is omitted, every column except the truth and prediction columns
is treated as a feature.

### JSON output schema

`--json` writes a single object to stdout:

```json
{
  "metric": "accuracy",
  "old_overall": 0.910,
  "new_overall": 0.918,
  "regressions": [
    {
      "feature": "country",
      "slice": "BR",
      "size": 842,
      "old_score": 0.904,
      "new_score": 0.731,
      "regression": 0.173,
      "impact": 145.666
    }
  ]
}
```

| Field | Type | Description |
| --- | --- | --- |
| `metric` | string | Metric name used for scoring (e.g. `"accuracy"`) |
| `old_overall` | number | Overall metric score for the old model |
| `new_overall` | number | Overall metric score for the new model |
| `regressions` | array | Slices where the new model is worse, sorted by `impact` descending |
| `regressions[].feature` | string | Column name the slice is drawn from |
| `regressions[].slice` | string | Slice label (a category value or a quantile bin like `"[55, 70)"`) |
| `regressions[].size` | integer | Number of rows in the slice |
| `regressions[].old_score` | number | Old model's metric score on this slice |
| `regressions[].new_score` | number | New model's metric score on this slice |
| `regressions[].regression` | number | Absolute degradation (always positive) |
| `regressions[].impact` | number | `regression × size` — used for ranking |

All numeric values are rounded to six decimal places.
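
As a minimal sketch of consuming this output from Python, the snippet below parses a report in the documented shape and keeps the slices at or above an impact threshold. The `top_regressions` helper and its `min_impact` parameter are hypothetical names for illustration, not part of slicemap, and the payload is embedded as a string so the example runs on its own.

```python
import json

# A payload in the shape documented above; impact = regression * size.
REPORT = """
{
  "metric": "accuracy",
  "old_overall": 0.910,
  "new_overall": 0.918,
  "regressions": [
    {"feature": "country", "slice": "BR", "size": 842,
     "old_score": 0.904, "new_score": 0.731,
     "regression": 0.173, "impact": 145.666}
  ]
}
"""


def top_regressions(report: dict, min_impact: float = 0.0) -> list[tuple[str, str, float]]:
    """Return (feature, slice, impact) for slices at or above min_impact.

    Hypothetical helper: slicemap itself does not ship this function.
    """
    return [
        (r["feature"], r["slice"], r["impact"])
        for r in report["regressions"]
        if r["impact"] >= min_impact
    ]


report = json.loads(REPORT)
for feature, slice_label, impact in top_regressions(report, min_impact=100):
    print(f"{feature}={slice_label}: impact {impact:.1f}")
```

In a pipeline the report would more likely arrive on stdin, in which case `json.load(sys.stdin)` can replace the embedded string.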

### In CI

Fail a model update when any slice regresses:

```yaml
- run: slicemap compare preds.parquet --true y --old champion --new challenger --check
```

## How slicing works

Categorical features slice by value; numeric features slice by quantile bins.
For each slice the metric is computed for both models, and the slice is flagged
when the new model is worse. Slices smaller than `--min-slice` are skipped to
avoid noise, and findings are ranked by **impact** (regression size times the
number of rows), so the segments worth fixing first come first.
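
The steps above can be sketched in plain Python for a single categorical feature scored with accuracy. This is an illustration of the idea under stated assumptions, not slicemap's actual implementation: the function names, the row-as-dict layout, and the `min_slice` default are all hypothetical, and quantile binning of numeric features is left out.

```python
from collections import defaultdict


def accuracy(truth, pred):
    """Fraction of rows where the prediction equals the truth."""
    return sum(t == p for t, p in zip(truth, pred)) / len(truth)


def regressed_slices(rows, feature, true_col, old_col, new_col, min_slice=1):
    """Score each value of a categorical feature and return the regressed slices.

    Illustrative only: names and defaults are assumptions, not slicemap's API.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row[feature]].append(row)

    findings = []
    for value, members in groups.items():
        if len(members) < min_slice:
            continue  # slice too small to score without noise
        truth = [r[true_col] for r in members]
        old = accuracy(truth, [r[old_col] for r in members])
        new = accuracy(truth, [r[new_col] for r in members])
        if new < old:  # accuracy: higher is better
            regression = old - new
            findings.append({
                "feature": feature,
                "slice": value,
                "size": len(members),
                "old_score": old,
                "new_score": new,
                "regression": regression,
                "impact": regression * len(members),
            })
    # Largest impact first, so the segments worth fixing lead the list.
    return sorted(findings, key=lambda f: f["impact"], reverse=True)


rows = [
    {"country": "BR", "y": 1, "a": 1, "b": 0},
    {"country": "BR", "y": 0, "a": 0, "b": 0},
    {"country": "BR", "y": 1, "a": 1, "b": 0},
    {"country": "BR", "y": 1, "a": 1, "b": 1},
    {"country": "US", "y": 1, "a": 0, "b": 1},
    {"country": "US", "y": 0, "a": 0, "b": 0},
]
for finding in regressed_slices(rows, "country", "y", "a", "b"):
    print(finding["slice"], finding["size"], finding["regression"])
```

On this toy data only `BR` is reported: the old model is right on all four rows, the new one on two, so the regression is 0.5 and the impact is 2.0. `US` improved and is not listed.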

## Metrics

| Metric | Direction |
| --- | --- |
| `accuracy` | higher is better |
| `error` | lower is better |
| `mae` | lower is better |
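
The direction column decides which way a score change counts as a regression. A small sketch of that logic follows, assuming the regression is the direction-aware difference between the two scores. The `regression_amount` and `mae` functions are hypothetical and only illustrate the table; they are not taken from slicemap's source.

```python
# Direction of each metric, as listed in the table above.
HIGHER_IS_BETTER = {"accuracy": True, "error": False, "mae": False}


def regression_amount(metric: str, old_score: float, new_score: float) -> float:
    """Return how much the new model lost, or 0.0 if it did not regress.

    Assumption: regression is the direction-aware score difference.
    """
    if metric not in HIGHER_IS_BETTER:
        raise ValueError(f"unknown metric: {metric}")
    if HIGHER_IS_BETTER[metric]:
        delta = old_score - new_score  # a drop is a regression
    else:
        delta = new_score - old_score  # a rise is a regression
    return max(delta, 0.0)


def mae(truth, pred):
    """Mean absolute error over paired values; lower is better."""
    return sum(abs(t - p) for t, p in zip(truth, pred)) / len(truth)


print(regression_amount("accuracy", 0.904, 0.731))
print(regression_amount("mae", 2.0, 1.5))
```

The second call returns 0.0 because a lower `mae` is an improvement, not a regression.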

## Exit codes

| Code | Meaning |
| --- | --- |
| 0 | Compared; no slice regressed (or `--check` not set) |
| 1 | `--check` found at least one regressed slice |
| 2 | A column was missing, the metric is unknown, or the file is unsupported |
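
For callers that drive `slicemap` from a script rather than a CI step, a wrapper along these lines could translate the exit code into a message. It is a sketch only: `describe_exit` and `run_check` are hypothetical helper names, and the command line is the one shown in the CI example above.

```python
import subprocess

# Meanings taken from the exit-code table above.
EXIT_MEANINGS = {
    0: "no regressed slice (or --check not set)",
    1: "--check found at least one regressed slice",
    2: "missing column, unknown metric, or unsupported file",
}


def describe_exit(code: int) -> str:
    """Map a slicemap exit code to a human-readable meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_check(path: str, true_col: str, old_col: str, new_col: str) -> int:
    """Run `slicemap compare ... --check` and return its exit code.

    Requires the `slicemap` executable to be on PATH.
    """
    cmd = ["slicemap", "compare", path, "--true", true_col,
           "--old", old_col, "--new", new_col, "--check"]
    return subprocess.run(cmd).returncode
```

A caller would then do something like `code = run_check("preds.parquet", "y", "champion", "challenger")` and print `describe_exit(code)` before exiting with the same code.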

## License

MIT. See [LICENSE](LICENSE).
