Metadata-Version: 2.4
Name: imbeval
Version: 0.1.0
Summary: Honest, production-readiness evaluation for imbalanced classification models.
Project-URL: Homepage, https://github.com/sricodings
Project-URL: Repository, https://github.com/sricodings/imbeval
Project-URL: Issues, https://github.com/sricodings/imbeval/issues
Project-URL: Documentation, https://github.com/sricodings/imbeval#readme
Author-email: Srikanth Sridhar <srisrikanthtvs@gmail.com>
License-Expression: MIT
License-File: LICENSE
Keywords: calibration,imbalanced-classification,machine-learning,model-evaluation,threshold-tuning
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Requires-Dist: numpy>=1.21
Requires-Dist: scikit-learn>=1.0
Provides-Extra: dev
Requires-Dist: build; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: twine; extra == 'dev'
Description-Content-Type: text/markdown

# imbeval

**Honest production-readiness evaluation for imbalanced classification models.**

Standard metric libraries hand you precision/recall/F1 and stop there. On imbalanced
data (fraud, churn, medical diagnosis, anomaly detection, rare-event prediction) that's
not enough to know if a model is actually safe to ship. `imbeval` answers the real
question: **is this model usable in production, and at what threshold?**

It combines three things most teams check manually and inconsistently:

1. **Minority-class performance**: reported on its own, not buried inside macro-averages.
2. **Calibration quality**: is the model's confidence trustworthy, or is it confidently wrong?
3. **Threshold tuning**: the default 0.5 threshold is rarely optimal on imbalanced data; `imbeval` finds a better one, optionally weighted by real business cost (the cost of a false positive vs. a false negative).
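The cost-weighted search in item 3 can be sketched in plain Python. This illustrates the idea and is not `imbeval`'s implementation: `expected_cost`, `cost_threshold_sketch`, the 0.01-step threshold grid and the toy data are all hypothetical, and the sketch assumes 0/1 labels with positive-class probabilities.

```python
# Minimal sketch of cost-weighted threshold selection (hypothetical helpers,
# not imbeval's code). Assumes binary 0/1 labels and positive-class probabilities.


def expected_cost(y_true, y_proba, threshold, cost_fp, cost_fn):
    """Total cost of the errors made when predicting positive for proba >= threshold."""
    fp = sum(1 for y, p in zip(y_true, y_proba) if y == 0 and p >= threshold)
    fn = sum(1 for y, p in zip(y_true, y_proba) if y == 1 and p < threshold)
    return cost_fp * fp + cost_fn * fn


def cost_threshold_sketch(y_true, y_proba, cost_fp=1.0, cost_fn=1.0):
    """Scan thresholds 0.00, 0.01, ..., 1.00 and return (threshold, cost) with the lowest cost."""
    grid = [i / 100 for i in range(101)]
    # min() keeps the first (lowest) threshold when several tie on cost
    best = min(grid, key=lambda t: expected_cost(y_true, y_proba, t, cost_fp, cost_fn))
    return best, expected_cost(y_true, y_proba, best, cost_fp, cost_fn)


# Toy data (illustrative only): 8 negatives, 2 positives
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_proba = [0.05, 0.1, 0.1, 0.2, 0.2, 0.3, 0.35, 0.6, 0.4, 0.7]

print(expected_cost(y_true, y_proba, 0.5, cost_fp=1, cost_fn=25))  # → 26 at the default 0.5
print(cost_threshold_sketch(y_true, y_proba, cost_fp=1, cost_fn=25))  # → (0.36, 1)
```

With `cost_fn` 25 times `cost_fp`, the search lands below 0.5 because a missed positive is far more expensive than a false alarm. For reference, if the probabilities were perfectly calibrated, decision theory gives the cost-minimizing rule in closed form: predict positive when p > cost_fp / (cost_fp + cost_fn), which is 1/26 (about 0.038) for these costs. Real models are rarely that well calibrated, which is one reason calibration and threshold tuning belong in the same report.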

## Install

```bash
pip install imbeval
```

(This command works once the package is published to PyPI. If you're building it from source, see the [publishing guide](docs/publishing.md).)

## Quickstart

```python
from imbeval import evaluation_report

# y_true: ground truth labels (0/1)
# y_pred_proba: predicted probability of the positive class, from model.predict_proba(X)[:, 1]
report = evaluation_report(
    y_true,
    y_pred_proba,
    cost_fp=1,     # cost of a false alarm
    cost_fn=25,    # cost of missing a true positive (e.g. missed fraud)
)

print(report["verdict"])
print(report["minority_class"])
print(report["optimal_f1_threshold"])
print(report["cost_sensitive_threshold"])
```

Example output of the verdict line:

```
Not yet production-ready: minority-class recall is below 50% at the default 0.5 threshold;
default 0.5 threshold is far from optimal; consider using optimal_f1_threshold.
```
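The verdict above rests on two checks: minority-class recall at the default 0.5 threshold, and how far 0.5 is from the best threshold. Both can be reproduced by hand. The sketch below is not `imbeval`'s code; `minority_metrics`, `f1_threshold_sketch`, the 0.01-step grid and the toy data are hypothetical, chosen only to show the idea.

```python
# Minimal sketch (hypothetical helpers, not imbeval's implementation) of
# minority-class metrics at a threshold and an F1-maximizing threshold search.


def minority_metrics(y_true, y_proba, threshold=0.5, positive=1):
    """Precision, recall and F1 for the positive (minority) class at a given threshold."""
    pairs = list(zip(y_true, y_proba))
    tp = sum(1 for y, p in pairs if y == positive and p >= threshold)
    fp = sum(1 for y, p in pairs if y != positive and p >= threshold)
    fn = sum(1 for y, p in pairs if y == positive and p < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def f1_threshold_sketch(y_true, y_proba, positive=1):
    """Return the threshold on a 0.00..1.00 grid that maximizes minority-class F1."""
    grid = [i / 100 for i in range(101)]
    # max() keeps the first (lowest) threshold when several tie on F1
    return max(grid, key=lambda t: minority_metrics(y_true, y_proba, t, positive)["f1"])


# Toy data (illustrative only): both positives score below 0.5
y_true = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
y_proba = [0.02, 0.05, 0.08, 0.1, 0.12, 0.15, 0.2, 0.35, 0.3, 0.45]

print(minority_metrics(y_true, y_proba)["recall"])  # → 0.0 at the default 0.5
best = f1_threshold_sketch(y_true, y_proba)
print(best)  # → 0.21
print(minority_metrics(y_true, y_proba, best))
```

At 0.5 the model flags nothing, so minority recall is zero even though both positives are ranked near the top; lowering the threshold recovers them at the price of one false alarm.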

## What's inside

| Function | What it does |
|---|---|
| `evaluation_report(y_true, y_pred_proba, ...)` | One combined report + plain-English verdict |
| `minority_class_report(y_true, y_pred)` | Precision/recall/F1 focused on the minority class |
| `per_class_confidence(y_true, y_pred_proba)` | Mean model confidence per true class |
| `calibration_score(y_true, y_pred_proba)` | Expected Calibration Error (ECE) |
| `reliability_curve(y_true, y_pred_proba)` | Data for plotting a reliability diagram |
| `optimal_threshold(y_true, y_pred_proba)` | Best decision threshold by F1 |
| `cost_sensitive_threshold(y_true, y_pred_proba, cost_fp, cost_fn)` | Best threshold by real business cost |
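For intuition about what `calibration_score` and `reliability_curve` measure, here is a minimal sketch of Expected Calibration Error using equal-width bins. It assumes binary labels and positive-class probabilities; the helper names `reliability_bins` and `ece_sketch` are hypothetical, and `imbeval`'s own binning strategy and bin count may differ. The `(mean_p, rate)` pairs per bin are the points a reliability diagram plots.

```python
# Minimal sketch of Expected Calibration Error with equal-width bins
# (hypothetical helpers; imbeval's binning may differ).


def reliability_bins(y_true, y_proba, n_bins=10):
    """Return (mean predicted probability, observed positive rate, count) per non-empty bin."""
    bins = [[] for _ in range(n_bins)]
    for y, p in zip(y_true, y_proba):
        # clamp so that p == 1.0 lands in the last bin instead of index n_bins
        idx = min(int(p * n_bins), n_bins - 1)
        bins[idx].append((y, p))
    rows = []
    for members in bins:
        if members:
            mean_p = sum(p for _, p in members) / len(members)
            rate = sum(y for y, _ in members) / len(members)
            rows.append((mean_p, rate, len(members)))
    return rows


def ece_sketch(y_true, y_proba, n_bins=10):
    """Count-weighted average gap between predicted probability and observed positive rate."""
    n = len(y_true)
    return sum(count / n * abs(mean_p - rate)
               for mean_p, rate, count in reliability_bins(y_true, y_proba, n_bins))


# Over-confident: always predicts 0.9, but only half the cases are positive
print(ece_sketch([1] * 5 + [0] * 5, [0.9] * 10))  # roughly 0.4
# Well calibrated: the 0.25 bucket is 25% positive, the 0.75 bucket is 75% positive
print(ece_sketch([1, 0, 0, 0, 1, 1, 1, 0], [0.25] * 4 + [0.75] * 4))  # → 0.0
```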

- Full API reference: [docs/api.md](docs/api.md)
- Usage guide and recipes: [docs/usage.md](docs/usage.md)
- Publishing this package yourself: [docs/publishing.md](docs/publishing.md)

## Why this exists

Most "imbalanced learning" tools (e.g. `imbalanced-learn`) focus on *fixing* the data
(SMOTE and friends). `imbeval` focuses on the other end of the pipeline: telling you
honestly whether the *model* you already trained is good enough, and at what threshold,
once class imbalance is in play. It's meant to sit right before you ship.

## Status

Early (v0.1.0). The core API (`evaluation_report`, threshold tools, calibration tools)
is stable for binary classification. Multi-class support is on the roadmap — see
[CHANGELOG.md](CHANGELOG.md).

## Contributing

Issues and PRs are welcome once the repo is public. If you want to extend `imbeval`, see
[docs/usage.md](docs/usage.md) for how the modules fit together.

## License

MIT — see [LICENSE](LICENSE).
