Metadata-Version: 2.4
Name: kaizenstat
Version: 0.1.0
Summary: Zero-friction AutoML + Data Cleaning Toolkit
Author: Masuddar Rahman
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: scikit-learn
Requires-Dist: rich
Dynamic: author
Dynamic: description
Dynamic: description-content-type
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# 🚀 KaizenStat

[![PyPI Version](https://img.shields.io/pypi/v/kaizenstat.svg)](https://pypi.org/project/kaizenstat/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg)](https://www.python.org/downloads/)

**KaizenStat** is a zero-friction toolkit for data validation, automatic cleaning, and AutoML benchmarking, designed to fit into a day-to-day data science workflow. It diagnoses and repairs common dataset issues, then trains baseline models so you get a first read on achievable performance.

---

## ✨ Features

- 🔍 **`kz.audit()`**: Instantly sweep datasets for duplicates, NaNs, infs, constant columns, and target label integrity.
- 🩹 **`kz.heal()`**: Automatically clean datasets by repairing missing targets, removing duplicates, dropping dead/constant columns, and imputing missing data using mean, median, or mode.
- 🚀 **`kz.benchmark()`**: Auto-detects the objective (classification or regression), builds preprocessing pipelines, trains a set of baseline models (Linear/Ridge, RandomForest, Neural Networks), and ranks them on a leaderboard.
- 💻 **CLI**: A command-line utility (`kz`) to audit, heal, or benchmark CSV datasets directly from the terminal.
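
For intuition, the kinds of checks an audit covers can be sketched in a few lines of pandas. This is an illustration only, not KaizenStat's implementation: the function name `audit_sketch` and the report keys are hypothetical.

```python
import numpy as np
import pandas as pd


def audit_sketch(df: pd.DataFrame, target_column: str) -> dict:
    """Count common data-quality problems (illustrative, not KaizenStat's code)."""
    numeric = df.select_dtypes(include="number")
    return {
        # Rows that exactly repeat an earlier row
        "duplicate_rows": int(df.duplicated().sum()),
        # Missing cells across the whole frame
        "nan_cells": int(df.isna().sum().sum()),
        # +inf / -inf can only occur in numeric columns
        "inf_cells": int(np.isinf(numeric.to_numpy(dtype=float)).sum()),
        # Columns holding a single distinct value carry no signal
        "constant_columns": [c for c in df.columns if df[c].nunique(dropna=False) <= 1],
        # Rows whose label is missing
        "missing_targets": int(df[target_column].isna().sum()),
    }


df = pd.DataFrame({
    "x": [1.0, 1.0, np.inf, np.nan],
    "flag": [0, 0, 0, 0],
    "target": [1.0, 1.0, 0.0, np.nan],
})
report = audit_sketch(df, "target")
print(report)
```

On this toy frame the report flags one duplicate row, two NaN cells, one infinite cell, the constant column `flag`, and one missing target.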

---

## 📦 Installation

Install KaizenStat from PyPI:

```bash
pip install kaizenstat
```

Or install it locally in editable mode for development:

```bash
pip install -e .
```

---

## 🚀 Quickstart Usage

### Python API

```python
import pandas as pd
from kaizenstat import KaizenStat

# Load dataset
df = pd.read_csv("data.csv")

# 1. Audit dataset
KaizenStat.audit(df, target_column="target")

# 2. Automatically repair dataset issues
clean_df = KaizenStat.heal(df, target_column="target", method="fill_median")

# 3. Benchmark ML models
leaderboard = KaizenStat.benchmark(clean_df, target_column="target")
```
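
To show roughly what a median-based repair step involves, here is a plain-pandas sketch. It is an approximation under stated assumptions, not the library's code: `heal_sketch` is a hypothetical name, and dropping rows with a missing target is one possible reading of "repairing missing targets".

```python
import numpy as np
import pandas as pd


def heal_sketch(df: pd.DataFrame, target_column: str) -> pd.DataFrame:
    """Approximate a median-based cleanup with pandas (illustrative only)."""
    # Treat infinities as missing values so they get imputed later
    out = df.replace([np.inf, -np.inf], np.nan)
    # Assumption: rows without a label are dropped rather than imputed
    out = out.dropna(subset=[target_column])
    out = out.drop_duplicates()
    # Drop feature columns that hold a single distinct value
    constant = [
        c for c in out.columns
        if c != target_column and out[c].nunique(dropna=False) <= 1
    ]
    out = out.drop(columns=constant)
    # Fill remaining gaps in numeric features with each column's median
    numeric = out.select_dtypes(include="number").columns.drop(
        target_column, errors="ignore"
    )
    out = out.fillna(out[numeric].median())
    return out.reset_index(drop=True)


raw = pd.DataFrame({
    "x": [1.0, 1.0, 3.0, np.nan, 5.0],
    "flag": [0, 0, 0, 0, 0],
    "target": [1.0, 1.0, 0.0, 1.0, np.nan],
})
clean = heal_sketch(raw, "target")
print(clean)
```

The order matters in this sketch: the median is computed after duplicates and unlabeled rows are removed, so those rows do not skew the imputed value.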

### 💻 Command Line Interface (CLI)

KaizenStat also ships with a command-line tool named `kz`:

#### Audit a dataset:
```bash
kz audit data.csv --target price
```

#### Heal a dataset:
```bash
kz heal data.csv --target price --method fill_median -o clean_data.csv
```

#### Benchmark a dataset:
```bash
kz benchmark clean_data.csv --target price
```
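
A benchmark of this kind can be approximated with scikit-learn directly. The sketch below is not KaizenStat's implementation: `detect_task` and `benchmark_sketch` are hypothetical names, the task-detection heuristic and the metrics (accuracy, R²) are assumptions, and neural-network models are left out to keep it short.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor
from sklearn.linear_model import LogisticRegression, Ridge
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler


def detect_task(y: pd.Series, max_classes: int = 20) -> str:
    """Assumed heuristic: non-numeric or low-cardinality integer targets are classes."""
    if not pd.api.types.is_numeric_dtype(y):
        return "classification"
    if pd.api.types.is_integer_dtype(y) and y.nunique() <= max_classes:
        return "classification"
    return "regression"


def benchmark_sketch(df: pd.DataFrame, target_column: str, cv: int = 3):
    """Cross-validate a few baseline models and rank them by mean score."""
    # Numeric features only; a fuller pipeline would also encode categoricals
    X = df.drop(columns=[target_column]).select_dtypes(include="number")
    y = df[target_column]
    task = detect_task(y)
    if task == "classification":
        models = {
            "LogisticRegression": make_pipeline(
                StandardScaler(), LogisticRegression(max_iter=1000)
            ),
            "RandomForest": RandomForestClassifier(n_estimators=50, random_state=0),
        }
        scoring = "accuracy"
    else:
        models = {
            "Ridge": make_pipeline(StandardScaler(), Ridge()),
            "RandomForest": RandomForestRegressor(n_estimators=50, random_state=0),
        }
        scoring = "r2"
    rows = [
        {"model": name,
         "score": cross_val_score(model, X, y, cv=cv, scoring=scoring).mean()}
        for name, model in models.items()
    ]
    leaderboard = (
        pd.DataFrame(rows)
        .sort_values("score", ascending=False)
        .reset_index(drop=True)
    )
    return task, leaderboard


# Synthetic data with a continuous, nearly linear target
rng = np.random.default_rng(0)
demo = pd.DataFrame({"a": rng.normal(size=120), "b": rng.normal(size=120)})
demo["price"] = 3.0 * demo["a"] - 2.0 * demo["b"] + rng.normal(scale=0.1, size=120)
task, leaderboard = benchmark_sketch(demo, "price")
print(task)
print(leaderboard)
```

Cross-validated scores are used here instead of a single train/test split so that the ranking is less sensitive to how a small dataset happens to be partitioned.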

---

## 🛠 Development and Packaging

Install the packaging tools, then build the source distribution and wheel with `build`:

```bash
pip install build twine
python -m build
```

Upload to PyPI:

```bash
twine upload dist/*
```

---

## 📄 License

Distributed under the MIT License. See `LICENSE` for more information.
