Metadata-Version: 2.4
Name: kaizenstat
Version: 0.2.2
Summary: Zero-friction AutoML + Data Cleaning Toolkit
Author: Masuddar Rahman
Requires-Python: >=3.8
Description-Content-Type: text/markdown
Requires-Dist: pandas
Requires-Dist: numpy
Requires-Dist: scikit-learn
Requires-Dist: rich
Requires-Dist: joblib
Provides-Extra: ui
Requires-Dist: streamlit; extra == "ui"
Provides-Extra: gpu
Requires-Dist: xgboost; extra == "gpu"
Provides-Extra: fast
Requires-Dist: polars; extra == "fast"
Provides-Extra: all
Requires-Dist: streamlit; extra == "all"
Requires-Dist: xgboost; extra == "all"
Requires-Dist: polars; extra == "all"
Dynamic: author
Dynamic: description
Dynamic: description-content-type
Dynamic: provides-extra
Dynamic: requires-dist
Dynamic: requires-python
Dynamic: summary

# 🚀 KaizenStat

[![PyPI Version](https://img.shields.io/pypi/v/kaizenstat.svg?style=flat-square&color=blue)](https://pypi.org/project/kaizenstat/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg?style=flat-square)](https://opensource.org/licenses/MIT)
[![Python Version](https://img.shields.io/badge/python-3.8%2B-blue.svg?style=flat-square)](https://www.python.org/downloads/)
[![Code Style: Black](https://img.shields.io/badge/code%20style-black-000000.svg?style=flat-square)](https://github.com/psf/black)

**KaizenStat** is a zero-friction, production-grade engine for AutoML, automated data cleaning, and model explanation. It lets you audit datasets, repair data issues, benchmark models with hardware-aware optimization, export standalone pipeline code, and host web-based dashboards, all from a single CLI command or Python import.

---

## 🎯 Core Philosophy

* **Zero-Friction AutoML:** No complex configuration files. Pass your dataset, name your target, and KaizenStat does the rest.
* **Production Crash-Proofing:** Automatically handles messy real-world data issues: high-cardinality ID columns, datetime parsing, missing inputs, class imbalance, and label encoding.
* **Explainable AI:** Breaks open the "black box" by generating standalone, human-readable Python training code reproducing the best-found pipeline.
* **Hybrid Interface:** 100% parity between CLI and Python API.

---

## 📦 Installation

Install the core package. It depends only on pandas, NumPy, scikit-learn, rich, and joblib:

```bash
pip install kaizenstat
```

### Optional Drivers & Accelerators

Tailor KaizenStat to your specific workload:

```bash
# Quotes keep shells such as zsh from treating the brackets as a glob pattern
pip install "kaizenstat[ui]"     # Install Streamlit for web dashboards
pip install "kaizenstat[gpu]"    # Install XGBoost with GPU/MPS support
pip install "kaizenstat[fast]"   # Install Polars for ultra-fast CSV parsing
pip install "kaizenstat[all]"    # Install all optional components
```

---

## ⚔️ CLI & Python API Feature Matrix

KaizenStat is designed around a single unified vocabulary. Every CLI command has a direct, equivalent function in the Python SDK.

| Command | Python API | Purpose |
| :--- | :--- | :--- |
| `kz audit` | `KaizenStat.audit()` | 🔍 Runs a diagnostic sweep (missing values, duplicates, imbalance, dead features). |
| `kz heal` | `KaizenStat.heal()` | 🩹 Cleans, imputes, parses datetimes, drops IDs, and encodes string labels. |
| `kz benchmark` | `KaizenStat.benchmark()` | 🚀 Automatically trains, optimizes, and ranks model pipelines. |
| `kz auto` | `KaizenStat.auto()` | ⚡ Orchestrates the entire pipeline in sequence (Audit ➔ Heal ➔ Benchmark). |
| `kz explain` | `KaizenStat.explain()` | 💬 Generates plain-English diagnostic summaries and model recommendations. |
| `kz codegen` | `KaizenStat.codegen()` | 📝 Generates standalone, dependency-free Python code for the best model. |
| `kz export-model` | `KaizenStat.save_model()` | 💾 Trains the top pipeline and saves it directly to a `.joblib` binary. |
| `kz report` | `KaizenStat.report()` | 📊 Generates a beautiful, interactive HTML profiling report with Chart.js. |
| `kz serve` | `KaizenStat.serve()` | 🌐 Launches a local web dashboard to explore the data and run predictions. |

---

## 💡 Quick Start Guide

### 1. Python SDK Usage

```python
from kaizenstat import KaizenStat
import pandas as pd

# Load your dataset
df = pd.read_csv("dataset.csv")

# 1. Diagnose issues
findings = KaizenStat.audit(df, target="target_column")

# 2. Automatically heal dirty data
clean_df = KaizenStat.heal(df, target="target_column")

# 3. Benchmark models with cross-validation
leaderboard = KaizenStat.benchmark(clean_df, target="target_column")

# 4. Generate standalone code for reproduction
KaizenStat.codegen("dataset.csv", target="target_column", output_path="reproduce.py")
```

### 2. Command Line Interface (CLI)

```bash
# Get quick help and list commands
kz --help

# Run the full pipeline in one command
kz auto dataset.csv --target target_column

# Repair a dataset and save the clean file
kz heal dataset.csv --target target_column -o cleaned_dataset.csv

# Launch a local Streamlit app to preview and test model performance
kz serve dataset.csv --target target_column --port 8501
```

---

## 🧠 Behind the Scenes: Core Engines

### 1. Hardware-Aware Execution
KaizenStat automatically checks your environment using `detect_device()`. It leverages CUDA on Nvidia GPUs and MPS on Apple Silicon (M1/M2/M3 Mac) to accelerate training when optional dependencies (like `xgboost`) are installed.
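
This README does not show how `detect_device()` works internally. As a rough illustration of the idea only, the sketch below uses a hypothetical helper, `guess_device()`, which relies on two cheap standard-library checks: whether `nvidia-smi` is on the `PATH`, and whether the host reports Apple Silicon. Both heuristics are assumptions for this example, not KaizenStat's actual logic.

```python
import platform
import shutil


def guess_device() -> str:
    """Return "cuda", "mps", or "cpu" using dependency-free heuristics.

    Hypothetical helper for illustration; not part of the KaizenStat API.
    """
    # An NVIDIA driver install normally places `nvidia-smi` on the PATH.
    if shutil.which("nvidia-smi") is not None:
        return "cuda"
    # Apple Silicon Macs report the Darwin system on the arm64 architecture.
    if platform.system() == "Darwin" and platform.machine() == "arm64":
        return "mps"
    # Fall back to plain CPU execution everywhere else.
    return "cpu"


print(guess_device())
```

A real implementation would probably also confirm that the installed ML library can use the device, since the presence of a driver alone does not guarantee that.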

### 2. Smart Model Selection
The benchmarking engine adjusts its logic dynamically based on the dataset properties:
* **Large Datasets (>100k rows):** Excludes slow estimators (like Gradient Boosting) on standard CPU hosts to prevent compute lockups.
* **High-Cardinality Categoricals:** Optimizes feature preprocessors and prioritizes tree-based models (Random Forests, Gradient Boosting, XGBoost).
* **Float Targets:** Detects values with a continuous numeric profile and switches the entire pipeline to regression mode automatically.
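
The rules above can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the library's code: the function names `infer_task` and `pick_candidates` and the candidate model labels are hypothetical, and only the 100k-row threshold comes from the list above.

```python
import math
from typing import List, Sequence


def infer_task(target: Sequence[object]) -> str:
    """Treat a target with any non-integer float value as regression."""
    for value in target:
        # Skip NaN so missing labels do not force regression mode.
        if isinstance(value, float) and not math.isnan(value) and not value.is_integer():
            return "regression"
    return "classification"


def pick_candidates(n_rows: int, has_accelerator: bool) -> List[str]:
    """Return candidate model labels, skipping slow ones on large CPU-only runs."""
    candidates = ["random_forest", "gradient_boosting", "linear"]
    # Mirror the ">100k rows on a standard CPU host" rule described above.
    if n_rows > 100_000 and not has_accelerator:
        candidates.remove("gradient_boosting")
    return candidates


print(infer_task([0, 1, 1, 0]))         # classification
print(infer_task([1.5, 2.25, 3.0]))     # regression
print(pick_candidates(250_000, False))  # ['random_forest', 'linear']
```

Note that this sketch treats whole-number floats such as `1.0` as class labels. Whether KaizenStat does the same is not stated here.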

### 3. Automatic Imbalance Correction
During data healing, KaizenStat computes the class ratios of the target. If the class distribution is more skewed than `65% / 35%`, it adjusts model parameters dynamically (e.g. setting `class_weight="balanced"` in scikit-learn estimators).
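
A minimal sketch of this check, assuming the threshold applies to the majority class share. The helper name `class_weight_kwargs` is hypothetical; only the 65% cutoff and the `class_weight="balanced"` setting come from the description above.

```python
from collections import Counter
from typing import Dict, Hashable, Iterable


def class_weight_kwargs(
    labels: Iterable[Hashable], threshold: float = 0.65
) -> Dict[str, str]:
    """Return estimator kwargs that rebalance classes when the target is skewed."""
    counts = Counter(labels)
    total = sum(counts.values())
    if total == 0:
        return {}  # Nothing to measure on an empty target.
    # Share of rows that belong to the most frequent class.
    majority_share = max(counts.values()) / total
    return {"class_weight": "balanced"} if majority_share > threshold else {}


print(class_weight_kwargs([0] * 70 + [1] * 30))  # {'class_weight': 'balanced'}
print(class_weight_kwargs([0] * 60 + [1] * 40))  # {}
```

The returned dictionary can be unpacked into any scikit-learn estimator that accepts `class_weight`, for example `RandomForestClassifier(**class_weight_kwargs(y))`.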

---

## 🛠 Developer Guide

### Setting up a local workspace

To contribute or run local enhancements:

1. Clone the repository:
   ```bash
   git clone https://github.com/masuddarrahaman/KaizenStat-Library.git
   cd KaizenStat-Library
   ```
2. Install the package in editable mode with all optional drivers:
   ```bash
   pip install -e ".[all]"
   ```
3. Run tests or validation:
   ```bash
   python3 -m unittest discover -s tests
   ```

---

## 📄 License

Distributed under the MIT License. See `LICENSE` for details.
