Metadata-Version: 2.4
Name: datadriftguard
Version: 0.1.0
Summary: Data drift detection toolkit for ML pipelines — PSI, KS, KL divergence, chi-square, and more.
Project-URL: Homepage, https://github.com/suryanandanbabbar/DriftGuardAI
Project-URL: Documentation, https://github.com/suryanandanbabbar/DriftGuardAI#readme
Project-URL: Repository, https://github.com/suryanandanbabbar/DriftGuardAI
Project-URL: Issues, https://github.com/suryanandanbabbar/DriftGuardAI/issues
Author: Surya Babbar
License-Expression: MIT
License-File: LICENSE
Keywords: data-quality,drift,machine-learning,mlops,model-monitoring,monitoring
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.10
Requires-Dist: numpy>=1.23
Requires-Dist: pandas>=1.5
Requires-Dist: pydantic>=2.0
Requires-Dist: python-dotenv>=1.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: scipy>=1.9
Provides-Extra: api
Requires-Dist: fastapi>=0.100; extra == 'api'
Requires-Dist: python-multipart>=0.0.6; extra == 'api'
Requires-Dist: uvicorn>=0.20; extra == 'api'
Provides-Extra: dashboard
Requires-Dist: streamlit>=1.25; extra == 'dashboard'
Provides-Extra: dev
Requires-Dist: build>=1.0; extra == 'dev'
Requires-Dist: mypy>=1.0; extra == 'dev'
Requires-Dist: pytest>=7.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: twine>=5.0; extra == 'dev'
Description-Content-Type: text/markdown

# DriftGuardAI

[![PyPI version](https://img.shields.io/pypi/v/datadriftguard.svg)](https://pypi.org/project/datadriftguard/)
[![Python versions](https://img.shields.io/pypi/pyversions/datadriftguard.svg)](https://pypi.org/project/datadriftguard/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**Data drift detection toolkit for ML pipelines.** Monitor feature-level distribution shifts using PSI, KS test, KL divergence, chi-square test, and more.

DriftGuardAI helps ML engineers and data scientists detect when production data drifts away from training data — a leading indicator of model degradation.

## Installation

```bash
pip install datadriftguard
```

Optional extras:

```bash
# Quote the extras so shells such as zsh don't treat the brackets as glob patterns

# Include the FastAPI server
pip install "datadriftguard[api]"

# Include the Streamlit dashboard
pip install "datadriftguard[dashboard]"

# Install everything for development
pip install "datadriftguard[dev,api,dashboard]"
```

## Quick Start

```python
import pandas as pd
from driftguardai import DriftDetector, ThresholdSettings

# Load your baseline (training) and incoming (production) data
baseline = pd.read_csv("baseline.csv")
incoming = pd.read_csv("incoming.csv")

# Create a detector with default or custom thresholds
detector = DriftDetector(
    baseline_dataset=baseline,
    incoming_dataset=incoming,
    thresholds=ThresholdSettings(psi=0.20, ks_significance_level=0.05),
)

# Generate a drift report
report = detector.generate_report(dataset_name="production_model_v2")

# Inspect results
print(f"Total features: {report.total_features}")
print(f"Drifted features: {len(report.drifted_features)}")

for feature in report.drifted_features:
    print(f"  ⚠ {feature.feature_name} ({feature.feature_type}): drift detected")
    if feature.metrics.psi is not None:
        print(f"    PSI: {feature.metrics.psi.value:.4f} (threshold: {feature.metrics.psi.threshold})")
    if feature.metrics.ks is not None:
        print(f"    KS:  {feature.metrics.ks.value:.4f} (p={feature.metrics.ks.p_value:.4f})")
```

## Drift Metrics

| Metric | Type | Description |
|--------|------|-------------|
| **PSI** (Population Stability Index) | Numerical | Binned measure of how far the incoming distribution has shifted from the baseline; larger values mean more drift |
| **KS Test** (Kolmogorov-Smirnov) | Numerical | Non-parametric test for distribution equality |
| **KL Divergence** (Kullback-Leibler) | Numerical | Information-theoretic measure of distribution difference |
| **Chi-Square Test** | Categorical | Tests whether incoming category frequencies differ significantly from the baseline frequencies |
| **Distribution Difference** | Categorical | Total variation distance between category frequencies |
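
To make the table concrete, here is a minimal NumPy sketch of how these quantities are commonly computed. It is not DriftGuardAI's implementation: the quantile binning mirrors the `histogram_bins: 10` / `histogram_strategy: quantile` settings shown under Configuration, but the `EPS` floor for empty bins, the KL direction, and all function names are assumptions. P-values for the KS and chi-square tests would normally come from `scipy.stats.ks_2samp` and `scipy.stats.chi2_contingency`, which this sketch leaves out.

```python
from collections import Counter

import numpy as np

EPS = 1e-6  # assumed floor for empty bins so log() and division stay finite


def binned_shares(baseline, incoming, bins=10):
    """Bin both samples on baseline quantile edges and return per-bin shares."""
    baseline = np.asarray(baseline, dtype=float)
    incoming = np.asarray(incoming, dtype=float)
    # Interior cut points at baseline quantiles; np.unique drops duplicate edges.
    cuts = np.unique(np.quantile(baseline, np.linspace(0.0, 1.0, bins + 1)[1:-1]))
    n_bins = len(cuts) + 1  # the two outer bins are open-ended
    expected = np.bincount(np.searchsorted(cuts, baseline, side="right"), minlength=n_bins)
    actual = np.bincount(np.searchsorted(cuts, incoming, side="right"), minlength=n_bins)
    expected = np.clip(expected / expected.sum(), EPS, None)
    actual = np.clip(actual / actual.sum(), EPS, None)
    return expected, actual


def psi(baseline, incoming, bins=10):
    """Population Stability Index: sum of (actual - expected) * ln(actual / expected)."""
    expected, actual = binned_shares(baseline, incoming, bins)
    return float(np.sum((actual - expected) * np.log(actual / expected)))


def kl_divergence(baseline, incoming, bins=10):
    """KL(incoming || baseline) over the same bins (the direction is an assumption)."""
    expected, actual = binned_shares(baseline, incoming, bins)
    return float(np.sum(actual * np.log(actual / expected)))


def ks_statistic(baseline, incoming):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs."""
    b, i = np.sort(baseline), np.sort(incoming)
    grid = np.concatenate([b, i])
    cdf_b = np.searchsorted(b, grid, side="right") / len(b)
    cdf_i = np.searchsorted(i, grid, side="right") / len(i)
    return float(np.max(np.abs(cdf_b - cdf_i)))


def total_variation_distance(baseline_labels, incoming_labels):
    """Half the L1 distance between the two category-frequency vectors."""
    b, i = Counter(baseline_labels), Counter(incoming_labels)
    n_b, n_i = sum(b.values()), sum(i.values())
    categories = set(b) | set(i)  # categories unseen on one side count as 0
    return 0.5 * sum(abs(b[c] / n_b - i[c] / n_i) for c in categories)
```

With these definitions, `psi(x, x)` is exactly `0.0`, while shifting a sample by half its range pushes PSI far above the 0.20 threshold used in the Quick Start.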

## Alerting

DriftGuardAI includes an alert system that can dispatch drift notifications via logging, webhooks, or Slack:

```python
from driftguardai import AlertManager, DriftDetector
from driftguardai.core.config import AlertSettings

# `baseline` and `incoming` are the DataFrames loaded in the Quick Start
detector = DriftDetector(baseline, incoming)
report = detector.generate_report(dataset_name="production")

alert_manager = AlertManager(
    settings=AlertSettings(
        enabled=True,
        log_alerts=True,
        slack_webhook_url="https://hooks.slack.com/services/...",
    )
)
dispatch_report = alert_manager.dispatch(report)
print(f"Dispatched {dispatch_report.total_alerts} alerts")
```
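
Under the hood, a Slack incoming webhook is an HTTP POST of a JSON body with a `text` field. The standard-library sketch below builds such a request from a list of drifted feature names. It illustrates the webhook protocol only, not `AlertManager`'s actual message format, and `build_slack_request` and `send` are hypothetical helpers.

```python
import json
import urllib.request


def build_slack_request(webhook_url, dataset_name, drifted_features):
    """Build (but do not send) a POST request for a Slack incoming webhook."""
    lines = [f"Drift detected in '{dataset_name}' ({len(drifted_features)} feature(s)):"]
    lines += [f"- {name}" for name in drifted_features]
    payload = {"text": "\n".join(lines)}  # incoming webhooks read the "text" field
    return urllib.request.Request(
        webhook_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=10.0):
    """Send the request and return the HTTP status code and response body."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status, response.read().decode("utf-8")
```

In the alerting example, the names could come from `report.drifted_features`, each of which exposes a `feature_name`.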

## Retraining Triggers

Automatically evaluate whether model retraining should be triggered based on drift severity:

```python
from driftguardai import RetrainingManager
from driftguardai.core.config import RetrainingSettings

manager = RetrainingManager(
    settings=RetrainingSettings(
        enabled=True,
        trigger_severity="critical",
        min_alert_count=2,
    )
)
# `report` is the drift report generated in the alerting example above
result = manager.evaluate(report)
if result.triggered:
    print(f"Retraining triggered: {result.reason}")
    print(f"Affected features: {result.affected_features}")
```
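
The exact trigger rule is not spelled out here, but the two settings suggest a simple threshold check. A hedged sketch, assuming retraining fires once at least `min_alert_count` alerts reach `trigger_severity` or higher; the severity ranking, the `(feature_name, severity)` alert shape, and all names are hypothetical:

```python
from dataclasses import dataclass, field

# Assumed ordering; "warning" and "critical" are the levels used in the YAML config.
SEVERITY_RANK = {"warning": 1, "critical": 2}


@dataclass
class RetrainingDecision:
    triggered: bool
    reason: str
    affected_features: list[str] = field(default_factory=list)


def evaluate_retraining(alerts, trigger_severity="critical", min_alert_count=1):
    """Trigger when at least `min_alert_count` alerts reach `trigger_severity`.

    `alerts` is an iterable of (feature_name, severity) pairs (hypothetical shape).
    """
    floor = SEVERITY_RANK[trigger_severity]
    hits = [name for name, severity in alerts if SEVERITY_RANK[severity] >= floor]
    if len(hits) >= min_alert_count:
        reason = f"{len(hits)} alert(s) at or above '{trigger_severity}'"
        return RetrainingDecision(True, reason, sorted(set(hits)))
    reason = f"only {len(hits)} of {min_alert_count} required '{trigger_severity}' alert(s)"
    return RetrainingDecision(False, reason)
```

Counting alerts rather than features means one feature tripping several metrics could satisfy `min_alert_count` on its own; whether DriftGuardAI counts that way is not stated.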

## Configuration

DriftGuardAI can be configured programmatically or via a `config.yaml` file:

```yaml
thresholds:
  psi: 0.20
  ks_significance_level: 0.05
  kl_divergence: 0.10
  categorical_distance: 0.10
  categorical_chi_square_significance_level: 0.05
  histogram_bins: 10
  histogram_strategy: quantile

alerts:
  enabled: true
  log_alerts: true
  minimum_severity: warning
  slack_webhook_url: https://hooks.slack.com/services/...

retraining:
  enabled: true
  trigger_severity: critical
  min_alert_count: 1
```

Place `config.yaml` in your working directory, or set the `DRIFT_GUARD_CONFIG_PATH` environment variable:

```bash
export DRIFT_GUARD_CONFIG_PATH=/path/to/your/config.yaml
```
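
The lookup described above can be sketched as follows. The precedence (environment variable first, then `./config.yaml`, else built-in defaults) is an assumption, as are the helper names; the YAML keys appear to mirror the `ThresholdSettings` field names (`psi` and `ks_significance_level` match the Quick Start), but this sketch only returns a plain dict.

```python
import os
from pathlib import Path

ENV_VAR = "DRIFT_GUARD_CONFIG_PATH"


def resolve_config_path(cwd=None):
    """Return the config file to load, or None to fall back to defaults.

    Assumed precedence: the environment variable wins over ./config.yaml.
    """
    env_value = os.environ.get(ENV_VAR)
    if env_value:
        return Path(env_value).expanduser()
    candidate = Path(cwd or Path.cwd()) / "config.yaml"
    return candidate if candidate.is_file() else None


def load_config(path):
    """Parse the YAML file into a plain dict (pyyaml is a DriftGuardAI dependency)."""
    import yaml  # imported here so path resolution alone needs only the stdlib

    with open(path, encoding="utf-8") as handle:
        return yaml.safe_load(handle) or {}
```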

## Optional: API Server

Run a FastAPI server for drift detection over HTTP:

```bash
pip install "datadriftguard[api]"
uvicorn driftguardai.api.app:app --reload
```

Endpoints:
- `GET /api/v1/health` — Health check
- `POST /api/v1/drift/analyze` — Analyze drift from file paths
- `POST /api/v1/drift/analyze/files` — Analyze drift from uploaded CSVs
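
Once the server is running, the health endpoint can be checked from Python with only the standard library. A minimal sketch, assuming uvicorn's defaults (`127.0.0.1:8000`) and a JSON response body whose fields are not documented here; `health_request` and `check_health` are hypothetical helpers:

```python
import json
import urllib.request

BASE_URL = "http://127.0.0.1:8000"  # uvicorn's default host and port


def health_request(base_url=BASE_URL):
    """Build a GET request for the health endpoint."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/api/v1/health", method="GET")


def check_health(base_url=BASE_URL, timeout=5.0):
    """Call the endpoint and return (status code, decoded JSON body)."""
    with urllib.request.urlopen(health_request(base_url), timeout=timeout) as response:
        return response.status, json.loads(response.read().decode("utf-8"))
```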

## Optional: Streamlit Dashboard

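Visualize drift metrics with an interactive dashboard. The script path below is relative to the repository root, so run it from a clone of the repository (see Development):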

```bash
pip install "datadriftguard[dashboard]"
streamlit run src/driftguardai/dashboard/app.py
```

## Architecture

```text
driftguardai/
├── core/          # Domain models, config, exceptions, interfaces, use cases
├── drift/         # Drift detection implementations and statistical metrics
├── data/          # Data ingestion and repository adapters
├── utils/         # Logging and dataset validation utilities
├── api/           # Optional FastAPI HTTP layer
└── dashboard/     # Optional Streamlit visualization
```
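
The layout suggests a ports-and-adapters split: `core/` holds interfaces and use cases, `drift/` holds concrete implementations. A hypothetical sketch of how such a split typically works; these are not DriftGuardAI's actual classes, `DriftMetric`, `MeanShift`, and `detect` are illustrative names, and `MeanShift` is a toy metric the package does not offer:

```python
from collections.abc import Sequence
from dataclasses import dataclass
from typing import Protocol


# core/: an interface that the use case depends on.
class DriftMetric(Protocol):
    name: str

    def compute(self, baseline: Sequence[float], incoming: Sequence[float]) -> float: ...


# drift/: a concrete metric that satisfies the interface structurally.
@dataclass
class MeanShift:
    name: str = "mean_shift"

    def compute(self, baseline: Sequence[float], incoming: Sequence[float]) -> float:
        return abs(sum(incoming) / len(incoming) - sum(baseline) / len(baseline))


# core/ use case: it only knows the interface, so new metrics plug in unchanged.
def detect(
    columns: dict[str, tuple[Sequence[float], Sequence[float]]],
    metrics: Sequence[DriftMetric],
    threshold: float,
) -> dict[str, list[str]]:
    """Map each feature name to the metrics whose score exceeds `threshold`."""
    flagged: dict[str, list[str]] = {}
    for feature, (baseline, incoming) in columns.items():
        hits = [m.name for m in metrics if m.compute(baseline, incoming) > threshold]
        if hits:
            flagged[feature] = hits
    return flagged
```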

## Development

```bash
git clone https://github.com/suryanandanbabbar/DriftGuardAI.git
cd DriftGuardAI
python -m venv .venv
source .venv/bin/activate
pip install -e ".[dev,api,dashboard]"
pytest
```

## License

MIT — see [LICENSE](LICENSE) for details.
