Metadata-Version: 2.4
Name: driftpipe
Version: 0.1.0
Summary: ML pipeline framework with data and model drift monitoring.
License: MIT
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: numpy>=1.24
Requires-Dist: matplotlib>=3.8

# DriftPipe

An ML pipeline framework with data and model drift monitoring.

## Installation

```bash
pip install driftpipe
# or, from a local source checkout:
pip install .
```

## Quick Start

```python
import numpy as np
from driftpipe import (
    IngestStage,
    EvaluateStage,
    PreprocessStage,
    TrainStage,
    BaselineStorage,
    Pipeline,
    Monitor,
    MonitorReport,
)


class Ingest(IngestStage):
    def run(self):
        raw_data = np.random.randn(500, 4)
        labels = (raw_data[:, 0] + raw_data[:, 1] > 0).astype(int)
        return {
            "raw_data": raw_data,
            "labels": labels,
            "feature_names": ["f0", "f1", "f2", "f3"],
        }


class Preprocess(PreprocessStage):
    def run(self, raw_data):
        return {"processed_data": raw_data}


class Train(TrainStage):
    def run(self, processed_data, labels):
        threshold = float(np.mean(processed_data[:, 0]))
        return {
            "model": {"threshold": threshold},
            "labels": labels,
        }


class Evaluate(EvaluateStage):
    def run(self, model, raw_data, labels, feature_names, baseline_storage):
        predictions = (raw_data[:, 0] > model["threshold"]).astype(int)
        accuracy = float(np.mean(predictions == labels))
        metrics = {"accuracy": accuracy}

        baseline_storage.compute_and_save_features(raw_data, feature_names)
        baseline_storage.save_metrics(
            baseline_storage.metrics_baseline(metrics, n_samples=len(labels))
        )

        return {"metrics": metrics}


pipeline = Pipeline("weather_demo")

pipeline.baseline_storage = BaselineStorage("weather_demo")

pipeline.add_stage(Ingest)
pipeline.add_stage(Preprocess)
pipeline.add_stage(Train)
pipeline.add_stage(Evaluate)

# Run once to establish the baseline
result = pipeline.run()
assert result.success
pipeline.baseline_metrics = pipeline.baseline_storage.load_metrics()

# Push a new batch through the monitor
monitor = Monitor(pipeline)
new_raw_data = np.random.randn(500, 4) + 0.75
new_labels = (new_raw_data[:, 0] + new_raw_data[:, 1] > 0).astype(int)
monitor_context = {
    "raw_data": new_raw_data,
    "labels": new_labels,
    "feature_names": ["f0", "f1", "f2", "f3"],
}

monitor_result = monitor.run(monitor_context)
assert monitor_result.success

# Generate a drift report from the same monitoring batch
MonitorReport(monitor).generate(
    output_path="weather_drift_report.html",
    monitor_result=monitor_result,
)

# Save pipeline config
pipeline.to_config(
    path="pipeline_demo.json",
    metadata={"dataset": "demo"},
)
```

## Monitoring and Reports

`Monitor` compares current metrics against `pipeline.baseline_metrics` and, when `raw_data`, `feature_names`, and baseline feature data are available, automatically:

- stores `distributional_data` in the pipeline run context
- stores `distributional_metrics` in the pipeline run context
- runs Kolmogorov-Smirnov (KS) and Population Stability Index (PSI) checks for each feature
- generates histograms comparing baseline and current distributions for each feature
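
For readers unfamiliar with the per-feature checks listed above, here is a minimal NumPy-only sketch of a two-sample KS statistic and a PSI score. It is an illustration of the general techniques, not DriftPipe's internal implementation: the function names `ks_statistic` and `psi`, the quantile-based binning, and the `n_bins` and `eps` parameters are all assumptions made for this example.

```python
import numpy as np


def ks_statistic(baseline, current):
    """Two-sample KS statistic: the largest gap between the two empirical CDFs.

    Hypothetical helper for illustration; not part of the driftpipe API.
    """
    baseline = np.sort(np.asarray(baseline, dtype=float))
    current = np.sort(np.asarray(current, dtype=float))
    # Evaluate both empirical CDFs at every observed value.
    points = np.concatenate([baseline, current])
    cdf_baseline = np.searchsorted(baseline, points, side="right") / baseline.size
    cdf_current = np.searchsorted(current, points, side="right") / current.size
    return float(np.max(np.abs(cdf_baseline - cdf_current)))


def psi(baseline, current, n_bins=10, eps=1e-6):
    """Population Stability Index using bins taken from baseline quantiles.

    Hypothetical helper for illustration; the binning scheme is an assumption.
    """
    baseline = np.asarray(baseline, dtype=float)
    current = np.asarray(current, dtype=float)
    # Interior bin edges from the baseline; the outer bins are open-ended.
    interior_edges = np.quantile(baseline, np.linspace(0, 1, n_bins + 1)[1:-1])
    baseline_bins = np.searchsorted(interior_edges, baseline, side="right")
    current_bins = np.searchsorted(interior_edges, current, side="right")
    baseline_frac = np.bincount(baseline_bins, minlength=n_bins) / baseline.size
    current_frac = np.bincount(current_bins, minlength=n_bins) / current.size
    # Clip to avoid division by zero and log(0) for empty bins.
    baseline_frac = np.clip(baseline_frac, eps, None)
    current_frac = np.clip(current_frac, eps, None)
    return float(np.sum((current_frac - baseline_frac) * np.log(current_frac / baseline_frac)))


# Compare a baseline feature column with a shifted batch.
rng = np.random.default_rng(0)
baseline_feature = rng.normal(size=500)
shifted_feature = rng.normal(size=500) + 0.75

ks_value = ks_statistic(baseline_feature, shifted_feature)
psi_value = psi(baseline_feature, shifted_feature)
print(f"KS={ks_value:.3f}  PSI={psi_value:.3f}")
```

Both scores are zero when the two samples are identical and grow as the distributions move apart. Any threshold for flagging drift is a project-level choice and is not specified here.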

`MonitorReport` is the high-level reporting utility. It accepts a `Monitor` and writes an HTML report.

See `examples/` for a full walkthrough.
