Metadata-Version: 2.4
Name: incre-ml
Version: 0.1.0
Summary: Incremental machine learning in Python — learn one observation at a time
Author: Michael Wallner
License-Expression: MIT
Project-URL: Homepage, https://github.com/pespila/incre-ml
Project-URL: Repository, https://github.com/pespila/incre-ml
Project-URL: Issues, https://github.com/pespila/incre-ml/issues
Keywords: machine-learning,online-learning,incremental,streaming,anomaly-detection,forecasting,classification,drift-detection
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: numpy>=1.20.0
Requires-Dist: pandas>=1.3.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: pytest-cov>=4.0.0; extra == "dev"
Requires-Dist: ruff>=0.1.0; extra == "dev"
Requires-Dist: mypy>=1.0.0; extra == "dev"
Requires-Dist: pre-commit>=3.0.0; extra == "dev"
Requires-Dist: streamlit>=1.0.0; extra == "dev"
Requires-Dist: altair>=5.0.0; extra == "dev"
Requires-Dist: matplotlib>=3.4.0; extra == "dev"
Provides-Extra: dashboard
Requires-Dist: streamlit>=1.0.0; extra == "dashboard"
Requires-Dist: altair>=5.0.0; extra == "dashboard"
Requires-Dist: matplotlib>=3.4.0; extra == "dashboard"
Provides-Extra: kafka
Requires-Dist: confluent-kafka>=2.0; extra == "kafka"
Provides-Extra: mqtt
Requires-Dist: paho-mqtt>=1.6; extra == "mqtt"
Provides-Extra: connectors
Requires-Dist: confluent-kafka>=2.0; extra == "connectors"
Requires-Dist: paho-mqtt>=1.6; extra == "connectors"
Dynamic: license-file

# incre-ml

**Incremental machine learning in Python.** Every algorithm processes one observation at a time — no batches, no retraining, no accumulated history.

```python
from incre_ml.forecasting import HoltWinters

model = HoltWinters(season_length=24)

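# `stream` is any iterable of (features, target) pairs
for x, y in stream:
    prediction = model.predict_one(x)  # predict before seeing the true value
    model.learn_one(x, y)              # then update on that observation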
```

## Why incre-ml?

Most ML libraries assume batch training on a fixed dataset. When data arrives continuously (sensor telemetry, financial ticks, patient vitals, API logs), you need models that update incrementally and predict as soon as each observation arrives. incre-ml covers the main streaming tasks: forecasting, anomaly detection, classification, clustering, drift detection, uncertainty quantification, federated learning, and physics-informed constraints.

**Core API contract — every model implements:**

| Method | Purpose |
|---|---|
| `model.learn_one(x, y)` | Update on one observation |
| `model.predict_one(x)` | Predict from one observation |
| `model.explain_one(x)` | Feature contributions for one observation |
| `model.clear()` | Reset internal state |

All models use `x: dict[str, Any]` for features — sparse, heterogeneous, schema-agnostic by design.
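
As a rough illustration of this contract, here is a minimal sketch of a model that satisfies it. `OnlineModel` and `RunningMeanRegressor` are hypothetical names written for this example, not classes from incre-ml; the point is only the shape of the four methods and the `dict` feature format.

```python
from typing import Any, Protocol


class OnlineModel(Protocol):
    """Hypothetical protocol mirroring the four methods in the table above."""

    def learn_one(self, x: dict[str, Any], y: float) -> "OnlineModel": ...
    def predict_one(self, x: dict[str, Any]) -> float: ...
    def explain_one(self, x: dict[str, Any]) -> dict[str, float]: ...
    def clear(self) -> None: ...


class RunningMeanRegressor:
    """Toy model: predicts the running mean of every target seen so far."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def learn_one(self, x: dict[str, Any], y: float) -> "RunningMeanRegressor":
        self.n += 1
        self.mean += (y - self.mean) / self.n  # incremental mean, O(1) memory
        return self

    def predict_one(self, x: dict[str, Any]) -> float:
        return self.mean

    def explain_one(self, x: dict[str, Any]) -> dict[str, float]:
        # The toy model ignores its features, so every contribution is zero.
        return {name: 0.0 for name in x}

    def clear(self) -> None:
        self.n = 0
        self.mean = 0.0


model: OnlineModel = RunningMeanRegressor()
# Observations may carry different feature sets; the dict format allows that.
for x, y in [({"temp": 20.0}, 1.0), ({"temp": 21.0, "humidity": 0.4}, 3.0)]:
    model.learn_one(x, y)

prediction = model.predict_one({"temp": 22.0})
```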

## Installation

```bash
pip install incre-ml
```

With optional extras:

```bash
pip install "incre-ml[kafka]"       # Confluent Kafka
pip install "incre-ml[mqtt]"        # MQTT / IoT
pip install "incre-ml[connectors]"  # All connectors
pip install "incre-ml[dashboard]"   # Streamlit demo app
```

## Capabilities

### Forecasting

8 forecasters: Naive, Holt-Winters, AR/SNARIMAX, Kalman Filter, RLS, Croston/TSB, Bootstrapped ensembles, and model selection.

```python
from incre_ml.forecasting import BootstrappedRegressor, HoltWinters

model = BootstrappedRegressor(HoltWinters(season_length=24), n_models=5)

pred, uncertainty = model.predict_with_uncertainty(x)
model.learn_one(x, y)
```

### Anomaly Detection

Statistical (Z-score), geometric (Half-Space Trees), and predictive detectors — composable via weighted ensemble voting. Includes CUSUM and EWMA industrial detectors.

```python
from incre_ml.anomaly import AnomalyEnsemble, ZScoreDetector, PredictiveAnomalyDetector
from incre_ml.forecasting import HoltWinters

ensemble = AnomalyEnsemble({
    "stat": ZScoreDetector(feature_name="temperature"),
    "pred": PredictiveAnomalyDetector(
        model=HoltWinters(season_length=96),
        feature_name="temperature",
    ),
})

score = ensemble.score_one({"temperature": 95.2})  # 0.0 (normal) to 1.0 (anomalous)
ensemble.learn_one({"temperature": 95.2})
```
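
For readers unfamiliar with CUSUM, the following is a minimal textbook sketch of a two-sided CUSUM detector. It is not incre-ml's implementation: the class name `SimpleCusum` and the parameters `target`, `slack`, and `threshold` are assumptions chosen for this example.

```python
class SimpleCusum:
    """Two-sided tabular CUSUM around a known target mean (illustrative only)."""

    def __init__(self, target: float, slack: float, threshold: float) -> None:
        self.target = target        # expected in-control mean
        self.slack = slack          # deviations smaller than this are ignored
        self.threshold = threshold  # alarm when a cumulative sum exceeds this
        self.pos = 0.0              # accumulated upward deviation
        self.neg = 0.0              # accumulated downward deviation

    def update(self, value: float) -> bool:
        """Consume one observation; return True if an alarm is raised."""
        self.pos = max(0.0, self.pos + value - self.target - self.slack)
        self.neg = max(0.0, self.neg + self.target - value - self.slack)
        alarm = self.pos > self.threshold or self.neg > self.threshold
        if alarm:
            # Restart accumulation after signalling.
            self.pos = 0.0
            self.neg = 0.0
        return alarm


detector = SimpleCusum(target=20.0, slack=0.5, threshold=4.0)
readings = [20.1, 19.8, 20.3, 22.0, 22.5, 23.0, 22.8]
alarms = [detector.update(r) for r in readings]
```

Small fluctuations around the target never accumulate because of the slack term, while a sustained shift does, which is why the alarm fires a few readings after the level change rather than on the first high value.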

### Streaming Classification

Hoeffding Tree, Logistic Regression, Naive Bayes, SGD, Adaptive Random Forest, and Windowed KNN — all incremental.

```python
from incre_ml.classification import HoeffdingTreeClassifier

clf = HoeffdingTreeClassifier(grace_period=10)

proba = clf.predict_proba_one(x)    # class probabilities
explanation = clf.explain_one(x)     # feature contributions
clf.learn_one(x, y)
```
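
To make "incremental" concrete for a classifier, the sketch below trains a logistic regression with one stochastic gradient step per observation on sparse `dict` features. `TinyLogisticRegression` is a hypothetical class for illustration and does not reflect incre-ml's internals or exact signatures.

```python
import math
from collections import defaultdict


class TinyLogisticRegression:
    """Binary logistic regression updated by one SGD step per observation."""

    def __init__(self, lr: float = 0.1) -> None:
        self.lr = lr
        self.weights: defaultdict[str, float] = defaultdict(float)
        self.bias = 0.0

    def predict_proba_one(self, x: dict[str, float]) -> float:
        """Return P(y = 1 | x); features never seen in training contribute nothing."""
        z = self.bias + sum(self.weights.get(k, 0.0) * v for k, v in x.items())
        return 1.0 / (1.0 + math.exp(-z))

    def learn_one(self, x: dict[str, float], y: int) -> "TinyLogisticRegression":
        error = self.predict_proba_one(x) - y  # gradient of the log loss w.r.t. z
        for k, v in x.items():
            self.weights[k] -= self.lr * error * v
        self.bias -= self.lr * error
        return self


clf = TinyLogisticRegression(lr=0.5)
# A toy, linearly separable stream: positive x1 means class 1.
stream = [({"x1": 1.0}, 1), ({"x1": -1.0}, 0)] * 50
for x, y in stream:
    clf.learn_one(x, y)

p_pos = clf.predict_proba_one({"x1": 2.0})
p_neg = clf.predict_proba_one({"x1": -2.0})
```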

### Pipelines

Chain transformers and predictors into unified streaming workflows.

```python
from incre_ml.compose import Pipeline
from incre_ml.preprocessing import StandardScaler, SelectKBest

# `model` is any predictor created earlier, e.g. a forecaster or classifier
pipe = Pipeline([StandardScaler(), SelectKBest(k=5), model])
pipe.learn_one(x, y)
```
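
How such a chain might work internally can be sketched as follows. This is an assumption about the general pattern (each transformer learns from its input, then passes the transformed output to the next step), not incre-ml's `Pipeline` source. `MiniPipeline`, `MinMaxScaler`, `RecordingMean`, and `transform_one` are hypothetical names.

```python
class MinMaxScaler:
    """Scales each feature to [0, 1] using the running min and max."""

    def __init__(self) -> None:
        self.lo: dict[str, float] = {}
        self.hi: dict[str, float] = {}

    def learn_one(self, x: dict[str, float]) -> "MinMaxScaler":
        for k, v in x.items():
            self.lo[k] = min(self.lo.get(k, v), v)
            self.hi[k] = max(self.hi.get(k, v), v)
        return self

    def transform_one(self, x: dict[str, float]) -> dict[str, float]:
        out = {}
        for k, v in x.items():
            lo, hi = self.lo.get(k, v), self.hi.get(k, v)
            out[k] = (v - lo) / (hi - lo) if hi > lo else 0.0
        return out


class RecordingMean:
    """Toy predictor: running mean of y; records the inputs it was given."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.seen: list[dict[str, float]] = []

    def learn_one(self, x: dict[str, float], y: float) -> "RecordingMean":
        self.seen.append(x)
        self.n += 1
        self.mean += (y - self.mean) / self.n
        return self

    def predict_one(self, x: dict[str, float]) -> float:
        return self.mean


class MiniPipeline:
    """Transformers first, predictor last; one observation at a time."""

    def __init__(self, transformers: list, predictor) -> None:
        self.transformers = transformers
        self.predictor = predictor

    def learn_one(self, x: dict[str, float], y: float) -> "MiniPipeline":
        for t in self.transformers:
            t.learn_one(x)           # update the transformer's statistics
            x = t.transform_one(x)   # then hand its output to the next step
        self.predictor.learn_one(x, y)
        return self

    def predict_one(self, x: dict[str, float]) -> float:
        for t in self.transformers:
            x = t.transform_one(x)   # no learning at prediction time
        return self.predictor.predict_one(x)


pipe = MiniPipeline([MinMaxScaler()], RecordingMean())
for load, y in [(10.0, 1.0), (30.0, 3.0), (20.0, 2.0)]:
    pipe.learn_one({"load": load}, y)
prediction = pipe.predict_one({"load": 20.0})
```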

### Drift Detection

ADWIN (exponential histogram, O(log n) memory), statistical detectors, and `DriftAdaptiveWrapper` for automatic model adaptation via reset, decay, or replacement strategies.
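
ADWIN itself is too involved for a short example, so the sketch below uses the simpler Page-Hinkley test as a stand-in to show the detect-then-reset pattern. It is not incre-ml's `DriftAdaptiveWrapper`; `PageHinkley`, its parameters, and the reset logic are illustrative assumptions.

```python
class PageHinkley:
    """Page-Hinkley test for an upward shift in the mean of a stream."""

    def __init__(self, delta: float = 0.05, threshold: float = 5.0) -> None:
        self.delta = delta          # tolerated drift magnitude
        self.threshold = threshold  # alarm level for the test statistic
        self.reset()

    def reset(self) -> None:
        self.n = 0
        self.mean = 0.0
        self.cum = 0.0      # cumulative deviation from the running mean
        self.cum_min = 0.0  # smallest cumulative deviation seen so far

    def update(self, value: float) -> bool:
        """Consume one value; return True when drift is signalled."""
        self.n += 1
        self.mean += (value - self.mean) / self.n
        self.cum += value - self.mean - self.delta
        self.cum_min = min(self.cum_min, self.cum)
        return self.cum - self.cum_min > self.threshold


detector = PageHinkley(delta=0.05, threshold=5.0)
errors = [0.1] * 50 + [2.0] * 50  # a model's error jumps at index 50
drift_at = None
for i, err in enumerate(errors):
    if detector.update(err):
        drift_at = i
        detector.reset()  # "reset" strategy: discard state and start over
        break
```

In a wrapper, the same signal would typically trigger a reset, decay, or replacement of the wrapped model rather than only the detector.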

### Federated Learning

`FederatedEnsemble` trains local models per site/region and aggregates via averaging or median — without centralizing raw data.

```python
from incre_ml.federated import FederatedEnsemble
from incre_ml.linear import LinearRegression

fed = FederatedEnsemble(LinearRegression(), ["site_a", "site_b", "site_c"])

fed.learn_one("site_a", x, y)
global_pred = fed.predict_global(x)
fed.sync()  # aggregate local models
```
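
The aggregation step can be illustrated independently of the library. Assuming each site's linear model is summarised as a `dict` of weights (an assumption for this sketch, as is the helper name `aggregate`), coordinate-wise averaging and median look like this:

```python
from statistics import mean, median


def aggregate(site_weights: dict[str, dict[str, float]], how: str = "mean") -> dict[str, float]:
    """Combine per-site weight dicts coordinate by coordinate."""
    combine = {"mean": mean, "median": median}[how]
    # A feature missing at a site is treated as weight 0.0 there.
    features = sorted({name for w in site_weights.values() for name in w})
    return {
        name: combine([w.get(name, 0.0) for w in site_weights.values()])
        for name in features
    }


local = {
    "site_a": {"temp": 1.0, "load": 2.0},
    "site_b": {"temp": 2.0, "load": 2.0},
    "site_c": {"temp": 9.0},  # an outlier site that never saw "load"
}

avg = aggregate(local, "mean")
med = aggregate(local, "median")
```

Only the weights cross site boundaries here, never the observations. The median is less sensitive than the mean to a single site whose weights differ sharply from the rest.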

### Physics-Informed Constraints

Wrap any regressor with domain constraints to prevent physically implausible predictions.

```python
from incre_ml.physics.thermal import NewtonCoolingConstraint
from incre_ml.base.physics import PhysicsInformedWrapper

guard = NewtonCoolingConstraint(k=0.05, ambient_temp=15.0, max_deviation=3.0)
safe_model = PhysicsInformedWrapper(model, guard)
```
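
One plausible reading of such a constraint is sketched below: take one Euler step of Newton's law of cooling, dT/dt = -k (T - T_ambient), from the previous temperature, then clamp the model's raw prediction to within `max_deviation` of that physical estimate. The functions `newton_expected` and `constrain`, and the clamping rule itself, are assumptions made for illustration; the library's actual behaviour may differ.

```python
def newton_expected(prev_temp: float, k: float, ambient_temp: float, dt: float = 1.0) -> float:
    """One explicit Euler step of dT/dt = -k * (T - T_ambient)."""
    return prev_temp - k * (prev_temp - ambient_temp) * dt


def constrain(
    prediction: float,
    prev_temp: float,
    k: float = 0.05,
    ambient_temp: float = 15.0,
    max_deviation: float = 3.0,
) -> float:
    """Clamp a raw prediction to within max_deviation of the physical estimate."""
    expected = newton_expected(prev_temp, k, ambient_temp)
    lower, upper = expected - max_deviation, expected + max_deviation
    return min(max(prediction, lower), upper)


# Previous reading 55 degrees, ambient 15: physics expects roughly 53 next.
plausible = constrain(prediction=52.0, prev_temp=55.0)    # inside the band, unchanged
implausible = constrain(prediction=80.0, prev_temp=55.0)  # pulled back to the upper bound
```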

### Also Included

- **Clustering** — OnlineKMeans, DBSTREAM (density-based stream clustering)
- **Uncertainty** — Conformal prediction, adaptive conformal intervals (ACI), bootstrapped wrappers
- **Preprocessing** — Welford's scalers, online feature selection, temporal features, encoders
- **Evaluation** — Prequential (test-then-train) scoring protocol
- **Active Learning** — Uncertainty sampling
- **Explainability** — Per-prediction feature contributions
- **Metrics** — Online regression and classification metrics
- **Simulation** — Synthetic data generators for manufacturing, clinical, demand, finance, traffic, and building scenarios
- **Serving** — Production serving utilities
- **I/O** — CSV, Kafka, and MQTT connectors
- **Model Selection** — Bandit-based AutoML for streaming
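
The prequential (test-then-train) protocol listed above is easy to state in code: each observation is first used to score the model and only afterwards to update it. The sketch below is self-contained; `RunningMean` and `prequential_mae` are hypothetical helpers, not incre-ml functions.

```python
from typing import Any, Iterable


class RunningMean:
    """Toy regressor that predicts the mean of all targets seen so far."""

    def __init__(self) -> None:
        self.n = 0
        self.mean = 0.0

    def predict_one(self, x: dict[str, Any]) -> float:
        return self.mean

    def learn_one(self, x: dict[str, Any], y: float) -> "RunningMean":
        self.n += 1
        self.mean += (y - self.mean) / self.n
        return self


def prequential_mae(model: RunningMean, stream: Iterable[tuple[dict[str, Any], float]]) -> float:
    """Mean absolute error where every prediction is made before learning."""
    total, count = 0.0, 0
    for x, y in stream:
        y_pred = model.predict_one(x)  # test on the unseen observation
        total += abs(y - y_pred)
        count += 1
        model.learn_one(x, y)          # then train on it
    return total / count if count else 0.0


stream = [({"t": 0}, 2.0), ({"t": 1}, 4.0), ({"t": 2}, 6.0)]
mae = prequential_mae(RunningMean(), stream)
```

Because no observation is ever scored after the model has seen it, the resulting error is an out-of-sample estimate without needing a separate hold-out set.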

## Interactive Dashboard

Explore all capabilities through 7 real-world scenarios with live streaming data:

```bash
pip install "incre-ml[dashboard]"
streamlit run app.py
```

**Scenarios:** Manufacturing quality, supply chain demand, sales anomaly monitoring, clinical triage, connected vehicle safety, API security monitoring, smart building energy.

## Design Principles

- **Welford's algorithm everywhere** — all statistics use O(1) memory incremental computation
- **Lazy state initialization** — internal state created on first `learn_one()`, not `__init__`
- **Composition over inheritance** — shallow hierarchies (1-2 levels), compose via Pipeline and ensembles
- **Strict typing** — `mypy --strict` on all library code
- **Return `self` from `learn_one()`** — enables method chaining
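
Three of these principles (Welford's algorithm, lazy state, and returning `self`) fit in one small sketch. `RunningStats` is a hypothetical class written for this example and is not taken from the library.

```python
import math


class RunningStats:
    """Running mean and sample standard deviation via Welford's algorithm."""

    def __init__(self) -> None:
        # Lazy initialization: no per-feature state exists until learn_one().
        self._state: dict[str, tuple[int, float, float]] = {}

    def learn_one(self, x: dict[str, float]) -> "RunningStats":
        for name, value in x.items():
            n, mean, m2 = self._state.get(name, (0, 0.0, 0.0))
            n += 1
            delta = value - mean
            mean += delta / n
            m2 += delta * (value - mean)  # second factor uses the updated mean
            self._state[name] = (n, mean, m2)
        return self  # returning self allows chaining

    def mean(self, name: str) -> float:
        return self._state[name][1]

    def std(self, name: str) -> float:
        n, _, m2 = self._state[name]
        return math.sqrt(m2 / (n - 1)) if n > 1 else 0.0


stats = RunningStats()
stats.learn_one({"temp": 2.0}).learn_one({"temp": 4.0}).learn_one({"temp": 6.0})
```

Only three numbers per feature are stored (count, mean, and the sum of squared deviations), so memory does not grow with the length of the stream.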

## Development

```bash
git clone https://github.com/pespila/incre-ml.git
cd incre-ml
python -m venv venv && source venv/bin/activate
pip install -e ".[dev]"
pre-commit install
```

```bash
ruff check . && ruff format .   # lint + format
mypy src                        # strict type checking
pytest                          # tests with coverage
```

## License

MIT — see [LICENSE](LICENSE) for details.
