Metadata-Version: 2.4
Name: modelzone-sdk
Version: 3.0.2
Summary: Modelzone SDK – a slim model training and serving toolkit
License-Expression: Apache-2.0
Author: Team Enigma
Author-email: enigma@energinet.dk
Requires-Python: >=3.13
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: azure-ai-ml (>=1.32.0)
Requires-Dist: azure-identity (>=1.25.3)
Requires-Dist: deltalake (>=1)
Requires-Dist: mlflow (>=3)
Requires-Dist: pandas (<3)
Requires-Dist: plotly (>=6.0.0)
Requires-Dist: reporterly (>=1.0.1)
Description-Content-Type: text/markdown

# ModelZone SDK

A Python package for managing, training, and serving machine learning models. The package consists of:

- A **Python SDK** used for writing the code for training and inference of a machine learning model.
- A **CLI** used for tracking and registration of models in Azure ML. 

## Installation

```bash
pip install modelzone-sdk
```

## Overview

If you look at the various Python machine learning frameworks out there, you will come across many different workflows for training and prediction. ModelZone SDK uses the following definition:

> A **model project** is a Python package with a **training function**, which is responsible for (1) **tracking** metrics and graphs used for comparing different model experiments and (2) generating model **artifacts** (such as model parameters and hyper-parameters) that need to be shipped with the package.

ModelZone SDK assumes your machine learning workflow looks more or less like the following:

1. Experimentation by training different models (and hyper-parameters).
2. Release of the best suited model for production usage.
3. Using the released model for inference in operations.

![ModelZone workflow diagram](./docs/workflow.png)

---

## Quick guide

We have made examples of what training and inference projects should look like:
- Training project: [examples/training_example](./examples/training_example/)
- Inference project: [examples/predict_example](./examples/predict_example/)
- Test script running the full workflow: [examples/test.sh](./examples/test.sh)

---

## Detailed guide

### Model project structure

**Training function**

Your training function must be defined within the Python package:

```python
# my-model/model.py
from sklearn.linear_model import LinearRegression  # example estimator

def train():
    X, y = ...
    lin_reg = LinearRegression()
    lin_reg.fit(X, y)
```


**Tracking**

If you have experience with Python's built-in logging module, tracking experiment data with ModelZone SDK will feel much the same. Tracking is done through a global `tracker`, which must be configured before your code runs. When you use the CLI tool, it takes care of configuring the tracker to log to the correct place (locally or to an Azure ML workspace).
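
To make the analogy concrete: like a stdlib logger, the tracker is fetched globally at import time and configured separately before the code runs. A minimal stdlib-only illustration of the pattern (the ModelZone calls in the comments mirror the snippets below; this is not SDK code):

```python
import logging

# Stdlib pattern: fetch a global logger at module import time...
log = logging.getLogger("my-model")

# ...and configure it separately, before the training code runs.
# (The ModelZone CLI plays this configuration role for the tracker.)
logging.basicConfig(level=logging.INFO)

log.info("training started")

# The tracker analogue, per the docs above:
#   tracker = mz.get_tracker()
```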

*Metrics and visualization*

The tracker can be used for logging metrics and visualizations for your experiment run:

```python
# my-model/model.py
import modelzone as mz
from sklearn.linear_model import LinearRegression  # example estimator

tracker = mz.get_tracker()

def train():
    X, y = ...  # y is assumed to be a pandas Series here
    lin_reg = LinearRegression()
    tracker.log_tag("model_type", "LinearRegression")
    lin_reg.fit(X, y)
    y_hat = lin_reg.predict(X)
    mae = (y - y_hat).abs().mean()
    tracker.log_metric("MAE", mae)
```

The tracker can also be used for logging artifacts for your experiment run (e.g. model parameters). You define an artifact by inheriting from the `Artifact` class available in ModelZone SDK (all this does is provide methods for saving and loading the class using the pickle module):

```python
# my-model/model.py
import modelzone as mz
from sklearn.base import BaseEstimator             # example base type
from sklearn.linear_model import LinearRegression  # example estimator

tracker = mz.get_tracker()

class Predictor(mz.Artifact):

    def __init__(self, model: BaseEstimator):
        self.model = model

    def predict(self, X):
        return self.model.predict(X)

def train():
    X, y = ...
    lin_reg = LinearRegression()
    lin_reg.fit(X, y)
    predictor = Predictor(lin_reg)
    tracker.log_artifact("my-artifact", predictor)
```

When this package is installed elsewhere, you can get the artifact back as follows:

```python
from mypackage import Predictor

predictor = Predictor.load(name="my-artifact")
```
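
As noted above, `Artifact` essentially wraps pickle-based persistence. Here is a minimal sketch of how such a base class could work, purely illustrative: the path-based `save`/`load` signatures here are hypothetical, whereas the SDK's `load(name=...)` resolves storage itself.

```python
import os
import pickle
import tempfile

class Artifact:
    """Hypothetical sketch of pickle-backed save/load (not the SDK's actual code)."""

    def save(self, path: str) -> None:
        with open(path, "wb") as f:
            pickle.dump(self, f)

    @classmethod
    def load(cls, path: str) -> "Artifact":
        with open(path, "rb") as f:
            obj = pickle.load(f)
        if not isinstance(obj, cls):
            raise TypeError(f"expected {cls.__name__}, got {type(obj).__name__}")
        return obj

class Predictor(Artifact):
    def __init__(self, model):
        self.model = model

# Round trip through a temporary file.
path = os.path.join(tempfile.gettempdir(), "my-artifact.pkl")
Predictor(model="weights").save(path)
restored = Predictor.load(path)
```

Since pickle serializes the whole instance, anything the artifact references (here, the fitted estimator) is shipped along with it.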

### Logging

ModelZone SDK allows for logging various items to your experiment run.

**Metrics**

You can log a single metric as follows:

```py
tracker.log_metric("MAE", 1.23)
```

You can also log multiple related metrics (e.g. the same metric but with varying context):

```py
tracker.log_metric("MAE", 1.23, dimensions={"country": "DK"})
tracker.log_metric("MAE", 4.56, dimensions={"country": "SE"})
```

These will be visualized as a Plotly table when tracking to Azure ML:

![Table of MAE metrics per country](./docs/multi-metrics.png)
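
Conceptually, each dimensioned `log_metric` call contributes one row to such a table, with the dimensions becoming extra columns. A minimal stdlib illustration of that data shape (not the SDK's internals; `log_metric` here is a stand-in):

```python
rows = []

def log_metric(name, value, dimensions=None):
    # One row per call: metric name, one column per dimension, then the value.
    rows.append({"metric": name, **(dimensions or {}), "value": value})

log_metric("MAE", 1.23, dimensions={"country": "DK"})
log_metric("MAE", 4.56, dimensions={"country": "SE"})

print(rows)
```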

**Figures**

You can log a single figure as follows:

```py
import plotly.express as px

tracker.log_figure(
    name="Errors",
    figure=px.histogram(x=[1, 2, 2, 3, 3, 3]),
)
```

You can also log multiple related figures (e.g. the same figure but with varying context):

```py
import plotly.express as px

tracker.log_figure(
    name="Errors",
    figure=px.histogram(x=[1, 2, 2, 3, 3, 3]),
    dimensions={"country": "DK"},
)
tracker.log_figure(
    name="Errors",
    figure=px.histogram(x=[4, 4, 5, 5, 5, 6]),
    dimensions={"country": "SE"},
)
```

These will be visualized as a Plotly figure with dropdown selection when tracking to Azure ML:

![Histogram figure with a country dropdown selector](./docs/multi-figures.png)


