Metadata-Version: 2.4
Name: modelzone-sdk
Version: 1.0.0
Summary: Modelzone SDK – a slim model training and serving toolkit
License-Expression: Apache-2.0
Author: Team Enigma
Author-email: enigma@energinet.dk
Requires-Python: >=3.13
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: Apache Software License
Classifier: Programming Language :: Python :: 3.13
Requires-Dist: azure-ai-ml (>=1.32.0)
Requires-Dist: azure-identity (>=1.25.3)
Requires-Dist: deltalake (>=1)
Requires-Dist: mlflow (>=3)
Requires-Dist: pandas (<3)
Description-Content-Type: text/markdown

# ModelZone SDK

A Python package for managing, training, and installing machine learning models. The package consists of:

- A **Python SDK** for writing the training and inference code of a machine learning model.
- A **CLI** for tracking and registering models in Azure ML.

## Installation

```bash
pip install modelzone-sdk
```

## Overview

Python machine learning frameworks use many different workflows for training and prediction. ModelZone SDK uses the following definition:

> A **model project** is a Python package with a **training function**, which is responsible for (1) **tracking** metrics and graphs used for comparing different model experiments and (2) generating model **artifacts** (such as model parameters and hyper-parameters) that need to be shipped with the package.

ModelZone SDK assumes your machine learning workflow looks more or less like the following:

1. Experiment by training different models (with different hyper-parameters).
2. Release the best-suited model for production use.
3. Use the released model for inference in operations.

![ModelZone SDK workflow](./docs/workflow.png)

---

## Quick guide

We provide examples of what a training project and an inference project should look like:

- Training project: [examples/training_example](./examples/training_example/)
- Inference project: [examples/predict_example](./examples/predict_example/)
- Test script running the full workflow: [examples/test.sh](./examples/test.sh)

---

## Detailed guide

### Model project structure

**Training function**

Your training function must be defined within the Python package:

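```python
# my-model/model.py
# Assumption: LinearRegression is scikit-learn's estimator.
from sklearn.linear_model import LinearRegression


def train():
    # Load features X and targets y (data loading elided).
    X, y = ...
    lin_reg = LinearRegression()
    lin_reg.fit(X, y)
```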


**Tracking**

If you have used Python's built-in `logging` module, tracking experiment data with ModelZone SDK will feel familiar. Tracking is done through a global `tracker`, which must be configured before your code runs:

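```python
# my-model/model.py
# Assumption: LinearRegression is scikit-learn's estimator.
from sklearn.linear_model import LinearRegression

import modelzone as mz

# Module-level tracker, obtained once and shared by all functions in this module.
tracker = mz.get_tracker()


def train():
    # Load features X and targets y (data loading elided).
    X, y = ...
    lin_reg = LinearRegression()
    tracker.log_tag("model_type", "LinearRegression")
    lin_reg.fit(X, y)
    y_hat = lin_reg.predict(X)
    # Mean absolute error on the training data.
    mae = (y - y_hat).abs().mean()
    tracker.log_metric("MAE", mae)
```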

The CLI tool configures your tracker to log to the correct place (locally or to an Azure AI Workspace), so you do not need to configure it yourself.
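
The logging analogy can be made concrete with a small stand-in. The sketch below is not ModelZone SDK's implementation: `InMemoryTracker`, `configure`, and this version of `get_tracker` are hypothetical names that mimic the pattern of a module-level tracker which is obtained at import time and configured later. An in-memory backend is used so the logged values can be inspected.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class InMemoryTracker:
    """Hypothetical stand-in for a tracker; stores everything in memory."""

    destination: str | None = None  # set by configure(), e.g. "local"
    tags: dict[str, str] = field(default_factory=dict)
    metrics: dict[str, list[float]] = field(default_factory=dict)

    def log_tag(self, key: str, value: str) -> None:
        self._require_configured()
        self.tags[key] = value

    def log_metric(self, key: str, value: float) -> None:
        self._require_configured()
        # Keep every value so repeated calls form a history per metric.
        self.metrics.setdefault(key, []).append(float(value))

    def _require_configured(self) -> None:
        if self.destination is None:
            raise RuntimeError("tracker used before it was configured")


# One shared instance, similar in spirit to the root logger in `logging`.
_TRACKER = InMemoryTracker()


def get_tracker() -> InMemoryTracker:
    return _TRACKER


def configure(destination: str) -> None:
    """Hypothetical configuration step; with ModelZone SDK the CLI handles this."""
    _TRACKER.destination = destination


tracker = get_tracker()  # safe at import time, before configuration


def train() -> None:
    y = [3.0, 5.0, 7.0]
    y_hat = [2.5, 5.5, 7.0]
    tracker.log_tag("model_type", "LinearRegression")
    # Mean absolute error computed by hand on illustrative values.
    mae = sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)
    tracker.log_metric("MAE", mae)


if __name__ == "__main__":
    configure("local")
    train()
    print(tracker.tags, tracker.metrics)
```

Because the training code only depends on the shared tracker object, the same `train` function can log to different destinations depending on how the tracker was configured before the call.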

**Artifacts**

You are free to store model artifacts as you would any other data included in a Python package. Alternatively, you can use the `Artifact` class from ModelZone SDK, which helps with saving and loading artifacts by name (behind the scenes, it uses the `pickle` module):

```python
# my-model/model.py
import modelzone as mz

class Predictor(mz.Artifact):

    def __init__(self, model: Estimator):
        self.model = model

    def predict(self, ...):
        ...

def train():
    X, y = ...
    lin_reg = LinearRegression()
    lin_reg.fit(X, y)
    predictor = Predictor(lin_reg)
    predictor.save(name="my-artifact")
```

After installing this package elsewhere, you can load the artifact again as follows:

```python
from mypackage import Predictor

predictor = Predictor.load(name="my-artifact")
```
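
Since `Artifact` is described above as relying on pickle, the save and load round trip can be illustrated with a standard-library stand-in. `PickleArtifact`, the `ARTIFACT_DIR` location, the `.pkl` file naming, and the choice to pickle only the instance attributes are all assumptions for this sketch; how ModelZone SDK actually stores artifacts is not described here.

```python
from __future__ import annotations

import pickle
import tempfile
from pathlib import Path

# Hypothetical storage location; a temporary directory keeps the sketch self-contained.
ARTIFACT_DIR = Path(tempfile.mkdtemp(prefix="artifacts-"))


class PickleArtifact:
    """Hypothetical base class: save and load instances by name using pickle."""

    def save(self, name: str) -> Path:
        path = ARTIFACT_DIR / f"{name}.pkl"
        with path.open("wb") as fh:
            # Pickle the attributes rather than the instance itself, so that
            # loading does not depend on the import path of the class.
            pickle.dump(self.__dict__, fh)
        return path

    @classmethod
    def load(cls, name: str) -> PickleArtifact:
        path = ARTIFACT_DIR / f"{name}.pkl"
        with path.open("rb") as fh:
            # Only unpickle files you trust; pickle can execute arbitrary code.
            state = pickle.load(fh)
        obj = cls.__new__(cls)  # bypass __init__; attributes come from the file
        obj.__dict__.update(state)
        return obj


class Predictor(PickleArtifact):
    def __init__(self, slope: float, intercept: float):
        self.slope = slope
        self.intercept = intercept

    def predict(self, x: float) -> float:
        return self.slope * x + self.intercept


if __name__ == "__main__":
    Predictor(slope=2.0, intercept=1.0).save(name="my-artifact")
    restored = Predictor.load(name="my-artifact")
    print(restored.predict(3.0))  # prints 7.0
```

Loading by name through the subclass keeps the calling code free of file paths, which mirrors the `Predictor.load(name="my-artifact")` usage shown above.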
