Metadata-Version: 2.4
Name: michelangelo
Version: 0.1.1
Summary: Michelangelo is an end-to-end system for managing the machine learning model lifecycle at scale
Author: Michelangelo Team
Author-email: michelangelo-oss-group@uber.com
Requires-Python: >=3.9.0,<4.0.0
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Provides-Extra: dev
Provides-Extra: example
Provides-Extra: mactl
Provides-Extra: plugin
Provides-Extra: vllm
Requires-Dist: GitPython (>=3.1.44)
Requires-Dist: accelerate (==0.34.2) ; extra == "example" or extra == "vllm"
Requires-Dist: boto3 (>=1.40.0,<2.0.0) ; extra == "example"
Requires-Dist: chronon-ai (==0.0.109) ; extra == "example"
Requires-Dist: coverage[toml] (>=7.6.0,<8.0.0) ; extra == "dev"
Requires-Dist: datasets (==3.2.0) ; extra == "example" or extra == "vllm"
Requires-Dist: docker (>=7.1.0,<8.0.0) ; extra == "dev"
Requires-Dist: einops (==0.8.0) ; extra == "example" or extra == "vllm"
Requires-Dist: fsspec (>=2023.10.0)
Requires-Dist: grpcio (>=1.66.1)
Requires-Dist: grpcio-reflection (>=1.66.1)
Requires-Dist: jinja2 (>=3.1.2,<4.0.0) ; extra == "dev"
Requires-Dist: kaggle (>=1.6.17,<2.0.0) ; extra == "example"
Requires-Dist: mlflow (>=2.0.0,<3.0.0) ; extra == "example"
Requires-Dist: numpy (>=1.26.0,<2.0.0) ; extra == "dev" or extra == "example" or extra == "vllm"
Requires-Dist: pandas (>=2.1.0,<3.0.0) ; extra == "dev" or extra == "example" or extra == "vllm"
Requires-Dist: peft (>=0.3.0,<0.4.0) ; extra == "example"
Requires-Dist: pre-commit (>=4.2.0,<5.0.0) ; extra == "dev"
Requires-Dist: protobuf (>=5.27.1)
Requires-Dist: pyarrow (>=19.0.0,<20.0.0) ; extra == "dev" or extra == "example" or extra == "vllm"
Requires-Dist: pydantic (>=2.10.6)
Requires-Dist: pyspark (==3.5.5) ; extra == "plugin" or extra == "example"
Requires-Dist: pytest (>=8.3.4,<9.0.0) ; extra == "dev"
Requires-Dist: pytest-cov (>=6.0.0,<7.0.0) ; extra == "dev"
Requires-Dist: pytorch_lightning (==2.2.0) ; extra == "dev" or extra == "example" or extra == "vllm"
Requires-Dist: pyyaml (>=6.0.1,<7.0.0)
Requires-Dist: ray[default] (>=2.41.0,<3.0.0) ; extra == "dev" or extra == "plugin" or extra == "example" or extra == "vllm"
Requires-Dist: ruff (>=0.9.1,<0.10.0) ; extra == "dev"
Requires-Dist: s3fs (==2024.9.0) ; extra == "example" or extra == "vllm"
Requires-Dist: scikit-learn (>=1.5.2,<2.0.0) ; extra == "example"
Requires-Dist: setuptools (==74.1.1) ; extra == "vllm"
Requires-Dist: sqlglot (>=27.29.0,<28.0.0) ; extra == "example"
Requires-Dist: thrift (==0.13.0) ; extra == "example"
Requires-Dist: tomli (>=2.0.1,<3.0.0) ; python_version < "3.11"
Requires-Dist: torch (==2.6.0) ; extra == "dev" or extra == "example" or extra == "vllm"
Requires-Dist: transformers (==4.48.2) ; extra == "dev" or extra == "example" or extra == "vllm"
Requires-Dist: typing-extensions (>=4.12.2)
Requires-Dist: vllm (==0.8.1) ; extra == "vllm"
Requires-Dist: xgboost (==2.1.4) ; extra == "example"
Requires-Dist: xgboost_ray (==0.1.19) ; extra == "example"
Description-Content-Type: text/markdown

# Michelangelo

**An end-to-end ML platform for building, training, and registering machine learning models at scale.**

[![Documentation](https://img.shields.io/badge/docs-michelangelo--ai.github.io-blue)](https://michelangelo-ai.github.io/michelangelo/docs)
[![GitHub](https://img.shields.io/badge/github-michelangelo--ai%2Fmichelangelo-lightgrey)](https://github.com/michelangelo-ai/michelangelo)

Michelangelo gives ML engineers and data scientists a unified Python SDK for the entire model lifecycle — from data preparation and distributed training to model registration and production deployment. Define your ML workflows as Python functions using simple decorators, and Michelangelo handles orchestration, caching, and scaling across Ray and Spark clusters.

## Key Features

- **UniFlow Pipeline Framework** — Define ML workflows with `@task` and `@workflow` decorators. Write plain Python functions and Michelangelo handles distributed execution, data passing between tasks, and result caching.

- **Distributed Execution** — Scale tasks across Ray or Spark clusters with a single config change. Specify CPU, memory, GPU, and worker resources per task — no changes to your business logic required.

- **Built-in Caching and Resume** — Tasks cache results automatically based on inputs. If a pipeline fails partway through, resume from where it left off instead of rerunning everything.

- **Python API Client** — Programmatically manage projects, pipelines, model registry, and pipeline runs through a gRPC-based Python client.

- **CLI (`ma`)** — Register pipelines, manage triggers, run sandboxes, and interact with the Michelangelo platform from your terminal.

- **Flexible Storage** — Read and write data across S3, GCS, HDFS, and local filesystems using the fsspec-based storage layer.
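
The input-based caching described under "Built-in Caching and Resume" can be illustrated with a small stand-alone sketch. This is not Michelangelo's implementation: `cached_task` and the in-memory `_CACHE` dictionary are hypothetical names used only to show the idea of keying a result on a hash of the function name and its arguments.

```python
import functools
import hashlib
import pickle

# Hypothetical in-memory store; a real system would persist results durably.
_CACHE = {}


def cached_task(fn):
    """Cache a function's result under a key derived from its name and inputs."""

    @functools.wraps(fn)
    def wrapper(*args, **kwargs):
        # Sort keyword arguments so their order does not change the key.
        payload = pickle.dumps((fn.__qualname__, args, sorted(kwargs.items())))
        key = hashlib.sha256(payload).hexdigest()
        if key not in _CACHE:
            _CACHE[key] = fn(*args, **kwargs)
        return _CACHE[key]

    return wrapper


calls = []


@cached_task
def square(x):
    calls.append(x)  # record each real execution
    return x * x


first = square(4)
second = square(4)  # same input: served from the cache, body not re-run
third = square(5)   # new input: the body runs again
print(first, second, third, calls)
```

Because the key depends only on the function name and its inputs, a rerun after a failure can skip every step whose inputs are unchanged, which is the behavior the feature list describes as resuming.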

## Installation

Install the core package:

```bash
pip install michelangelo
```

Install with distributed execution plugins (Ray and Spark):

```bash
pip install "michelangelo[plugin]"
```

### Install Extras

| Extra | What it includes | When to use it |
|-------|-----------------|----------------|
| `michelangelo[plugin]` | Ray, PySpark | You want to run tasks on distributed Ray or Spark clusters |
| `michelangelo[vllm]` | vLLM, Ray, PyTorch, Transformers | You're serving or fine-tuning large language models |
| `michelangelo[example]` | PyTorch, Transformers, XGBoost, scikit-learn, MLflow, PySpark, Ray, and the other libraries the examples use | You want to run the included example projects |
| `michelangelo[dev]` | pytest, ruff, pre-commit, Ray | You're contributing to Michelangelo itself |
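
To check what an environment actually provides, the standard library can report the extras a distribution declares and whether an optional module is importable. The sketch below uses only `importlib`; the helper names `declared_extras` and `is_importable` are hypothetical, and the first helper lists the extras the package declares, not the ones that were installed.

```python
from importlib import metadata, util


def declared_extras(dist_name):
    """Return the extras a distribution declares, or None if it is not installed."""
    try:
        info = metadata.metadata(dist_name)
    except metadata.PackageNotFoundError:
        return None
    # Each Provides-Extra header in the package metadata names one extra.
    return sorted(info.get_all("Provides-Extra") or [])


def is_importable(module_name):
    """Report whether a top-level module can be found without importing it."""
    return util.find_spec(module_name) is not None


extras = declared_extras("michelangelo")
if extras is None:
    print("michelangelo is not installed")
else:
    print("declared extras:", ", ".join(extras))

# Ray and PySpark arrive with the plugin extra, so probe for their modules.
for module in ("ray", "pyspark"):
    print(module, "available" if is_importable(module) else "missing")
```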

## Quickstart

Here's a minimal pipeline that loads data and trains a model using Ray for distributed execution:

```python
import michelangelo.uniflow.core as uniflow
from michelangelo.uniflow.plugins.ray import RayTask


@uniflow.task(config=RayTask(head_cpu=1, head_memory="2Gi"))
def load_data(path: str):
    """Load and preprocess data."""
    # Your data loading logic here
    print(f"Loading data from {path}")
    return {"train": [1, 2, 3], "test": [4, 5]}


@uniflow.task(config=RayTask(head_cpu=2, head_memory="4Gi"))
def train_model(data):
    """Train a model on the prepared data."""
    print(f"Training on {len(data['train'])} samples")
    return {"accuracy": 0.95}


@uniflow.workflow()
def training_pipeline(data_path: str):
    """A simple training pipeline."""
    data = load_data(data_path)
    result = train_model(data)
    return result


if __name__ == "__main__":
    ctx = uniflow.create_context()
    ctx.run(training_pipeline, data_path="s3://my-bucket/data")
```

Save the example as `my_pipeline.py` and run it locally:

```bash
python my_pipeline.py
```

Want to use Spark instead of Ray? Just swap the task config:

```python
import michelangelo.uniflow.core as uniflow
from michelangelo.uniflow.plugins.spark import SparkTask

@uniflow.task(config=SparkTask(driver_cpu=2, executor_cpu=4, executor_instances=3))
def process_data(df):
    # Your Spark processing logic
    return df
```

For complete working examples, see the [examples directory](https://github.com/michelangelo-ai/michelangelo/tree/main/python/examples), including:

- [BERT fine-tuning on CoLA](https://github.com/michelangelo-ai/michelangelo/tree/main/python/examples/bert_cola) — Text classification with distributed GPU training
- [XGBoost on Boston Housing](https://github.com/michelangelo-ai/michelangelo/tree/main/python/examples/boston_housing_xgb) — Tabular regression with distributed training
- [GPT fine-tuning with LoRA](https://github.com/michelangelo-ai/michelangelo/tree/main/python/examples/gpt_oss_20b_finetune) — Large language model fine-tuning

## Using the Python API Client

Manage platform resources programmatically:

```python
from michelangelo.api.v2.client import APIClient
from michelangelo.gen.api.v2.project_pb2 import Project

APIClient.set_caller("my-client")

# List projects
projects = APIClient.ProjectService.list_project(namespace="default")

# Create a new project

proj = Project()
proj.metadata.namespace = "default"
proj.metadata.name = "my-project"
proj.spec.description = "My ML project"
APIClient.ProjectService.create_project(proj)
```

Set the API server address with the `MICHELANGELO_API_SERVER` environment variable:

```bash
export MICHELANGELO_API_SERVER="localhost:12345"
```
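
A malformed address is easier to diagnose if it is validated before any client call is made. The following is a minimal sketch that assumes the value has the `host:port` shape shown above; `parse_api_server` is a hypothetical helper, not part of the Michelangelo client, and the fallback value simply repeats the example address and is not a documented default.

```python
import os


def parse_api_server(value):
    """Split a 'host:port' string into (host, port); raise ValueError if malformed."""
    host, sep, port = value.rpartition(":")
    if not sep or not host or not port.isdecimal():
        raise ValueError(f"expected 'host:port', got {value!r}")
    port_number = int(port)
    if not 0 < port_number < 65536:
        raise ValueError(f"port out of range: {port_number}")
    return host, port_number


# The fallback mirrors the example value above; it is not a documented default.
address = os.environ.get("MICHELANGELO_API_SERVER", "localhost:12345")
host, port = parse_api_server(address)
print(f"API server: {host}:{port}")
```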

## Documentation

Full documentation is available at **[michelangelo-ai.github.io/michelangelo/docs](https://michelangelo-ai.github.io/michelangelo/docs)**.

- [User Guides](https://michelangelo-ai.github.io/michelangelo/docs/user-guides) — Step-by-step guides for data preparation, training, and deployment
- [ML Pipelines](https://michelangelo-ai.github.io/michelangelo/docs/user-guides/ml-pipelines) — Deep dive into the UniFlow pipeline framework
- [Set Up Triggers](https://michelangelo-ai.github.io/michelangelo/docs/user-guides/set-up-triggers) — Automate pipeline execution with cron and backfill triggers
- [CLI Reference](https://michelangelo-ai.github.io/michelangelo/docs/user-guides/cli) — Full command-line interface documentation

## Contributing

We welcome contributions! To get started:

```bash
git clone https://github.com/michelangelo-ai/michelangelo.git
cd michelangelo/python
pip install -e ".[dev]"
```

Run the test suite:

```bash
pytest
```

Format and lint your code:

```bash
ruff format .
ruff check .
```

## Requirements

- Python 3.9 or later (below 4.0)

## License

See [LICENSE](https://github.com/michelangelo-ai/michelangelo/blob/main/LICENSE) for details.

