Metadata-Version: 2.4
Name: mlflow2rdf
Version: 0.1.2
Summary: YAML-config-driven MLflow tracking data to RDF knowledge graphs with MLSO ontology alignment
Author-email: "Jason Jia (贾思捷)" <jason.jia87@gmail.com>
License: MIT
Project-URL: Homepage, https://github.com/dtai-kg/MLSO
Project-URL: Repository, https://github.com/jasonjia-ml/MLSO
Project-URL: Documentation, https://github.com/jasonjia-ml/MLSO#readme
Keywords: mlflow,rdf,knowledge-graph,mlso,ontology,semantic-web,machine-learning
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: mlflow>=2.0.0
Requires-Dist: rdflib>=6.0.0
Requires-Dist: pyshacl>=0.25.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: dev
Requires-Dist: pytest>=7.0.0; extra == "dev"
Requires-Dist: build>=1.0.0; extra == "dev"
Requires-Dist: twine>=4.0.0; extra == "dev"
Provides-Extra: examples
Requires-Dist: pandas>=2.0.0; extra == "examples"

# MLflow to RDF Converter

**YAML-config-driven MLflow tracking data to RDF knowledge graphs — aligned with the MLSO ontology.**

---

## Overview

`mlflow2rdf` converts MLflow experiment, run, parameter, metric, and tag data into RDF triples using declarative YAML mappings. It supports two mapping modes:

- **RML Standard** (recommended): W3C RML-compliant, aligned with MLSea
- **Custom YAML**: Simplified configuration for quick use

---

## Project Structure

```
mlflow-to-rdf/
├── config/
│   ├── rml_mappings.yaml    # RML standard mappings (recommended) ⭐
│   ├── mappings.yaml        # Custom declarative mappings
│   ├── sources.yaml         # MLflow data source config
│   └── validation.yaml      # SHACL validation rules
├── src/
│   ├── converter_rml.py     # RML standard converter ⭐
│   ├── rml_engine.py        # RML engine wrapper
│   ├── data_collector.py    # MLflow data collector
│   ├── converter.py         # Main converter (CLI entry point)
│   ├── engine.py            # Declarative conversion engine
│   ├── validators.py        # SHACL validator
│   └── utils.py             # Utilities
├── docs/
│   ├── PROJECT_DOCUMENTATION.md
│   ├── COMPARISON_ANALYSIS.md
│   └── RML_IMPLEMENTATION_SUMMARY.md
├── tests/
├── examples/
└── data/
```

---

## Quick Start

### Installation

```bash
pip install mlflow2rdf
```

### Basic Usage (CLI)

```bash
# Point to your MLflow tracking server
mlflow2rdf --mlflow-uri http://localhost:5000 --output output.ttl
```

### Programmatic Usage

```python
from mlflow2rdf import DeclarativeConverter, SHACLValidator

# Initialize converter with YAML configs
converter = DeclarativeConverter(
    sources_config_path='config/sources.yaml',
    mappings_config_path='config/mappings.yaml'
)

# Execute transformation
converter.convert()

# Validate against SHACL shapes
validator = SHACLValidator('config/validation.yaml')
result = validator.validate(converter.graph)

# Save RDF output
converter.save('output.ttl')
```

---

## Configuration

### `config/sources.yaml` — Data Source

```yaml
mlflow:
  type: mlflow
  uri: http://localhost:5000   # MLflow tracking server URI
  api_version: 2.0
  extraction:
    experiments: { enabled: true }
    runs: { enabled: true, experiment_ids: ["0"] }
    params: { enabled: true }
    metrics: { enabled: true }
    tags: { enabled: true }

output:
  format: turtle
  path: ./data/output.ttl
  namespaces:
    mlso:   http://example.org/mlso/
    prov:   http://www.w3.org/ns/prov#
    dcterms: http://purl.org/dc/terms/
    rdfs:   http://www.w3.org/2000/01/rdf-schema#
```
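The `namespaces` block maps prefixes to base IRIs. As a rough illustration of how such a prefix map is typically used, the sketch below expands a prefixed name into a full IRI. The `expand` helper is hypothetical and not part of `mlflow2rdf`; the dictionary simply mirrors the sample config above, including its placeholder `mlso` namespace.

```python
# Hypothetical helper, not part of mlflow2rdf: expands prefixed names
# using a prefix map that mirrors the `namespaces` block in sources.yaml.
NAMESPACES = {
    "mlso": "http://example.org/mlso/",
    "prov": "http://www.w3.org/ns/prov#",
    "dcterms": "http://purl.org/dc/terms/",
    "rdfs": "http://www.w3.org/2000/01/rdf-schema#",
}


def expand(curie, namespaces=NAMESPACES):
    """Turn a prefixed name such as 'mlso:Run' into a full IRI."""
    prefix, sep, local = curie.partition(":")
    if not sep or prefix not in namespaces:
        raise KeyError(f"unknown or missing prefix in {curie!r}")
    return namespaces[prefix] + local


print(expand("mlso:Run"))
print(expand("dcterms:hasPart"))
```

Raising on an unknown prefix, rather than passing the string through, makes typos in a mapping file surface early.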

### `config/mappings.yaml` — Declarative Mapping Rules

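Maps MLflow entities to RDF terms. Experiments, runs, parameters, and metrics map to MLSO classes; tags map to the `dcterms:hasPart` property rather than to a class:

| MLflow Entity | Target Term                  | Kind     |
|---------------|------------------------------|----------|
| Experiment    | `mlso:Experiment`            | Class    |
| Run           | `mlso:Run`                   | Class    |
| Parameter     | `mlso:HyperParameterSetting` | Class    |
| Metric        | `mlso:Metric`                | Class    |
| Tag           | `dcterms:hasPart`            | Property |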
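To make the table concrete, here is a dependency-free sketch of how an entity-to-class lookup could produce `rdf:type` statements in N-Triples form. The `CLASS_MAP` dictionary, the `type_triple` function, and the `http://example.org/run/` subject IRI are illustrative assumptions, and the `mlso` namespace is the placeholder from the sample `sources.yaml`. The actual rules live in `config/mappings.yaml` and may be structured differently.

```python
# Illustrative only: mirrors the mapping table, not the package's internals.
MLSO = "http://example.org/mlso/"  # placeholder namespace from sources.yaml
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Tags are omitted: their target, dcterms:hasPart, is a property, not a class.
CLASS_MAP = {
    "experiment": MLSO + "Experiment",
    "run": MLSO + "Run",
    "param": MLSO + "HyperParameterSetting",
    "metric": MLSO + "Metric",
}


def type_triple(entity_kind, subject_iri):
    """Return one N-Triples line typing the subject with its mapped class."""
    return f"<{subject_iri}> <{RDF_TYPE}> <{CLASS_MAP[entity_kind]}> ."


print(type_triple("run", "http://example.org/run/abc123"))
```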

### `config/validation.yaml` — SHACL Shapes

Defines SHACL constraints (`minCount`, `maxCount`, `datatype`, allowed values) for each MLSO class. Custom SPARQL rules can be added for validation that spans multiple entities.
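As a rough picture of what these constraint kinds check, the following plain-Python sketch applies minimum count, maximum count, datatype, and allowed-value rules to the values of a single property. It is not SHACL and does not reflect how `SHACLValidator` is implemented; the `check_property` function and its parameter names are hypothetical.

```python
# Hypothetical stand-in for SHACL-style property constraints. The project's
# real validation is SHACL-based, which this sketch does not use.
def check_property(values, min_count=0, max_count=None, datatype=None, allowed=None):
    """Return a list of human-readable violations for one property's values."""
    violations = []
    if len(values) < min_count:
        violations.append(f"expected at least {min_count} value(s), got {len(values)}")
    if max_count is not None and len(values) > max_count:
        violations.append(f"expected at most {max_count} value(s), got {len(values)}")
    for value in values:
        if datatype is not None and not isinstance(value, datatype):
            violations.append(f"{value!r} is not of type {datatype.__name__}")
        if allowed is not None and value not in allowed:
            violations.append(f"{value!r} is not an allowed value")
    return violations


# A metric value that must appear exactly once and be a float: no violations.
print(check_property([0.93], min_count=1, max_count=1, datatype=float))
# A required property with no values: one violation is reported.
print(check_property([], min_count=1))
```

Collecting every violation instead of stopping at the first mirrors how a validation report usually lists all failures for a node at once.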

---

## Features

- ✅ **Declarative mappings** — all transformation rules in YAML, no code changes needed
- ✅ **RML standard support** — W3C RML-compliant with YARRRML input
- ✅ **MLSO ontology alignment** — typed to MLSO vocabulary
- ✅ **SHACL validation** — ensure RDF output conforms to ontology constraints
- ✅ **Multiple RDF serializations** — Turtle, N3, JSON-LD, XML
- ✅ **CLI and library** — use as a tool or import as a package

---

## Documentation

- `README_RML.md` — RML standard usage guide
- `docs/PROJECT_DOCUMENTATION.md` — Full technical documentation
- `docs/COMPARISON_ANALYSIS.md` — Analysis vs. MLSea
- `docs/RML_IMPLEMENTATION_SUMMARY.md` — RML implementation summary

---

## Related Resources

- **MLSea Paper**: "MLSea: A Semantic Layer for Discoverable Machine Learning"
- **MLSO GitHub**: https://github.com/dtai-kg/MLSO
- **RML Spec**: https://rml.io/specs/rml/
- **YARRRML Spec**: https://rml.io/yarrrml/spec/

---

**Author**: Jason Jia  
**Version**: 0.1.2
