Metadata-Version: 2.4
Name: mlflow-at1
Version: 0.1.0
Summary: MLflow artifact repository backed by verified, compressed, byte-exact AT-1 containers
Author: TinyFiles / AT-1
License: Apache-2.0
Project-URL: Homepage, https://tinyfiles.io
Project-URL: Documentation, https://tinyfiles.io/docs/mlflow
Keywords: mlflow,artifacts,compression,provenance,verified,lossless
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: mlflow-skinny>=2.0
Provides-Extra: full
Requires-Dist: mlflow>=2.0; extra == "full"

# mlflow-at1 — verified, compressed MLflow artifacts

Store MLflow models, checkpoints, and datasets as **verified, byte-exact AT-1 containers** instead of raw
files. Every artifact is compressed losslessly (never-worse vs xz), carries an embedded SHA-256, and is
**byte-exact-verified on download** — a tampered or corrupted artifact is *refused*, not silently returned.

That makes an artifact in your MLflow store **provably the exact bytes you logged** — the reproducibility /
provenance hook that regulated AI (EU AI Act, NIST AI RMF, FDA 21 CFR Part 11) needs: "this model is the
exact model that was trained, untampered."

It's a **pure-glue plugin** — no engine code ships here. It registers the `at1://` artifact-repository
scheme and shells out to the `at1` CLI.

## Install
```bash
npm i -g @tinyfiles/cli          # the at1 binary (the codec; no Python engine needed)
pip install mlflow-at1           # this plugin (registers the at1:// scheme)
#   pip install mlflow-at1[full] # if you want full MLflow rather than mlflow-skinny
```

## Use
Point an experiment's **artifact location** at an `at1://` path; everything else is normal MLflow:
```python
import mlflow
exp = mlflow.create_experiment("regulated-model", artifact_location="at1:///data/mlartifacts")
with mlflow.start_run(experiment_id=exp):
    mlflow.log_artifact("model.bin")      # -> stored as a verified, compressed AT-1 container
    # mlflow.sklearn.log_model(...) etc. all route through AT-1 too

# later — download is a byte-exact, SHA-256-verified reconstruction:
path = mlflow.artifacts.download_artifacts(run_id=run_id, artifact_path="model.bin")
```

`at1://<path>` is a local/mounted filesystem base directory for the containers (S3/GCS backing is a
follow-on). Set `AT1_BIN` to point at a specific `at1` binary if it isn't on `PATH`.

## What you get
- **Smaller** — lossless, never-worse than xz; typically a real reduction on weights/checkpoints/datasets.
- **Verified** — every container embeds a SHA-256; `download` refuses anything that doesn't reconstruct
  byte-for-byte. Explicit proof per artifact: `repo.integrity("model.bin")` → `integrity OK (sha256 …)`.
- **Addressable** — one `.at1` per artifact; tabular artifacts stay queryable in place via the `at1` CLI.
- **Zero lock-in** — the container is an ordinary file with a tiny open decoder; you can always get the
  exact original back, with or without this plugin.

## Test
```bash
pip install mlflow-skinny pytest && python -m pytest tests/ -q   # needs the `at1` CLI on PATH
```
5 tests: log→download byte-exact, logical listing, directory trees, integrity + tamper-rejection, delete.
