Metadata-Version: 2.4
Name: mldock-io
Version: 0.1.0
Summary: Official CLI for MLDock — manage datasets, trainers, training jobs, and deployments from your terminal.
Author-email: Nexidra Technologies LLC <hello@mldock.io>
License: MIT
Project-URL: Homepage, https://www.mldock.io
Project-URL: Documentation, https://docs.mldock.io/cli
Project-URL: Repository, https://github.com/nexidra/mldock-io
Project-URL: Bug Tracker, https://github.com/nexidra/mldock-io/issues
Keywords: machine learning,mlops,ml platform,model deployment,dataset management,training,cli,mldock
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Intended Audience :: Developers
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Scientific/Engineering :: Artificial Intelligence
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: click>=8.1
Requires-Dist: requests>=2.31
Requires-Dist: rich>=13.0
Requires-Dist: tqdm>=4.66
Requires-Dist: python-dotenv>=1.0
Provides-Extra: docker
Requires-Dist: docker>=7.0; extra == "docker"
Provides-Extra: dev
Requires-Dist: pytest>=7; extra == "dev"
Requires-Dist: pytest-mock; extra == "dev"
Requires-Dist: responses; extra == "dev"
Requires-Dist: black; extra == "dev"
Requires-Dist: ruff; extra == "dev"
Requires-Dist: mypy; extra == "dev"

# mldock

Official CLI for [MLDock](https://www.mldock.io) — manage datasets, trainers, training jobs, and deployments from your terminal.

Works on **macOS, Linux, and Windows**.

---

## Installation

```bash
pip install mldock-io
```

With Docker-based local testing support:
```bash
pip install "mldock-io[docker]"
```

---

## Quick Start

```bash
# 1. Sign in via browser
mldock login

# 2. Upload a dataset
mldock dataset create "churn-data"
mldock dataset upload data.csv --dataset-id <id>

# 3. Publish a trainer
mldock trainer publish my_trainer.py

# 4. Start training
mldock train start my_trainer --dataset <id> --follow

# 5. View deployed model
mldock model list

# 6. Run inference
mldock model predict my_trainer --input '{"age": 35, "balance": 1200}'
```

---

## Authentication

### Browser-based login (recommended)

```bash
mldock login
```

Opens your browser to the MLDock sign-in page. After signing in and clicking **Authorize**, the CLI is authenticated automatically. The session is saved to `~/.mldock/session.json`.

```bash
mldock login --no-browser   # Print the URL instead of opening it
```

**Self-hosted instances:** the API and frontend run on separate ports (API: `8030`, frontend: `5200`). Use both flags:

```bash
mldock login \
  --base-url http://localhost:8030 \
  --frontend-url http://localhost:5200
```

`--base-url` sets the API endpoint for all CLI calls. `--frontend-url` overrides the host in the browser login URL (the server's configured `FRONTEND_BASE_URL` may be an internal address unreachable from your machine).
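The rewrite that `--frontend-url` performs keeps the path and query (including any authorization token) and swaps only the scheme, host, and port. A standard-library sketch of the idea, not the CLI's actual code, with a hypothetical login URL:

```python
from urllib.parse import urlsplit, urlunsplit

def override_frontend_host(login_url: str, frontend_url: str) -> str:
    """Replace scheme+host+port of login_url with those of frontend_url,
    keeping the original path, query, and fragment."""
    login = urlsplit(login_url)
    front = urlsplit(frontend_url)
    return urlunsplit((front.scheme, front.netloc, login.path, login.query, login.fragment))

# The server returns an internal address; rewrite it to one your browser can reach.
url = override_frontend_host(
    "http://mldock-frontend:3000/cli-login?token=abc123",
    "http://localhost:5200",
)
print(url)  # http://localhost:5200/cli-login?token=abc123
```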

### Environment variables

```bash
export MLDOCK_BASE_URL=https://www.mldock.io       # API base URL
export MLDOCK_FRONTEND_URL=http://localhost:5200    # browser login URL (self-hosted only)
export MLDOCK_API_KEY=your-api-key                  # used by `mldock model predict` only
```

### Check current session

```bash
mldock whoami
```

### Sign out

```bash
mldock logout
```

---

## Datasets

```bash
# List all datasets
mldock dataset list

# Create an empty dataset
mldock dataset create "my-dataset" --description "Training data" --visibility private

# Upload a CSV or JSON file into a dataset
mldock dataset upload data.csv --dataset-id <id>
mldock dataset upload data.jsonl --dataset-id <id>

# Download a dataset to your machine
mldock dataset pull <id>                    # saves as <name>.csv in current dir
mldock dataset pull <id> --output ./data --format jsonl

# Show dataset details
mldock dataset info <id>

# Delete a dataset
mldock dataset delete <id>
```

**Supported upload formats:** `.csv`, `.json` (array), `.jsonl` (one object per line)
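All three formats carry the same tabular rows; a quick standard-library round-trip shows the shapes side by side (the data below is illustrative):

```python
import csv
import io
import json

rows = [{"age": "35", "balance": "1200"}, {"age": "52", "balance": "80"}]

# .csv — header row, then one record per line
buf = io.StringIO()
writer = csv.DictWriter(buf, fieldnames=["age", "balance"])
writer.writeheader()
writer.writerows(rows)
csv_text = buf.getvalue()

# .json — a single top-level array of objects
json_text = json.dumps(rows)

# .jsonl — one JSON object per line
jsonl_text = "\n".join(json.dumps(r) for r in rows)

print(csv_text)
print(json_text)
print(jsonl_text)
```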

---

## Trainers

```bash
# List trainers in your workspace
mldock trainer list

# Publish a trainer file
mldock trainer publish my_trainer.py
mldock trainer publish my_trainer.py --name "churn-v2" --wait   # wait for scan approval

# Show trainer details
mldock trainer info <name>

# Delete (deactivate) a trainer
mldock trainer delete <name>
```

### Testing a trainer locally

```bash
# Test with sample input
mldock trainer test my_trainer.py --input '{"age": 35, "balance": 1200}'

# Test with a local dataset
mldock trainer test my_trainer.py --input '{}' --dataset ./data.csv

# Test with a remote MLDock dataset
mldock trainer test my_trainer.py --input '{}' --remote-dataset <dataset-id>
```

**How local testing works:**

| Environment | Behaviour |
|-------------|-----------|
| Docker available | Runs inside a container. Container is **kept alive** — re-runs skip cold start. |
| Docker not available | Creates a virtualenv at `~/.mldock/envs/<name>/`. Dependencies installed once; re-runs skip install if requirements unchanged. |

Works on macOS, Linux, and Windows; Docker is not required for the venv path.

Your trainer class must have `train()` and `predict()` methods. Dependencies are auto-detected from imports in your `.py` file.
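Import-based dependency detection can be approximated with the standard `ast` module. This is a simplified sketch of the idea, not the CLI's actual scanner:

```python
import ast

def detect_imports(source: str) -> set:
    """Collect top-level module names imported by a trainer file."""
    tree = ast.parse(source)
    found = set()
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                found.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom) and node.module:
            found.add(node.module.split(".")[0])
    return found

trainer_src = """
from sklearn.ensemble import GradientBoostingClassifier
import pandas as pd
"""
print(sorted(detect_imports(trainer_src)))  # ['pandas', 'sklearn']
```

A real scanner would also map import names to PyPI distribution names (e.g. `sklearn` installs as `scikit-learn`) and skip the standard library.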

---

## Training Jobs

```bash
# Start a training run
mldock train start my_trainer
mldock train start my_trainer --dataset <id>          # with dataset
mldock train start my_trainer --gpu                   # request GPU
mldock train start my_trainer --params '{"lr": 0.01}' # with hyperparams
mldock train start my_trainer --follow                # stream until done

# Check job status
mldock train status <job-id>
mldock train status <job-id> --follow   # poll until complete

# List recent jobs
mldock train list
mldock train list --limit 50

# Cancel a job
mldock train cancel <job-id>
```

---

## Models (Deployments)

```bash
# List all deployed models
mldock model list

# Show deployment details
mldock model info <deployment-id>

# Run inference against a deployed model
mldock model predict my_trainer --input '{"text": "hello"}'
mldock model predict my_trainer --input '{"text": "hello"}' --api-key <key>

# View production metrics
mldock model metrics <deployment-id>

# Roll back to previous version
mldock model rollback <deployment-id>

# Delete a deployment
mldock model delete <deployment-id>
```

---

## Platform

```bash
# Check connectivity
mldock status

# Show CLI version
mldock --version
```

---

## Configuration

| Variable | Description | Default |
|----------|-------------|---------|
| `MLDOCK_BASE_URL` | MLDock API base URL | `https://www.mldock.io` |
| `MLDOCK_FRONTEND_URL` | Browser login URL (self-hosted only — overrides scheme+host+port in the login URL returned by the server) | — |
| `MLDOCK_API_KEY` | API key for inference calls | — |

Session file: `~/.mldock/session.json` (permissions: 600 — owner read/write only)

Local virtualenvs: `~/.mldock/envs/<trainer-name>/`

Docker workspaces: `~/.mldock/workspaces/<trainer-name>/`
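The 600 mode on the session file can be reproduced with the standard library. Illustrative only: the CLI sets this itself, and on Windows the POSIX mode bits are largely a no-op:

```python
import os
import stat
import tempfile

# Write a throwaway "session" file and restrict it to the owner
path = os.path.join(tempfile.mkdtemp(), "session.json")
with open(path, "w") as f:
    f.write("{}")
os.chmod(path, 0o600)  # read/write for owner, nothing for group/other

mode = stat.S_IMODE(os.stat(path).st_mode)
print(oct(mode))  # 0o600 on POSIX systems
```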

---

## Trainer File Format

Your trainer must be a Python file with a class that has `train()` and `predict()`:

```python
# my_trainer.py
from sklearn.ensemble import GradientBoostingClassifier
import pandas as pd

class ChurnPredictor:
    def __init__(self):
        self.model = None

    def train(self, dataset_path: str, **kwargs):
        df = pd.read_csv(dataset_path)
        X = df.drop("churn", axis=1)
        y = df["churn"]
        self.model = GradientBoostingClassifier()
        self.model.fit(X, y)

    def predict(self, input_data: dict) -> dict:
        if self.model is None:
            return {"error": "not trained"}
        X = pd.DataFrame([input_data])
        prob = self.model.predict_proba(X)[0][1]
        return {"churn_probability": round(float(prob), 4)}
```

Test it locally before publishing:
```bash
mldock trainer test my_trainer.py --input '{"age": 35, "balance": 0}'
```

---

## Publishing to PyPI

```bash
pip install build twine
python -m build
twine upload dist/*
```

---

## Planned Features

- `mldock trainer logs <name>` — stream training logs in real time
- `mldock dataset annotate <id>` — open annotation task in browser
- `mldock model ab-test create` — set up A/B test between two versions
- `mldock model shadow <id>` — enable shadow mode for safe promotion
- `mldock drift status <trainer>` — view drift detection status
- `mldock team invite <email>` — invite a team member to your workspace
- `mldock api-key create` — create a scoped API key
- `mldock init` — scaffold a new trainer project from a template

---

## License

MIT — [Nexidra Technologies LLC](https://www.mldock.io)
