Metadata-Version: 2.4
Name: dataeval-flow
Version: 0.1.0
Summary: DataEval Workflows container for data evaluation
Project-URL: Repository, https://gitlab.jatic.net/jatic/aria/dataeval-flow
License-Expression: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Science/Research
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Scientific/Engineering
Classifier: Typing :: Typed
Requires-Python: <3.14,>=3.10
Requires-Dist: dataeval==1.0.4
Requires-Dist: datasets>=4.0.0
Requires-Dist: maite-datasets>=0.0.12
Requires-Dist: pydantic>=2.0
Requires-Dist: pyyaml>=6.0
Provides-Extra: all-cpu
Requires-Dist: onnx>=1.15; extra == 'all-cpu'
Requires-Dist: onnxruntime>=1.20; extra == 'all-cpu'
Requires-Dist: opencv-python-headless>=4.8.0; extra == 'all-cpu'
Requires-Dist: textual>=3.0; extra == 'all-cpu'
Requires-Dist: torch>=2.2.0; extra == 'all-cpu'
Requires-Dist: torchvision>=0.17.0; extra == 'all-cpu'
Provides-Extra: all-cu118
Requires-Dist: onnx>=1.15; extra == 'all-cu118'
Requires-Dist: onnxruntime-gpu>=1.20; extra == 'all-cu118'
Requires-Dist: opencv-python-headless>=4.8.0; extra == 'all-cu118'
Requires-Dist: textual>=3.0; extra == 'all-cu118'
Requires-Dist: torch>=2.2.0; extra == 'all-cu118'
Requires-Dist: torchvision>=0.17.0; extra == 'all-cu118'
Provides-Extra: all-cu124
Requires-Dist: onnx>=1.15; extra == 'all-cu124'
Requires-Dist: onnxruntime-gpu>=1.20; extra == 'all-cu124'
Requires-Dist: opencv-python-headless>=4.8.0; extra == 'all-cu124'
Requires-Dist: textual>=3.0; extra == 'all-cu124'
Requires-Dist: torch>=2.2.0; extra == 'all-cu124'
Requires-Dist: torchvision>=0.17.0; extra == 'all-cu124'
Provides-Extra: all-cu128
Requires-Dist: onnx>=1.15; extra == 'all-cu128'
Requires-Dist: onnxruntime-gpu>=1.23.2; extra == 'all-cu128'
Requires-Dist: opencv-python-headless>=4.8.0; extra == 'all-cu128'
Requires-Dist: textual>=3.0; extra == 'all-cu128'
Requires-Dist: torch>=2.2.0; extra == 'all-cu128'
Requires-Dist: torchvision>=0.17.0; extra == 'all-cu128'
Provides-Extra: app
Requires-Dist: textual>=3.0; extra == 'app'
Provides-Extra: cpu
Requires-Dist: torch>=2.2.0; extra == 'cpu'
Requires-Dist: torchvision>=0.17.0; extra == 'cpu'
Provides-Extra: cu118
Requires-Dist: torch>=2.2.0; extra == 'cu118'
Requires-Dist: torchvision>=0.17.0; extra == 'cu118'
Provides-Extra: cu124
Requires-Dist: torch>=2.2.0; extra == 'cu124'
Requires-Dist: torchvision>=0.17.0; extra == 'cu124'
Provides-Extra: cu128
Requires-Dist: torch>=2.2.0; extra == 'cu128'
Requires-Dist: torchvision>=0.17.0; extra == 'cu128'
Provides-Extra: onnx
Requires-Dist: onnx>=1.15; extra == 'onnx'
Requires-Dist: onnxruntime>=1.20; extra == 'onnx'
Provides-Extra: onnx-gpu
Requires-Dist: onnx>=1.15; extra == 'onnx-gpu'
Requires-Dist: onnxruntime-gpu>=1.20; extra == 'onnx-gpu'
Provides-Extra: opencv
Requires-Dist: opencv-python-headless>=4.8.0; extra == 'opencv'
Description-Content-Type: text/markdown

# DataEval Workflows

Workflow orchestration for DataEval with GPU support.

## Quick Start

```bash
# 1. Build CUDA 11.8 container
docker build -f docker/Dockerfile.cu118 -t dataeval:cu118 .

# 2. Show help
docker run dataeval:cu118

# 3. Run with data and output
docker run --gpus all \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118
```

## Requirements

| Requirement | Version |
|-------------|---------|
| Docker | >= 20.10 |
| NVIDIA GPU | Any CUDA-capable model (for GPU mode) |
| NVIDIA Driver | >= 520 (for GPU mode) |
| CUDA | 11.8.0 (for GPU mode) |

### Verify GPU Access

```bash
docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
```

## Volume Mounts

| Path | Mode | Purpose |
|------|------|---------|
| `/dataeval` | ro | Data directory — datasets, models, configs (required) |
| `/output` | rw | Results (required) |
| `/cache` | rw | Computation cache (optional) |
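
For example, to persist the optional computation cache across runs, add a third mount alongside the two required ones (the host paths on the left are placeholders):

```bash
docker run --gpus all \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  --mount type=bind,source=/path/to/cache,target=/cache \
  dataeval:cu118
```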

### File Permissions

The container runs as a non-root user (`dataeval`, UID 1000). The host directories mounted at `/output` and `/cache` must therefore be writable by that user. There are two approaches:

#### Option 1: Pass your host UID (recommended)

Use `--user` to run the container as your host user, so mounted directories are naturally writable:

```bash
docker run --gpus all \
  --user "$(id -u):$(id -g)" \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118
```

#### Option 2: Open directory permissions

Make the output and cache directories world-writable on the host:

```bash
chmod 777 /path/to/output /path/to/cache
```

Then run without `--user`. This is simpler but less secure.

### Custom Data Root

The data root path can be overridden via the `DATAEVAL_DATA` environment variable:

```bash
docker run --gpus all \
  -e DATAEVAL_DATA=/data \
  --mount type=bind,source=/path/to/data,target=/data,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118
```

## Configuration

Config files (YAML or JSON) can live anywhere in your data directory. By default, all YAML/JSON files at the root of the data mount are auto-discovered and merged; files elsewhere must be pointed to explicitly with `--config`.
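
The auto-discovery step amounts to globbing the mount root and merging the results. A minimal sketch of that idea using only stdlib JSON (the actual loader also parses YAML; the function name and shallow-merge depth are assumptions, not the package's real API):

```python
import json
import tempfile
from pathlib import Path

def discover_and_merge(data_root: Path) -> dict:
    """Hypothetical sketch: read every JSON file at the root of the data
    mount and shallow-merge them, with later files overriding earlier keys."""
    merged: dict = {}
    for path in sorted(data_root.glob("*.json")):
        merged.update(json.loads(path.read_text()))
    return merged

# Demo with a throwaway directory standing in for the data mount.
root = Path(tempfile.mkdtemp())
(root / "a.json").write_text(json.dumps({"batch_size": 32, "device": "cpu"}))
(root / "b.json").write_text(json.dumps({"device": "cuda"}))
print(discover_and_merge(root))  # {'batch_size': 32, 'device': 'cuda'}
```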

To specify a config path explicitly:

```bash
# Config folder within data directory
docker run --gpus all \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118 --config config/

# Single config file
docker run --gpus all \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cu118 --config params.yaml
```

Dataset and model paths in config files are resolved relative to the data root (`/dataeval` by default).
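
In other words, resolution is a join against the data root. A sketch of the rule (the helper name is hypothetical, and absolute-path passthrough is an assumption):

```python
from pathlib import Path

def resolve_data_path(data_root: Path, configured: str) -> Path:
    """Hypothetical helper: join a config-relative path onto the data root.
    Absolute paths are assumed to pass through unchanged."""
    p = Path(configured)
    return p if p.is_absolute() else data_root / p

print(resolve_data_path(Path("/dataeval"), "models/resnet.onnx"))
# /dataeval/models/resnet.onnx
```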

## Dataset Formats

Currently supported dataset structures:

| Format | Structure | Example |
|--------|-----------|---------|
| **Dataset** | Single split, used directly | `cifar10_test/` |
| **DatasetDict** | Multiple splits (dict), configured via config YAML | `cifar10_full/` |
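
On disk, these correspond to the Hugging Face `datasets` `save_to_disk` layouts. A hypothetical sketch (directory and file names illustrative, assuming datasets were written with `save_to_disk`):

```
data/
├── cifar10_test/            # Dataset: a single split
│   ├── data-00000-of-00001.arrow
│   ├── dataset_info.json
│   └── state.json
└── cifar10_full/            # DatasetDict: one subdirectory per split
    ├── dataset_dict.json
    ├── train/
    └── test/
```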

## CPU Fallback

For machines without NVIDIA GPU:

```bash
docker build -f docker/Dockerfile.cpu -t dataeval:cpu .
docker run dataeval:cpu  # Shows help
docker run \
  --mount type=bind,source=/path/to/data,target=/dataeval,readonly \
  --mount type=bind,source=/path/to/output,target=/output \
  dataeval:cpu
```

## CLI Modes

DataEval Flow has three modes:

| Command                | Purpose                                                          |
| ---------------------- | ---------------------------------------------------------------- |
| `dataeval-flow [opts]` | Headless execution — for automation and CI/CD pipelines          |
| `dataeval-flow app`    | Interactive TUI dashboard — configure, execute, and view results |
| `dataeval-flow config` | Simple CLI config builder — create/edit configs without the TUI  |

### Interactive TUI (`app`)

**Installation:**

```bash
uv sync --extra app          # or: pip install dataeval-flow[app]
```

**Usage:**

```bash
# Launch with a blank config
python -m dataeval_flow app

# Load an existing config for editing
python -m dataeval_flow app --config /path/to/params.yaml
```

The TUI provides a three-pane dashboard for config editing, task execution, and result viewing. It auto-discovers available torchvision transforms, dataeval selection classes, and workflow types, generating dynamic parameter forms from their schemas.

### Simple CLI Config Builder (`config`)

For environments without the TUI dependency:

```bash
python -m dataeval_flow config
python -m dataeval_flow config --config /path/to/params.yaml
```

Configs can be saved as YAML or JSON.

## Dependencies

- `dataeval` - Core evaluation library
- `datasets` - Hugging Face dataset loading library
- `maite-datasets` - MAITE protocol adapter
- `maite` - MAITE protocol library
- `pydantic` - Structural typing and schema validation

## Troubleshooting

### Build appears stuck at `uv sync`

The Docker build may appear frozen during the `uv sync` step:

```
=> [builder 7/7] RUN uv sync --frozen --no-dev --no-install-project    1139.3s
```

**This is normal.** The step downloads ~2GB of dependencies (PyTorch, scipy, etc.) with no progress indicator.

| Network Speed | Expected Build Time |
|---------------|---------------------|
| 100 Mbps | ~10 minutes |
| 30 Mbps | ~20 minutes |
| 10 Mbps | ~45 minutes |

**Tip:** First build is slow; subsequent builds use Docker cache and complete in seconds.

## Running Without Container

The `dataeval_flow` package can be used standalone without Docker.

**Installation:**
```bash
git clone https://gitlab.jatic.net/jatic/aria/dataeval-flow.git
cd dataeval-flow
uv sync
```

**CLI Usage:**
```bash
python -m dataeval_flow --config /path/to/config --output /path/to/output
python -m dataeval_flow --data /path/to/data --output /path/to/output
```

**Python API Usage:**
```python
from pathlib import Path
from dataeval_flow import load_config, run_tasks

config = load_config(Path("/path/to/data/config.yaml"))
results = run_tasks(config, data_dir=Path("/path/to/data"))
print(results[0].report())
```

**Development:**
```bash
uv sync --group dev
nox
```

## License

MIT
