Metadata-Version: 2.4
Name: smartrain
Version: 0.0.1
Summary: YOLO datasets, training queue, runs analytics — workspace-first CLI
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: ultralytics>=8.0.0
Requires-Dist: pyyaml>=6.0
Requires-Dist: tqdm>=4.65.0
Requires-Dist: pandas>=2.0.0
Requires-Dist: matplotlib>=3.7.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: albumentations>=1.4.0
Requires-Dist: typer>=0.9.0
Requires-Dist: rich>=13.0.0
Requires-Dist: prompt_toolkit>=3.0.0
Requires-Dist: fpdf2>=2.7.0
Requires-Dist: odfpy>=1.4.0
Requires-Dist: pypandoc-binary>=1.14.0
Requires-Dist: weasyprint>=60.0
Provides-Extra: dev
Requires-Dist: pytest>=7.4.0; extra == "dev"
Provides-Extra: clearml
Requires-Dist: clearml>=1.14.0; extra == "clearml"
Provides-Extra: sahi
Requires-Dist: sahi>=0.11.0; extra == "sahi"

> Russian version: [docs/ru/README.md](docs/ru/README.md)

# Smart Train (`smartrain`)

A CLI toolkit for preparing YOLO datasets, training models, running queues, and analyzing runs.

## Quick start

Requirements: Python `3.10+`.

```bash
git clone <repo-url>
cd smart-train
pip install -e .
```

Work from the project root (current directory):

```bash
smartrain deploy
smartrain scan
smartrain fusion --dataset ds_a --dataset ds_b --classes "class_a,class_b"
smartrain train --data 2026-01-01_12-00-00-merged -y
```

## What's included

- Single entry point: `smartrain` (module `smartrain.cli`).
- Single-workspace model: `raw_data/`, `datasets/`, `runs/`, `analytics/`, `models/`, `inference/`, `tmp/`.
- Pipeline support: `scan -> fusion -> train -> analyze`.
- Additional tools: `queue`, `registry`, `report`, `model`, `normalize-data-yaml`, `migrate-models`, `clearml-upload`, `plot`, `cvat`, `sahi`, `heatmap`, `orient`.
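
The single-workspace layout listed above can be sanity-checked before running other commands. The following is a minimal sketch, not part of `smartrain`: `check_workspace` is a hypothetical helper name, and it only tests for the directories named in this README.

```bash
#!/usr/bin/env bash
# Sketch: report which of the workspace directories named in this README
# are missing. check_workspace is a hypothetical helper, not a smartrain command.
check_workspace() {
  local root="${1:-.}"   # workspace root, defaults to the current directory
  local missing=0
  local dir
  for dir in raw_data datasets runs analytics models inference tmp; do
    if [ ! -d "$root/$dir" ]; then
      echo "missing: $dir/"
      missing=1
    fi
  done
  # Return 0 when every directory exists, 1 otherwise.
  return "$missing"
}
```

Calling `check_workspace .` from the project root prints one line per missing directory and returns a non-zero status if any are absent, which is a reasonable cue to run `smartrain deploy`.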

## How it works

`smartrain` uses a single workspace root and builds a process around file contracts:

- `scan` synchronizes sources and updates the dataset catalog;
- `fusion` generates the final dataset for training;
- `train` creates a run directory with metrics and metadata;
- `analyze` and `registry` work on artifacts in `runs/`.
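
The stages above can be chained in a script so that a failed stage stops the rest. This is a sketch under assumptions: `run_step`, `run_pipeline`, and the `DRY_RUN` variable are hypothetical names introduced here, the flags are the ones shown in the Quick start, and the name of the dataset produced by `fusion` is passed in by the caller because this README does not describe how that name is chosen.

```bash
#!/usr/bin/env bash
# Sketch only: run_step, run_pipeline and DRY_RUN are hypothetical names.

# Print a command, then execute it unless DRY_RUN is set to a non-empty value.
run_step() {
  echo "+ $*"
  if [ -z "${DRY_RUN:-}" ]; then
    "$@"
  fi
}

# Chain the stages; `|| return 1` stops at the first stage that fails.
run_pipeline() {
  local merged="$1"   # dataset name to train on (assumed known to the caller)
  run_step smartrain scan || return 1
  run_step smartrain fusion --dataset ds_a --dataset ds_b --classes "class_a,class_b" || return 1
  run_step smartrain train --data "$merged" -y || return 1
  run_step smartrain analyze scan || return 1
}
```

With `DRY_RUN=1` exported, `run_pipeline <dataset-name>` only prints the four commands, which makes it easy to review the sequence before running it for real.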

## Key commands

| Command | Purpose |
|---|---|
| `smartrain deploy` | Initialize the workspace structure |
| `smartrain scan` | Synchronize sources and update the dataset catalog |
| `smartrain fusion` | Build the final training dataset |
| `smartrain train` | Train and validate YOLO models |
| `smartrain inference` | Run inference on a folder or dataset split and save a JSON report |
| `smartrain queue` / `smartrain queue-run` | Manage and run the command queue |
| `smartrain analyze` | Summaries, run comparison, PR curves, and inference benchmarks |
| `smartrain registry` | Catalog run artifacts and promoted models |

## Documentation

Current documentation is organized into sections in `docs/`:

- [Documentation navigation](docs/index.md)
- [Getting started and core workflows](docs/getting-started/quickstart.md)
- [CLI guide](docs/cli/overview.md)
- [API and format reference](docs/reference/api.md)
- [Architecture and diagrams](docs/development/architecture.md)

## Testing

```bash
pip install -e ".[dev]"
pytest
```

## Important details

- Interactive mode starts only when a command is launched with zero arguments (TTY required).
- Commands with an interactive mode: the dataset commands `fusion`, `augment`, `balance`, `stats`, `roi`, `orient`, and `inference`, plus `train`.
- Dataset cleanup command: `prune` (`prune empty` for empty pairs, `prune dedup` for duplicate images by content).
- If any arguments are provided but required ones are missing, commands return a clear "incomplete arguments" error instead of interactive prompts.
- Command help includes practical `Examples` / `Quick examples` blocks for common workflows.
- `smartrain balance` presets:
  - `--preset weights-safe` for conservative balancing
  - `--preset rfs-aggressive` for stronger tail upsampling
  - `--preset hybrid-default` as a general default
- `smartrain balance` eval splits: `--eval-coverage` is on by default (keeps `val`/`test` non-empty when possible and improves class coverage there); use `--no-eval-coverage` to disable. The interactive wizard asks for this option.
- `smartrain hash --validate` returns `0` for a match, `1` for a mismatch, and `2` for an error.
- By default, the workspace queue uses `queue.txt` and `tmp/status.txt`.
- Dependency extras:
  - `pip install -e ".[dev]"` for development and testing
  - `pip install -e ".[clearml]"` for ClearML
  - `pip install -e ".[sahi]"` for SAHI
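
The `hash --validate` values listed above lend themselves to scripting. The sketch below assumes they are process exit codes, which this README does not state explicitly; `describe_hash_status` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: map a `smartrain hash --validate` status to a readable label.
# Assumption: 0/1/2 are process exit codes. describe_hash_status is hypothetical.
describe_hash_status() {
  case "$1" in
    0) echo "match" ;;
    1) echo "mismatch" ;;
    2) echo "error" ;;
    *) echo "unexpected status: $1" ;;
  esac
}
```

A typical use would be to run `smartrain hash /path/to/dataset --validate a1b2c3d4` and then call `describe_hash_status "$?"` on the next line, for example in a CI step that should fail on anything other than `match`.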

## Common workflows

Scanning with an explicit source list:

```bash
smartrain scan --datasets-list /path/to/workspace/raw_data/datasets_list.txt
```

Checking a dataset hash:

```bash
smartrain hash --dataset my_dataset
smartrain hash /path/to/dataset --validate a1b2c3d4
```

Starting a queue without opening a GUI terminal:

```bash
smartrain queue run --no-gui
```

Quick run overview:

```bash
smartrain analyze scan
smartrain analyze export-table -o runs_summary.csv
```

## Developers

- [@palexab](https://github.com/palexab)
- [@greisersem](https://github.com/greisersem)
