Metadata-Version: 2.4
Name: cv-datakit
Version: 0.1.0
Summary: Tools to merge and remap computer vision datasets
Author: Imtiaz Ahammed Anik
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: matplotlib>=3.9.4
Requires-Dist: numpy>=1.24
Requires-Dist: Pillow>=10.0
Requires-Dist: PyYAML>=6.0
Requires-Dist: rich>=13.7
Dynamic: license-file

# datakit

Python package for YOLO-format dataset operations:
- merge multiple datasets into one
- merge multiple class names into a target class
- remap class IDs
- visualize labeled samples


## Install

```bash
pip install -e .
```

## CLI Usage

### 1) Merge datasets

```bash
datakit merge /path/ds1 /path/ds2 --out /path/out
```

### 2) Merge classes

```bash
datakit merge-classes /path/dataset --from Backpack Backpacks --to bag
```
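
A class merge can be thought of as a rename followed by an ID remap. The sketch below is illustrative only and is not datakit's actual implementation: `build_class_merge` is a hypothetical helper, and it assumes the merged class takes the position of its first source class while the remaining classes keep their relative order.

```python
def build_class_merge(names, sources, target):
    """Return (new_names, id_map) for merging `sources` into `target`.

    Hypothetical helper for illustration; not part of datakit's API.
    """
    source_set = set(sources)
    missing = source_set - set(names)
    if missing:
        raise ValueError(f"unknown source classes: {sorted(missing)}")

    new_names = []  # class list after the merge
    id_map = {}     # old class ID -> new class ID
    for old_id, name in enumerate(names):
        new_name = target if name in source_set else name
        if new_name not in new_names:
            new_names.append(new_name)
        id_map[old_id] = new_names.index(new_name)
    return new_names, id_map


new_names, id_map = build_class_merge(
    ["Backpack", "Backpacks", "person"], ["Backpack", "Backpacks"], "bag"
)
print(new_names)  # ['bag', 'person']
print(id_map)     # {0: 0, 1: 0, 2: 1}
```

For a hypothetical dataset whose classes are `Backpack`, `Backpacks`, `person`, this yields the class list `bag person` and the ID mapping `0:0 1:0 2:1`.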

### 3) Remap classes

```bash
datakit remap /path/dataset --names bag person --map 0:0 1:0 2:1
```

Remap safety behavior:
- validates that every mapped target ID is a valid index into the new class list (`0` to `len(names) - 1`)
- pre-scans all label files to ensure every class ID has a mapping before writing
- only writes labels and `data.yaml` after validation succeeds
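
The validate-then-write order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not datakit's code: `remap_labels` is a hypothetical function, labels are assumed to be `.txt` files in a single directory with the class ID as the first field on each line, and the `data.yaml` update is left out.

```python
from pathlib import Path


def remap_labels(labels_dir, names, id_map):
    """Rewrite class IDs in YOLO label files, writing only if all checks pass.

    Hypothetical sketch; not part of datakit's API.
    """
    # 1) Every target ID must be a valid index into the new class list.
    bad = sorted(t for t in set(id_map.values()) if not 0 <= t < len(names))
    if bad:
        raise ValueError(f"target IDs out of range for {len(names)} classes: {bad}")

    # 2) Pre-scan: build the new content in memory and fail on unmapped IDs.
    rewritten = {}
    for path in sorted(Path(labels_dir).glob("*.txt")):
        new_lines = []
        for line in path.read_text().splitlines():
            parts = line.split()
            if not parts:
                continue  # skip blank lines
            old_id = int(parts[0])
            if old_id not in id_map:
                raise ValueError(f"{path.name}: class ID {old_id} has no mapping")
            new_lines.append(" ".join([str(id_map[old_id]), *parts[1:]]))
        rewritten[path] = new_lines

    # 3) Nothing has been modified so far; write only after validation succeeds.
    for path, lines in rewritten.items():
        path.write_text("\n".join(lines) + ("\n" if lines else ""))
    return len(rewritten)
```

Because all new content is built in memory before the first write, an unmapped class ID in any file aborts the run without touching the other label files.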

### 4) Visualize samples

```bash
datakit visualize --images-dir /path/dataset/val/images --labels-dir /path/dataset/val/labels --n 12 --seed 1
```
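
To draw a label on an image, a YOLO detection line (`class x_center y_center width height`, all coordinates normalized to the image size) has to be converted to pixel corners. The sketch below shows that conversion; `yolo_to_pixel_box` is a hypothetical helper, and it assumes bounding-box labels rather than segmentation polygons. It is not necessarily how `datakit visualize` does it.

```python
def yolo_to_pixel_box(line, img_w, img_h):
    """Convert one YOLO label line to (class_id, (x_min, y_min, x_max, y_max)).

    Hypothetical helper for illustration; not part of datakit's API.
    """
    cls, cx, cy, w, h = line.split()[:5]
    # Scale normalized center and size to pixels.
    cx, w = float(cx) * img_w, float(w) * img_w
    cy, h = float(cy) * img_h, float(h) * img_h
    # Shift from center/size to corner coordinates.
    return int(cls), (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


print(yolo_to_pixel_box("0 0.5 0.5 0.25 0.5", 640, 480))
# (0, (240.0, 120.0, 400.0, 360.0))
```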

## Python API

```python
from datakit import merge_datasets, merge_classes, remap_dataset, plot_random_samples

merge_datasets(["/path/ds1", "/path/ds2"], "/path/out")
merge_classes("/path/dataset", ["Backpack", "Backpacks"], "bag")
remap_dataset("/path/dataset", ["bag", "person"], {0: 0, 1: 0, 2: 1})
plot_random_samples("/path/dataset/val/images", "/path/dataset/val/labels", n=12, seed=1)
```

