Metadata-Version: 2.4
Name: folderops
Version: 1.0.0
Summary: Lightweight utilities for organizing image datasets: split, merge, label-based sorting, and directory structure creation
Author: Ahamed
License: MIT
Keywords: dataset,images,ml,data-preprocessing,split folder,pip install folders
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Dynamic: license-file

# folderops

A lightweight Python package for organizing image datasets in machine learning workflows.

It focuses on the most common and repetitive tasks: splitting datasets, merging folders, creating directory structures, and sorting images into class folders from a label file. Everything is designed for direct use inside notebooks and research pipelines with minimal friction.

---

## Why folderops

If you’ve worked with vision datasets, you’ve probably rewritten the same scripts over and over:

- splitting train/val/test
- merging datasets from different sources
- reorganizing files from CSV labels
- creating directory structures manually

This package removes that overhead and gives you reliable, reusable utilities.

---

## Features

- Split datasets into train / validation / test sets
- Merge files from nested directories into a single folder
- Organize images into class folders using CSV labels
- Create directory structures from lists or nested dictionaries
- Supports common image formats:  
  `.jpg`, `.jpeg`, `.png`, `.bmp`, `.gif`, `.tif`, `.tiff`, `.webp`
- Works consistently in terminal, VS Code, and Jupyter notebooks

---

## Installation

```bash
pip install folderops
```

For development, install in editable mode from the repository root:

```bash
pip install -e .
```

---

## Quick Start

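```python
from folderops import split_dataset, merge_folders, organize_by_labels, create_structure

# Split the class folders under "images" into dataset/train, dataset/val, and dataset/test
split_dataset(
    source="images",
    output="dataset",
    train_ratio=0.7,
    val_ratio=0.15,
    test_ratio=0.15,
    seed=42,
)

# Flatten every file under the source folder (including subfolders) into one folder
merge_folders(
    source="dataset/images",
    output="merged_images",
)

# Sort images into class folders using the labels in labels.csv
organize_by_labels(
    image_dir="images",
    label_file="labels.csv",
    output="organized_dataset",
)

# Create dataset/train, dataset/val, and dataset/test
structure = {
    "dataset": {
        "train": {},
        "val": {},
        "test": {}
    }
}

create_structure(structure)
```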

---

## API Reference

### split_dataset

Split a dataset organized by class folders into train, validation, and test sets.

#### Expected input structure

```
source/
    class1/
        img1.jpg
        img2.jpg
    class2/
        img3.jpg
```

#### Output structure

```
output/
    train/
        class1/
        class2/
    val/
        class1/
        class2/
    test/
        class1/
        class2/
```

#### Usage

```python
split_dataset(
    source="images",
    output="dataset",
    train_ratio=0.7,
    val_ratio=0.15,
    test_ratio=0.15,
    seed=42,
    mode="copy",
    extensions=(".jpg", ".png"),
)
```

#### Key behavior

- Splits per class, not globally
- Shuffles files before splitting
- Supports deterministic splits via seed
- Supports both `copy` and `move`
- Validates that ratios sum to 1.0
- Displays progress cleanly in both terminal and notebooks
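
The per-class, seeded split described above can be sketched in plain Python. This is a minimal illustration, not folderops' actual implementation: `split_per_class` is a hypothetical helper, and the rounding rule (leftover files go to the test set) is an assumption.

```python
import random
from typing import Dict, List, Sequence


def split_per_class(
    files_by_class: Dict[str, Sequence[str]],
    train_ratio: float = 0.7,
    val_ratio: float = 0.15,
    test_ratio: float = 0.15,
    seed: int = 42,
) -> Dict[str, Dict[str, List[str]]]:
    """Hypothetical helper: assign each class's files to train/val/test."""
    # Reject ratios that do not sum to 1.0 (with a small float tolerance).
    if abs(train_ratio + val_ratio + test_ratio - 1.0) > 1e-6:
        raise ValueError("train_ratio + val_ratio + test_ratio must equal 1.0")

    rng = random.Random(seed)  # local RNG, so the same seed gives the same split
    splits: Dict[str, Dict[str, List[str]]] = {"train": {}, "val": {}, "test": {}}

    for class_name in sorted(files_by_class):
        # Sort first so the shuffle starts from a fixed order on every run.
        files = sorted(files_by_class[class_name])
        rng.shuffle(files)

        n_train = int(len(files) * train_ratio)
        n_val = int(len(files) * val_ratio)

        splits["train"][class_name] = files[:n_train]
        splits["val"][class_name] = files[n_train:n_train + n_val]
        # Assumption: whatever remains goes to test, so rounding drops no file.
        splits["test"][class_name] = files[n_train + n_val:]

    return splits
```

Splitting inside each class keeps every class represented in all three sets in roughly the requested proportions, which a single global shuffle does not guarantee for small or imbalanced classes.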

---

### merge_folders

Merge all files from a directory (including subfolders) into a single folder.

#### Example

```
source/
    cats/
        a.jpg
    dogs/
        a.jpg
        b.jpg
```

#### Result

```
merged/
    a.jpg
    a_1.jpg
    b.jpg
```

#### Usage

```python
merge_folders(
    source="source",
    output="merged",
    mode="copy",
)
```

#### Key behavior

- Recursively scans all subfolders
- Avoids overwriting by renaming files whose names collide (a second `a.jpg` becomes `a_1.jpg`)
- Preserves all files
- Supports extension filtering
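
The collision-safe renaming shown in the example above could look roughly like this. It is a sketch under stated assumptions: `unique_destination` and `merge_sketch` are hypothetical names, the `extensions` parameter is illustrative, and the library's real traversal order and naming scheme may differ.

```python
import shutil
from pathlib import Path
from typing import Optional, Tuple


def unique_destination(output_dir: Path, filename: str) -> Path:
    """Hypothetical helper: return a path in output_dir that is not taken yet."""
    candidate = output_dir / filename
    stem, suffix = candidate.stem, candidate.suffix
    counter = 1
    # Append _1, _2, ... to the stem until the name is free.
    while candidate.exists():
        candidate = output_dir / f"{stem}_{counter}{suffix}"
        counter += 1
    return candidate


def merge_sketch(source, output, extensions: Optional[Tuple[str, ...]] = None) -> None:
    """Copy every file under source (recursively) into one flat output folder."""
    source_dir, output_dir = Path(source), Path(output)
    output_dir.mkdir(parents=True, exist_ok=True)

    # Sorting makes the order, and therefore the renaming, repeatable.
    for path in sorted(source_dir.rglob("*")):
        if not path.is_file():
            continue
        # Assumption: extensions are lowercase and include the leading dot.
        if extensions and path.suffix.lower() not in extensions:
            continue
        shutil.copy2(path, unique_destination(output_dir, path.name))
```

Checking for an existing name before each copy is what guarantees that no file is lost when two subfolders contain files with the same name.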

---

### organize_by_labels

Organize images into class folders using a CSV file.

#### CSV format

```csv
path,class
img1.jpg,cats
img2.jpg,dogs
```

#### Usage

```python
organize_by_labels(
    image_dir="images",
    label_file="labels.csv",
    output="organized",
    mode="copy",
)
```

#### Result

```
organized/
    cats/
        img1.jpg
    dogs/
        img2.jpg
```

#### Key behavior

- Validates every file exists before transfer
- Raises clear errors for missing or invalid entries
- Supports custom delimiters
- Optional strict extension filtering
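
The validate-first behavior listed above can be illustrated with the standard `csv` module. This is a hedged sketch, not the package's code: `plan_transfers` and `organize_sketch` are hypothetical, the `delimiter` parameter name is an assumption, and the column names `path` and `class` are taken from the CSV format shown earlier.

```python
import csv
import shutil
from pathlib import Path
from typing import List, Tuple


def plan_transfers(image_dir, label_file, output, delimiter=",") -> List[Tuple[Path, Path]]:
    """Hypothetical helper: check every CSV row, then return (source, destination) pairs."""
    image_dir, output = Path(image_dir), Path(output)
    plan: List[Tuple[Path, Path]] = []
    missing: List[str] = []

    with open(label_file, newline="") as handle:
        reader = csv.DictReader(handle, delimiter=delimiter)
        for line_no, row in enumerate(reader, start=2):  # the header is line 1
            rel_path, label = row.get("path"), row.get("class")
            if not rel_path or not label:
                raise ValueError(f"line {line_no}: expected 'path' and 'class' values")
            src = image_dir / rel_path
            if not src.is_file():
                missing.append(str(src))
            plan.append((src, output / label / src.name))

    # Fail before touching the disk if any listed file is absent.
    if missing:
        raise FileNotFoundError(f"{len(missing)} file(s) listed in the CSV were not found: {missing}")
    return plan


def organize_sketch(image_dir, label_file, output, delimiter=",") -> None:
    """Copy each image into output/<class>/ once the whole CSV has been validated."""
    for src, dst in plan_transfers(image_dir, label_file, output, delimiter):
        dst.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(src, dst)
```

Building the full plan before copying anything means a bad row cannot leave the output folder half-populated.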

---

### create_structure

Create directory structures from a list or nested dictionary.

#### List-based usage

```python
paths = ["train/cats", "train/dogs", "val/cats"]
create_structure(paths, root="dataset")
```

#### Dictionary-based usage

```python
structure = {
    "dataset": {
        "train": {},
        "val": {},
        "test": {}
    }
}

create_structure(structure)
```

#### Result

```
dataset/
    train/
    val/
    test/
```

#### Key behavior

- Accepts both flat lists and nested dictionaries
- Automatically creates missing directories
- Safe to run multiple times
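
Handling both input shapes comes down to a short recursive walk. The sketch below is an assumption about how such a function might work, not the library's source; `create_structure_sketch` is a hypothetical name, and `os.makedirs(..., exist_ok=True)` is what makes repeated runs harmless.

```python
import os


def create_structure_sketch(structure, root="."):
    """Hypothetical helper: create directories from a nested dict or a list of paths."""
    if isinstance(structure, str):
        # Guard: a bare string would otherwise be iterated character by character.
        raise TypeError("structure must be a dict or a list of paths, not a string")

    if isinstance(structure, dict):
        for name, children in structure.items():
            path = os.path.join(root, name)
            os.makedirs(path, exist_ok=True)  # no error if it already exists
            # Recurse into the children; an empty dict marks a leaf directory.
            create_structure_sketch(children or {}, root=path)
    else:
        for relative in structure:
            # Intermediate directories are created as needed.
            os.makedirs(os.path.join(root, relative), exist_ok=True)
```

Calling it twice with the same input leaves the tree unchanged, which matches the "safe to run multiple times" behavior listed above.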

---

## Project Structure

```
folderops/
├── folderops/
│   ├── __init__.py
│   ├── merger.py
│   ├── organizer.py
│   ├── splitter.py
│   ├── structure.py
│   └── utils.py
├── LICENSE
├── pyproject.toml
└── README.md
```

---

## Design Principles

- Minimal dependencies
- Explicit, readable APIs
- Safe file operations
- Notebook-friendly behavior
- Reproducible dataset handling

---

## Build and Publish

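Build the source distribution and wheel (requires the `build` package):

```bash
python -m build
```

Upload the built artifacts to PyPI (requires `twine`):

```bash
twine upload dist/*
```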

---

## Requirements

- Python 3.8+
- tqdm

---

## License

MIT License
