Metadata-Version: 2.4
Name: coldet
Version: 0.1.1
Summary: Constraint-based CNN generator and trainer. Describe what you need — not how to build it.
License: MIT
Keywords: deep-learning,computer-vision,cnn,pytorch,constraint-based
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: torch>=2.0
Requires-Dist: numpy>=1.24
Requires-Dist: opencv-python>=4.8
Provides-Extra: dev
Requires-Dist: pytest>=7.0; extra == "dev"
Requires-Dist: torchvision>=0.15; extra == "dev"

# coldet

> **Describe constraints, not networks.**

coldet is a minimal Python package for building and training image classification models without touching a single line of architecture code. You express *what you need* — size, speed, accuracy tradeoff — and the system generates a valid CNN for you.

---

## Why constraint-based modelling?

Traditional ML frameworks ask you to design networks: choose a backbone, wire up a neck, attach a head, pick kernel sizes, set strides. This works when you have expertise and time. Most of the time, you just need a model that is *fast enough* and *accurate enough* for your problem.

coldet flips the question. Instead of:

> "I want ResNet-50 with a FPN neck and a 4-layer detection head…"

You say:

> "I want something medium-sized that leans toward speed."

The architecture is derived automatically from those constraints. You never see layer names, channel counts, or backbone identifiers — because they are not your concern.

**Benefits:**
- No architecture expertise required.
- No mismatch between intent and implementation.
- Constraints are reproducible: same inputs → same model.
- Tradeoffs are easier to reason about: `speed_bias=0.9` always yields a faster model than `speed_bias=0.3`.

---

## Installation

```bash
pip install coldet
```

Or from source:

```bash
git clone https://github.com/yourorg/coldet
cd coldet
pip install -e .
```

---

## Dataset format

coldet discovers classes automatically from your folder structure. Each subfolder becomes a class; the subfolder name is the class label.

```
dataset/
    red/
        img001.jpg
        img002.png
        ...
    blue/
        img001.jpg
        img002.png
        ...
    green/
        ...
```

Any number of classes is supported. Accepted image formats match those of `torchvision.datasets.ImageFolder`, including `.jpg`, `.jpeg`, `.png`, `.ppm`, `.bmp`, `.tiff`, and `.webp`.
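
The discovery rule can be sketched with the standard library alone. This is an illustration of the folder convention, not coldet's internal code: `discover_classes` is a hypothetical helper name, and the alphabetical ordering mirrors what `coldet.load_dataset` is documented to return.

```python
import tempfile
from pathlib import Path


def discover_classes(root):
    """Return (class_names, class_to_index) for a subfolder-per-class dataset.

    Hypothetical helper for illustration; not part of the coldet API.
    """
    # Each immediate subdirectory is one class; plain files at the root are ignored.
    class_names = sorted(p.name for p in Path(root).iterdir() if p.is_dir())
    class_to_index = {name: index for index, name in enumerate(class_names)}
    return class_names, class_to_index


# Build a throwaway layout matching the example tree to demonstrate the rule.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("red", "blue", "green"):
        (Path(tmp) / name).mkdir()
    class_names, class_to_index = discover_classes(tmp)

print(class_names)      # ['blue', 'green', 'red']
print(class_to_index)   # {'blue': 0, 'green': 1, 'red': 2}
```

Because the names are sorted, adding or renaming a class folder can shift the integer labels of the other classes, so a checkpoint is best used with the class list it was trained on.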

---

## Quickstart

### Dataset-driven (recommended)

```python
import coldet

# 1. Point at your dataset — classes are discovered automatically
trainer = coldet.Trainer(
    size="medium",       # "nano" | "small" | "medium" | "large"
    speed_bias=0.7,      # 0 = prefer accuracy, 1 = prefer speed
    dataset="dataset/",  # subfolder-per-class layout
    batch_size=32,
    image_size=224,
)
# [coldet] dataset 'dataset/' — 3 classes discovered: ['blue', 'green', 'red']

# 2. Train — no DataLoader needed
trainer.train(epochs=10)

# 3. Write sum.md and graph.png to the current directory
trainer.summary()

# 4. Save and restore
trainer.save("checkpoints/run1.pt")
trainer.load("checkpoints/run1.pt")

# 5. Run inference
import torch
images = torch.rand(4, 3, 224, 224)
logits = trainer.predict(images)   # shape (4, num_classes)
```

### Manual DataLoader

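```python
import coldet
import torch
from torch.utils.data import DataLoader, TensorDataset

# Stand-in data: random images and labels (assumes batches of (images, labels))
data = TensorDataset(torch.rand(64, 3, 224, 224), torch.randint(0, 3, (64,)))
my_dataloader = DataLoader(data, batch_size=32, shuffle=True)

trainer = coldet.Trainer(size="medium", speed_bias=0.7, num_classes=3)
trainer.train(my_dataloader, epochs=10)
trainer.summary()
```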

---

## API Reference

### `coldet.load_dataset`

```python
loader, class_names = coldet.load_dataset(
    dataset_path,        # root folder (one subfolder per class)
    image_size=224,      # resize images to this square size
    batch_size=32,
    shuffle=True,
    num_workers=0,
)
```

Returns a `(DataLoader, list[str])` tuple. The list is sorted alphabetically, so integer label `i` corresponds to the class name at index `i`. `Trainer` calls this function internally when `dataset=` is set.

---

### `coldet.Trainer`

```python
Trainer(
    size="small",           # capacity tier
    speed_bias=0.5,         # float [0, 1]
    accuracy_bias=None,     # float [0, 1] — defaults to 1 - speed_bias
    dataset=None,           # path to subfolder-per-class dataset
    image_size=224,         # image resize target (used with dataset=)
    batch_size=32,          # batch size (used with dataset=)
    num_classes=1000,       # ignored when dataset= is supplied
    lr=1e-3,
    device=None,            # auto-detected (CUDA if available)
)
```
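
The relationship between the two bias arguments can be sketched as follows. This is a minimal illustration of the documented default (`accuracy_bias` falls back to `1 - speed_bias`), not coldet's own code: `resolve_biases` is a hypothetical name, and raising `ValueError` for out-of-range values is an assumption about how such input might be handled.

```python
def resolve_biases(speed_bias=0.5, accuracy_bias=None):
    """Return (speed_bias, accuracy_bias) with the documented default applied.

    Hypothetical helper for illustration; not part of the coldet API.
    """
    # Documented default: accuracy_bias = 1 - speed_bias when not supplied.
    if accuracy_bias is None:
        accuracy_bias = 1.0 - speed_bias

    # Assumption: values outside [0, 1] are rejected.
    for name, value in (("speed_bias", speed_bias), ("accuracy_bias", accuracy_bias)):
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {value}")
    return speed_bias, accuracy_bias


print(resolve_biases(0.75))        # (0.75, 0.25)
print(resolve_biases(0.3, 0.8))    # (0.3, 0.8): explicit values are kept as given
```

Note that explicit values need not sum to 1; the `build_model` example in this README passes `speed_bias=0.3` together with `accuracy_bias=0.8`.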

| Method | Description |
|--------|-------------|
| `train(dataloader=None, epochs=1, verbose=True)` | Train for `epochs` epochs. `dataloader` is optional when `dataset=` was set at construction. Returns a list of per-epoch losses. |
| `summary(output_dir=".")` | Writes `sum.md` and `graph.png` to `output_dir`. Returns an info dict (see below). |
| `predict(images)` | Forward pass (eval mode, no grad). Returns logits. |
| `save(path)` | Save full checkpoint (weights + optimiser state + constraints + history). |
| `load(path)` | Restore from checkpoint. |
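
Since `predict` returns raw logits, turning them into labels is left to the caller. The sketch below shows one way to do it, assuming the logits have already been converted to nested Python lists (for a PyTorch tensor, `logits.tolist()` does this) and that `class_names` is the alphabetically sorted list described above. `top_class_names` is a hypothetical helper, and the logit values are made up for illustration.

```python
def top_class_names(logits, class_names):
    """Map each row of logits to the name of its highest-scoring class.

    Hypothetical helper for illustration; not part of the coldet API.
    """
    names = []
    for row in logits:
        # Index of the largest logit in this row (argmax).
        best_index = max(range(len(row)), key=row.__getitem__)
        names.append(class_names[best_index])
    return names


class_names = ["blue", "green", "red"]
# Made-up logits for two images, one row per image.
logits = [[0.1, 2.3, -0.5], [1.7, 0.2, 0.9]]
print(top_class_names(logits, class_names))   # ['green', 'blue']
```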

#### `summary()` return value

```python
{
    'size_label': 'medium',
    'total_params': 214_832,
    'trainable_params': 214_832,
    'model_size_mb': 0.859,
    'speed_tier': 'balanced',
    'trained_epochs': 10,
    'num_classes': 3,
    'class_names': ['blue', 'green', 'red'],
}
```
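
The example figures are consistent with 4 bytes per parameter (32-bit floats) and 1 MB counted as 10^6 bytes. Both are assumptions inferred from the numbers above, not a documented formula, and `model_size_mb` here is a hypothetical function written only to check the arithmetic.

```python
BYTES_PER_FLOAT32 = 4  # assumption: parameters are stored as 32-bit floats


def model_size_mb(total_params, bytes_per_param=BYTES_PER_FLOAT32):
    """Estimate parameter storage in megabytes (1 MB = 1_000_000 bytes)."""
    return total_params * bytes_per_param / 1_000_000


size = model_size_mb(214_832)
print(round(size, 3))   # 0.859
```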

#### `summary()` output files

| File | Contents |
|------|----------|
| `sum.md` | Markdown table of constraints, parameter counts, class list, and training history (best loss, final loss). |
| `graph.png` | Training loss curve plotted over epochs. Written only when at least one epoch has been trained. |
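
The history figures in `sum.md` can also be recomputed from the list of per-epoch losses that `train()` returns. The sketch below is an assumption about what "best loss" and "final loss" mean (the minimum and the last entry, respectively). `summarise_history` is a hypothetical helper and the loss values are invented for illustration.

```python
def summarise_history(epoch_losses):
    """Return best and final loss for a list of per-epoch losses.

    Hypothetical helper for illustration; returns None when nothing was trained.
    """
    if not epoch_losses:
        return None
    return {
        "trained_epochs": len(epoch_losses),
        "best_loss": min(epoch_losses),     # lowest loss seen in any epoch
        "final_loss": epoch_losses[-1],     # loss of the last epoch
    }


# Invented losses for four epochs; the last epoch is slightly worse than the best.
history = summarise_history([1.10, 0.82, 0.67, 0.71])
print(history)
```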

---

### `coldet.build_model`

A lower-level function for when you need the raw `nn.Module`:

```python
model = coldet.build_model(
    size="large",
    speed_bias=0.3,
    accuracy_bias=0.8,
    num_classes=10,
)
```

Returns a `torch.nn.Module`. The architecture is generated from the constraints, not predefined.

---

## Constraint guide

| Goal | Recommended settings |
|------|----------------------|
| Edge / mobile deployment | `size="small"`, `speed_bias=0.9` |
| Balanced production model | `size="medium"`, `speed_bias=0.5` |
| Maximum accuracy, offline | `size="large"`, `speed_bias=0.1`, `accuracy_bias=0.9` |
| Rapid prototyping | `size="small"`, `speed_bias=0.7` |
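
If the same goals come up repeatedly, the table can be kept as a small preset dictionary. This is only a convenience sketch: the preset keys and the `trainer_kwargs` helper are hypothetical names, and only the values are taken from the table above.

```python
# Recommended settings from the table, keyed by hypothetical preset names.
PRESETS = {
    "edge": {"size": "small", "speed_bias": 0.9},
    "balanced": {"size": "medium", "speed_bias": 0.5},
    "max_accuracy": {"size": "large", "speed_bias": 0.1, "accuracy_bias": 0.9},
    "prototype": {"size": "small", "speed_bias": 0.7},
}


def trainer_kwargs(goal, **overrides):
    """Return keyword arguments for a preset, with optional overrides."""
    if goal not in PRESETS:
        raise KeyError(f"unknown goal {goal!r}; choose from {sorted(PRESETS)}")
    # Later keys win, so overrides replace preset values.
    return {**PRESETS[goal], **overrides}


kwargs = trainer_kwargs("edge", dataset="dataset/")
print(kwargs)   # {'size': 'small', 'speed_bias': 0.9, 'dataset': 'dataset/'}
```

The resulting dictionary can then be unpacked into the constructor, for example `coldet.Trainer(**kwargs)`.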

---

## What is hidden (intentionally)

coldet deliberately does not expose:

- Layer names, channel counts, or kernel sizes.
- Backbone / neck / head terminology.
- Depthwise vs standard convolution choices.
- Dropout values, batch norm parameters.

These are implementation details derived from your constraints. Exposing them would reintroduce the complexity the package is designed to remove.

---

## Philosophy

> **Describe constraints, not networks.**

Good abstractions hide decisions that don't belong to the user. Network architecture is an implementation concern — it should be derived from requirements, not specified by them. coldet treats the model as an *output* of a constraint-solving process, not an *input* from the user.

This is the same idea behind high-level languages (you describe *what* to compute, not *how* to use registers), infrastructure-as-intent tools, and compiler optimisation: raise the level of abstraction until the user operates at the level of their actual problem.

---

## License

MIT
