Metadata-Version: 2.4
Name: psynx-widget-detector
Version: 0.1.0
Summary: End-to-end document widget detection pipeline using YOLO11 on CommonForms dataset
Requires-Python: >=3.11
Requires-Dist: albumentations>=1.4.0
Requires-Dist: click>=8.1.0
Requires-Dist: datasets>=2.18.0
Requires-Dist: huggingface-hub>=0.22.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: opencv-python-headless>=4.9.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pymupdf>=1.24.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: rich>=13.7.0
Requires-Dist: torch<2.7.0,>=2.5.0
Requires-Dist: torchvision<0.22.0,>=0.20.0
Requires-Dist: tqdm>=4.66.0
Requires-Dist: ultralytics>=8.3.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Description-Content-Type: text/markdown

# Widget Detection Pipeline

End-to-end document form widget detection using **YOLO11m** trained on the [CommonForms](https://huggingface.co/datasets/jbarrow/CommonForms) dataset.

Detects 3 classes of form fields from scanned PDFs and document images:

| Class ID | Name | Description |
|---|---|---|
| 0 | `text_input` | Text boxes, input lines |
| 1 | `choice_button` | Checkboxes + radio buttons |
| 2 | `signature` | Signature fields |

---

## Requirements

- Python 3.11+
- [uv](https://docs.astral.sh/uv/) (`pip install uv`)
- CUDA GPU with ≥ 12 GB VRAM (e.g. RTX 3080 Ti, RTX A2000 12GB) for training at 1024px with batch=4

---

## Setup

```bash
# 1. Install uv if not already installed
pip install uv

# 2. Create venv and install all dependencies
uv sync

# 3. (Optional) Install dev dependencies for testing/linting
uv sync --extra dev
```

---

## Pipeline

### Step 1 — Download Dataset (CommonForms subset)

Streams a 50,000-image subset from Hugging Face, so the full 163 GB dataset does not need to be downloaded:

```bash
uv run scripts/download_dataset.py --max-images 50000
```

Options:
- `--max-images N`   — number of images (default: 50,000)
- `--token HF_TOKEN` — HuggingFace token if needed
- `--seed 42`        — reproducibility seed

Output: `data/raw/images/` + `data/raw/annotations/`

---

### Step 2 — Convert to YOLO Format

```bash
uv run scripts/convert_to_yolo.py
```

Options:
- `--val-ratio 0.1`  — validation split (default: 10%)
- `--seed 42`

Output: `data/yolo/` with `images/`, `labels/`, `data.yaml`
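
The core of this conversion is turning pixel-space corner boxes into YOLO's normalized center format (`class cx cy w h`). A minimal sketch, assuming annotations arrive as `(x1, y1, x2, y2)` pixel corners; the function name `to_yolo_line` is hypothetical and is not taken from `convert_to_yolo.py`:

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Format one box as a YOLO label line: 'class cx cy w h', normalized to [0, 1]."""
    # Order the corners, then clamp to the image so slightly
    # out-of-bounds annotations still produce valid labels.
    left, right = sorted((x1, x2))
    top, bottom = sorted((y1, y2))
    left, right = max(0.0, left), min(float(img_w), right)
    top, bottom = max(0.0, top), min(float(img_h), bottom)

    cx = (left + right) / 2 / img_w   # box center, as a fraction of image width
    cy = (top + bottom) / 2 / img_h   # box center, as a fraction of image height
    w = (right - left) / img_w
    h = (bottom - top) / img_h
    return f"{class_id} {cx:.6f} {cy:.6f} {w:.6f} {h:.6f}"


# One text_input box on a 1654x2339 page image.
print(to_yolo_line(0, 120, 340, 480, 380, img_w=1654, img_h=2339))
```

Each image gets one `.txt` file under `labels/` with one such line per widget.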

---

### Step 3 — Verify Dataset

```bash
# Check integrity
uv run scripts/verify_dataset.py

# Visual inspection (draws 20 sample images with bboxes)
uv run scripts/verify_dataset.py --draw-samples 20
```
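
For reference, an integrity check on YOLO labels usually amounts to validating every label line. The sketch below is an assumption about what such a check could look like, not the actual logic of `verify_dataset.py`; `check_label_line` and `check_labels` are hypothetical names:

```python
from pathlib import Path

NUM_CLASSES = 3  # text_input, choice_button, signature


def check_label_line(line):
    """Return an error string for a malformed YOLO label line, or None if valid."""
    parts = line.split()
    if len(parts) != 5:
        return f"expected 5 fields, got {len(parts)}"
    try:
        class_id = int(parts[0])
        coords = [float(p) for p in parts[1:]]
    except ValueError:
        return "non-numeric field"
    if not 0 <= class_id < NUM_CLASSES:
        return f"class id {class_id} out of range"
    if any(not 0.0 <= c <= 1.0 for c in coords):
        return "coordinate outside [0, 1]"
    if coords[2] <= 0 or coords[3] <= 0:
        return "non-positive width or height"
    return None


def check_labels(label_dir):
    """Yield (file name, line number, error) for every bad line under label_dir."""
    for path in sorted(Path(label_dir).rglob("*.txt")):
        for n, line in enumerate(path.read_text().splitlines(), start=1):
            if line.strip() and (err := check_label_line(line)):
                yield path.name, n, err
```

Running `list(check_labels("data/yolo/labels"))` returns an empty list when every label line is well-formed.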

---

### Step 4 — Train

```bash
# Full training (100 epochs, batch=4, 1024px)
uv run train.py --config configs/train_config.yaml

# Smoke test (3 epochs, quick sanity check)
uv run train.py --config configs/train_config.yaml --smoke-test

# Resume from last checkpoint
uv run train.py --config configs/train_config.yaml --resume
```

Training output: `runs/detect/widget_yolo11m/`

---

### Step 5 — Run Inference

```bash
# PDF input → JSON output
uv run inference.py \
    --input form.pdf \
    --model runs/detect/widget_yolo11m/weights/best.pt

# Image input with lower confidence threshold
uv run inference.py \
    --input scan.jpg \
    --model best.pt \
    --conf 0.2

# Batch of PDFs with visual overlay
uv run inference.py \
    --input "forms/*.pdf" \
    --model best.pt \
    --visualize \
    --output-dir outputs/

# High DPI for dense forms
uv run inference.py --input form.pdf --model best.pt --dpi 300
```
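
The effect of `--dpi` on image size is simple arithmetic: pixels = inches × DPI. The sketch below works through it for an A4 page. The 1654×2339 image size in the sample output in this README matches A4 at 200 DPI, but treating 200 as the script's default is an assumption, and the renderer may round differently by a pixel:

```python
MM_PER_INCH = 25.4


def page_pixels(width_mm, height_mm, dpi):
    """Pixel size of a page rendered at the given DPI, rounded to whole pixels."""
    return (
        round(width_mm / MM_PER_INCH * dpi),
        round(height_mm / MM_PER_INCH * dpi),
    )


A4 = (210, 297)  # millimetres
print(page_pixels(*A4, dpi=200))  # (1654, 2339)
print(page_pixels(*A4, dpi=300))  # (2480, 3508)
```

Going from 200 to 300 DPI gives 2.25× as many pixels per page, so small checkboxes cover more pixels, at the cost of slower rendering and inference.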

---

## Output Format

```json
{
  "source": "form.pdf",
  "total_pages": 3,
  "total_widgets": 24,
  "pages": [
    {
      "source": "form.pdf",
      "page": 1,
      "image_width": 1654,
      "image_height": 2339,
      "processing_time_ms": 142.3,
      "widgets": [
        {
          "class_id": 0,
          "class_name": "text_input",
          "confidence": 0.913,
          "bbox": {
            "x1": 120.0, "y1": 340.0, "x2": 480.0, "y2": 380.0,
            "x1_norm": 0.073, "y1_norm": 0.145,
            "x2_norm": 0.290, "y2_norm": 0.162
          },
          "page": 1
        }
      ]
    }
  ]
}
```
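
A minimal sketch of consuming this JSON downstream, using only the field names shown above. The helper names `iter_widgets` and `norm_consistent` are hypothetical, and the inline sample is a trimmed one-page version of the example:

```python
import json

SAMPLE = json.loads("""
{"source": "form.pdf", "total_pages": 1, "total_widgets": 1, "pages": [
  {"source": "form.pdf", "page": 1, "image_width": 1654, "image_height": 2339,
   "processing_time_ms": 142.3, "widgets": [
     {"class_id": 0, "class_name": "text_input", "confidence": 0.913, "page": 1,
      "bbox": {"x1": 120.0, "y1": 340.0, "x2": 480.0, "y2": 380.0,
               "x1_norm": 0.073, "y1_norm": 0.145,
               "x2_norm": 0.290, "y2_norm": 0.162}}]}]}
""")


def iter_widgets(result, min_conf=0.0):
    """Yield (page number, class name, confidence, pixel box) per detection."""
    for page in result["pages"]:
        for w in page["widgets"]:
            if w["confidence"] >= min_conf:
                b = w["bbox"]
                yield (page["page"], w["class_name"], w["confidence"],
                       (b["x1"], b["y1"], b["x2"], b["y2"]))


def norm_consistent(page, tol=1e-3):
    """Check that every *_norm value equals the pixel value / page size."""
    sizes = {"x1": page["image_width"], "y1": page["image_height"],
             "x2": page["image_width"], "y2": page["image_height"]}
    return all(
        abs(w["bbox"][k] / size - w["bbox"][f"{k}_norm"]) <= tol
        for w in page["widgets"]
        for k, size in sizes.items()
    )


for page_no, name, conf, box in iter_widgets(SAMPLE, min_conf=0.5):
    print(page_no, name, conf, box)
```

The normalized coordinates are the pixel coordinates divided by `image_width` or `image_height`, so they stay valid if the page is later rendered at a different DPI.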

---

## Run Tests

```bash
uv run pytest tests/ -v
```

---

## Training Config Highlights (12 GB GPU)

| Parameter | Value | Reason |
|---|---|---|
| `imgsz` | 1024 | Small widget detection needs high resolution |
| `batch` | 4 | Safe for 12 GB VRAM at 1024px |
| `amp` | true | Mixed precision; reduces VRAM use by roughly 40% |
| `epochs` | 100 | With early stopping (patience=20) |
| `degrees` | 10.0 | Rotation for skewed scans |
| `perspective` | 0.0005 | Real-world document distortion |
| `mosaic` | 1.0 | Key augmentation for small widgets |
| `albumentations` | auto | Blur + noise when installed |
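
As a rough sketch, the table maps onto Ultralytics training arguments as follows. This assumes the stock `YOLO.train()` API and the pretrained `yolo11m.pt` weights; the project's own `trainer.py` wrapper and `train_config.yaml` may set additional options. The `albumentations` row is not an argument: Ultralytics applies those transforms on its own when the package is installed.

```python
# Values copied from the table above; `patience` is the early-stopping window.
TRAIN_ARGS = {
    "imgsz": 1024,
    "batch": 4,
    "amp": True,
    "epochs": 100,
    "patience": 20,
    "degrees": 10.0,
    "perspective": 0.0005,
    "mosaic": 1.0,
}


def train(data_yaml="data/yolo/data.yaml", weights="yolo11m.pt", **overrides):
    """Start a training run; keyword overrides replace the defaults above."""
    # Imported lazily so the config can be inspected without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO(weights)
    return model.train(data=data_yaml, **{**TRAIN_ARGS, **overrides})
```

For example, `train(epochs=3)` would approximate a short sanity run with the same resolution and batch size.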

---

## Project Structure

```
Widget_detection1/
├── widget_detector/          # Core library
│   ├── config.py             # Paths, class maps, defaults
│   ├── dataset.py            # HF download + YOLO conversion
│   ├── detector.py           # WidgetDetector inference class
│   ├── output.py             # Pydantic result models
│   ├── pdf_utils.py          # PDF → PIL images (PyMuPDF)
│   └── trainer.py            # Training wrapper
├── scripts/
│   ├── download_dataset.py   # Step 1: Download
│   ├── convert_to_yolo.py    # Step 2: Convert
│   └── verify_dataset.py     # Step 3: Verify
├── configs/
│   └── train_config.yaml     # YOLO11m hyperparameters
├── train.py                  # Training entry point
├── inference.py              # Inference entry point
├── tests/                    # Unit tests
└── pyproject.toml            # uv project manifest
```

---

## Notes

- **CommonForms `choice_button`** includes both checkboxes and radio buttons as one class (the dataset does not distinguish them). If you need to split them, a heuristic post-processor can be added based on bbox aspect ratio.
- Training is set to **3 classes** (`text_input`, `choice_button`, `signature`) matching CommonForms exactly.
- The `data/` and `runs/` directories are gitignored — do not commit them.
