Metadata-Version: 2.4
Name: psynx-widget-detector
Version: 0.1.2
Summary: End-to-end document widget detection pipeline using YOLO11 on the CommonForms dataset
Requires-Python: >=3.11
Requires-Dist: albumentations>=1.4.0
Requires-Dist: click>=8.1.0
Requires-Dist: datasets>=2.18.0
Requires-Dist: huggingface-hub>=0.22.0
Requires-Dist: numpy>=1.26.0
Requires-Dist: opencv-python-headless>=4.9.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pymupdf>=1.24.0
Requires-Dist: pyyaml>=6.0.0
Requires-Dist: rich>=13.7.0
Requires-Dist: torch<2.7.0,>=2.5.0
Requires-Dist: torchvision<0.22.0,>=0.20.0
Requires-Dist: tqdm>=4.66.0
Requires-Dist: ultralytics>=8.3.0
Provides-Extra: dev
Requires-Dist: pytest-cov>=5.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0.0; extra == 'dev'
Requires-Dist: ruff>=0.4.0; extra == 'dev'
Description-Content-Type: text/markdown

# Widget Detector

End-to-end detection of form widgets in documents using **YOLO11m**. Fine-tuned weights are downloaded automatically from Hugging Face.

The detector finds three classes of form fields in scanned PDFs and document images:

| Class ID | Name | Description |
|---|---|---|
| 0 | `text_input` | Text boxes, input lines |
| 1 | `choice_button` | Checkboxes + radio buttons |
| 2 | `signature` | Signature fields |
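
When post-processing raw class IDs, the table above can be mirrored as a small lookup. This is only an illustrative sketch: `CLASS_NAMES` and `class_name_for` are hypothetical names, not documented parts of the package.

```python
# Illustrative lookup mirroring the class table above.
# CLASS_NAMES is a hypothetical name, not a documented package constant.
CLASS_NAMES = {
    0: "text_input",     # text boxes, input lines
    1: "choice_button",  # checkboxes and radio buttons
    2: "signature",      # signature fields
}


def class_name_for(class_id: int) -> str:
    """Return the class name for a class ID, or raise ValueError if unknown."""
    try:
        return CLASS_NAMES[class_id]
    except KeyError:
        raise ValueError(f"unknown class ID: {class_id}") from None


print(class_name_for(1))
```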

---

## Installation

You can install the package directly from PyPI:

```bash
pip install psynx-widget-detector
```

*Requires Python 3.11+*

---

## Quickstart

The package **automatically downloads** the fine-tuned YOLO11m weights from Hugging Face (`PSynx/widget-detector-yolo`) the first time you run it.

```python
from widget_detector import WidgetDetector

# 1. Initialize the detector (downloads weights automatically if not found)
detector = WidgetDetector()

# 2. Run inference on a PDF (auto-renders pages to images)
result = detector.detect_path("sample_form.pdf")

# 3. Print the results
print(f"Detected {result.total_widgets} widgets across {result.total_pages} pages.")

for page in result.pages:
    print(f"\nPage {page.page}:")
    for widget in page.widgets:
        print(f" - {widget.class_name} ({widget.confidence:.2f}) at "
              f"[{widget.bbox.x1:.1f}, {widget.bbox.y1:.1f}, {widget.bbox.x2:.1f}, {widget.bbox.y2:.1f}]")

# 4. Save results to JSON
result.save("output.json")
```
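
A common follow-up is to find which pages contain a particular kind of widget. The sketch below relies only on the attributes used in the Quickstart (`pages`, `page`, `widgets`, `class_name`, `confidence`). The helper name `signature_pages` and the 0.5 threshold are assumptions for illustration, and the `SimpleNamespace` object is stand-in data so the snippet runs without the model.

```python
from types import SimpleNamespace


def signature_pages(result, min_confidence: float = 0.5) -> list[int]:
    """Return sorted page numbers that hold at least one signature widget."""
    return sorted({
        page.page
        for page in result.pages
        for widget in page.widgets
        if widget.class_name == "signature"
        and widget.confidence >= min_confidence
    })


# Stand-in for a detection result, built by hand for illustration only.
fake_result = SimpleNamespace(pages=[
    SimpleNamespace(page=1, widgets=[
        SimpleNamespace(class_name="text_input", confidence=0.91),
    ]),
    SimpleNamespace(page=2, widgets=[
        SimpleNamespace(class_name="signature", confidence=0.88),
        SimpleNamespace(class_name="signature", confidence=0.30),
    ]),
])

print(signature_pages(fake_result))
```

With a real detector, passing the object returned by `detector.detect_path(...)` in place of `fake_result` should work as long as it exposes the same attributes.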

---

## Output Format

The detector returns a structured Pydantic object that cleanly serializes to JSON:

```json
{
  "source": "form.pdf",
  "total_pages": 3,
  "total_widgets": 24,
  "pages": [
    {
      "source": "form.pdf",
      "page": 1,
      "image_width": 1654,
      "image_height": 2339,
      "processing_time_ms": 142.3,
      "widgets": [
        {
          "class_id": 0,
          "class_name": "text_input",
          "confidence": 0.913,
          "bbox": {
            "x1": 120.0, "y1": 340.0, "x2": 480.0, "y2": 380.0,
            "x1_norm": 0.073, "y1_norm": 0.145,
            "x2_norm": 0.290, "y2_norm": 0.163
          },
          "page": 1
        }
      ]
    }
  ]
}
```
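
Because the output is plain JSON, it can be consumed without the package installed. The sketch below counts widgets per class and maps normalized coordinates back to pixels. It assumes the `*_norm` fields are pixel coordinates divided by `image_width` and `image_height`, which matches the sample values but is not stated explicitly. `count_by_class` and `denormalize` are hypothetical helper names.

```python
import json
from collections import Counter

# Excerpt of the sample output shown above (one page, one widget).
SAMPLE = json.loads("""
{
  "source": "form.pdf",
  "total_pages": 3,
  "total_widgets": 24,
  "pages": [
    {
      "source": "form.pdf",
      "page": 1,
      "image_width": 1654,
      "image_height": 2339,
      "processing_time_ms": 142.3,
      "widgets": [
        {
          "class_id": 0,
          "class_name": "text_input",
          "confidence": 0.913,
          "bbox": {
            "x1": 120.0, "y1": 340.0, "x2": 480.0, "y2": 380.0,
            "x1_norm": 0.073, "y1_norm": 0.145,
            "x2_norm": 0.290, "y2_norm": 0.163
          },
          "page": 1
        }
      ]
    }
  ]
}
""")


def count_by_class(result: dict, min_confidence: float = 0.5) -> Counter:
    """Count widgets per class name, skipping low-confidence detections."""
    counts = Counter()
    for page in result["pages"]:
        for widget in page["widgets"]:
            if widget["confidence"] >= min_confidence:
                counts[widget["class_name"]] += 1
    return counts


def denormalize(bbox: dict, width: int, height: int) -> tuple[float, float, float, float]:
    """Scale normalized corners to pixels (assumes norm = pixel / image size)."""
    return (
        bbox["x1_norm"] * width,
        bbox["y1_norm"] * height,
        bbox["x2_norm"] * width,
        bbox["y2_norm"] * height,
    )


counts = count_by_class(SAMPLE)
first_page = SAMPLE["pages"][0]
box = denormalize(
    first_page["widgets"][0]["bbox"],
    first_page["image_width"],
    first_page["image_height"],
)
print(dict(counts))
print([round(v) for v in box])
```

To process a saved file instead, load it with `json.load` and pass the resulting dictionary to the same helpers. The normalized values in the sample are rounded to three decimals, so the recovered pixel coordinates differ slightly from the stored `x1`..`y2` values.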

---

## Notes

- **`choice_button` class:** CommonForms labels checkboxes and radio buttons as a single class, so the detector does not distinguish between them.
- **Inference speed:** If a CUDA-enabled GPU is available, `WidgetDetector` uses it automatically for faster inference.
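
To check ahead of time whether a CUDA device is visible, PyTorch can be queried directly. This sketch uses only `torch.cuda.is_available()` and says nothing about how `WidgetDetector` selects its device internally; the helper name `cuda_available` is an assumption.

```python
import importlib.util


def cuda_available() -> bool:
    """Return True if PyTorch is installed and reports a usable CUDA device."""
    if importlib.util.find_spec("torch") is None:
        # PyTorch is not installed in this environment.
        return False
    import torch

    return torch.cuda.is_available()


print("CUDA available:", cuda_available())
```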
