Metadata-Version: 2.4
Name: nano-wait-vision
Version: 0.3.1
Summary: Vision extension for nano-wait automation
Author: Luiz Seabra De Marco
License: MIT
Project-URL: Repository, https://github.com/LuizSeabraDeMarco/NanoWaitVision
Classifier: Programming Language :: Python :: 3
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Requires-Python: >=3.9
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: nano-wait
Requires-Dist: opencv-python
Requires-Dist: pytesseract
Requires-Dist: pyautogui
Requires-Dist: numpy
Dynamic: license-file

# Nano-Wait-Vision — Visual Execution Extension

[![PyPI version](https://img.shields.io/pypi/v/nano-wait-vision.svg)](https://pypi.org/project/nano-wait-vision/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

**nano-wait-vision** is the official computer vision extension for [nano-wait](https://pypi.org/project/nano-wait/). It integrates visual awareness (OCR, icon detection, screen states) into the adaptive waiting engine, enabling deterministic, screen-driven automations.

> [!IMPORTANT]
> **Critical Dependency:** This package **DEPENDS** on `nano-wait`. It does not replace `nano-wait` — it extends it.

---

## 🧠 What is Nano-Wait-Vision?

Nano-Wait-Vision is a deterministic vision engine for Python automation. Instead of waiting blindly with `sleep()`, it allows your code to wait for real visual conditions:

*   **Text** appearing on screen
*   **Icons** becoming visible
*   **UI states** changing

It is designed to work in strict cooperation with `nano-wait`:

| Component | Responsibility |
| :--- | :--- |
| ⏱️ **nano-wait** | When to check (adaptive pacing & CPU-aware waiting) |
| 👁️ **nano-wait-vision** | What to check (screen, OCR, icons) |

---

## 🧩 Key Features

nano-wait-vision extends nano-wait with:

*   **👁️ OCR (Optical Character Recognition):** Read real text directly from the screen.
*   **🖼️ Icon Detection:** Template matching via OpenCV.
*   **🖥️ Automatic HiDPI/Retina Support:** Icon templates are scaled automatically to the display's scaling factor, so template matching works on 4K, macOS Retina, and Windows HiDPI displays without user configuration.
*   **🧠 Explicit Visual States:** Each operation returns a structured `VisionState`.
*   **📚 Persistent & Explainable Diagnostics:** No black-box ML models.
*   **⚡ QA-Friendly & Plug-and-Play:** No dependency on web drivers (such as Selenium), which simplifies adoption in corporate and academic environments.
*   **🖥️ Screen-Based Automation:** Ideal for RPA and GUI testing.

> [!TIP]
> All waiting logic is delegated to the `wait()` function of `nano-wait`; `time.sleep()` is never used.

---

## 🚀 Quick Start

### Installation

```bash
pip install nano-wait
pip install nano-wait-vision
```

### Simple Visual Observation

```python
from nano_wait_vision import VisionMode

vision = VisionMode()
state = vision.observe()

print(f"Detected: {state.detected}")
print(f"Text: {state.text}")
```

### Wait for Text to Appear

```python
from nano_wait_vision import VisionMode

vision = VisionMode(verbose=True)

# Wait up to 10 seconds for the word "Welcome"
state = vision.wait_text("Welcome", timeout=10)

if state.detected:
    print("Text detected!")
```

### Wait for an Icon

```python
from nano_wait_vision import VisionMode

vision = VisionMode()

# Wait up to 10 seconds for an icon image
state = vision.wait_icon("ok.png", timeout=10)

if state.detected:
    print("Icon found on screen.")
```

---

## ⚠️ Installation & Dependencies

This library interacts directly with your operating system screen and OCR engine.

### Python Dependencies (auto-installed)
*   `nano-wait`
*   `opencv-python`
*   `pytesseract`
*   `pyautogui`
*   `numpy`

### 🧠 Mandatory External Dependency — Tesseract OCR

OCR will not work unless **Tesseract** is installed and available in your PATH.

| OS | Command / Action |
| :--- | :--- |
| **macOS** | `brew install tesseract` |
| **Ubuntu / Debian** | `sudo apt install tesseract-ocr` |
| **Windows** | Download from the official Tesseract repo and add to PATH |

> [!WARNING]
> If Tesseract is missing, OCR calls will silently fail or return empty text.
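
Because a missing binary may go unnoticed, it can help to check for Tesseract before starting an automation run. The sketch below uses only the standard library; `find_tesseract` is a hypothetical helper name and is not part of nano-wait-vision.

```python
import shutil
from typing import Optional


def find_tesseract(binary: str = "tesseract") -> Optional[str]:
    """Return the full path of the Tesseract executable, or None if it is not on PATH."""
    return shutil.which(binary)


if __name__ == "__main__":
    path = find_tesseract()
    if path is None:
        print("Tesseract not found on PATH; OCR calls may return empty text.")
    else:
        print(f"Tesseract found at: {path}")
```

Running this once at the start of a test session or script makes a missing installation visible immediately.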

---

## 🧠 Mental Model — How It Works

Nano-Wait-Vision follows this loop: **observe → evaluate → wait → observe**.

Two engines cooperate:

| 👁️ Vision Engine | ⏱️ nano-wait |
| :--- | :--- |
| OCR / Icons | Adaptive timing |
| Screen capture | CPU-aware waits |
| Visual states | Smart pacing |

Vision never sleeps. All delays are handled by `nano-wait`.
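
The loop above can be sketched in plain Python. This illustrates the pattern only and is not the library's implementation: `poll_until`, `observe`, `is_satisfied`, and `pause` are hypothetical names, and `pause` stands in for the adaptive wait that `nano-wait` provides.

```python
import time
from typing import Callable, Tuple, TypeVar

T = TypeVar("T")


def poll_until(
    observe: Callable[[], T],
    is_satisfied: Callable[[T], bool],
    pause: Callable[[], None],
    timeout: float,
) -> Tuple[bool, T, int]:
    """Run observe -> evaluate -> wait until the condition holds or time runs out.

    Returns (detected, last_observation, attempts).
    """
    deadline = time.monotonic() + timeout
    attempts = 0
    while True:
        observation = observe()          # observe: capture the current state
        attempts += 1
        if is_satisfied(observation):    # evaluate: does it match the condition?
            return True, observation, attempts
        if time.monotonic() >= deadline:
            return False, observation, attempts
        pause()                          # wait: delegated to the pacing engine


# Usage with stand-ins: the "screen" shows the target text on the third look.
frames = iter(["Loading", "Loading", "Welcome"])
detected, last, attempts = poll_until(
    observe=lambda: next(frames),
    is_satisfied=lambda text: "Welcome" in text,
    pause=lambda: None,  # no real delay in this demo
    timeout=5.0,
)
print(detected, last, attempts)  # prints: True Welcome 3
```

Keeping the pause behind a callable mirrors the split described above: the vision side decides what to check, and the timing side decides how long to wait between checks.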

---

## 📦 VisionState — Return Object

Every visual operation returns a `VisionState` object:

```python
VisionState(
    name: str,
    detected: bool,
    confidence: float,
    attempts: int,
    elapsed: float,
    text: Optional[str],
    icon: Optional[str],
    diagnostics: dict
)
```

*Always check `detected` before acting on the result.*
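
As an illustration of that rule, the sketch below raises an error when nothing was detected, so later steps never act on a failed wait. `FakeState` is a stand-in that mirrors a few of the fields listed above so the example runs without a screen, and `require_detected` is a hypothetical helper that is not part of the library.

```python
from dataclasses import dataclass


@dataclass
class FakeState:
    """Stand-in with a subset of the VisionState fields listed above."""
    name: str
    detected: bool
    confidence: float
    attempts: int
    elapsed: float


def require_detected(state):
    """Return the state if detected, otherwise raise with a short summary."""
    if not state.detected:
        raise RuntimeError(
            f"{state.name!r} not detected after {state.attempts} attempts "
            f"in {state.elapsed:.1f}s (confidence={state.confidence:.2f})"
        )
    return state


# A detected state passes through unchanged.
ok = require_detected(FakeState("login", True, 0.93, 2, 0.8))
print(ok.confidence)  # prints: 0.93
```

The same helper should work with any object that exposes these attributes, such as the result of `vision.wait_text(...)`.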

---

## 🧪 Diagnostics & Debugging

Nano-Wait-Vision supports verbose diagnostics:

```python
vision = VisionMode(verbose=True)
state = vision.wait_text("Terminal")
```

Diagnostics include:
*   Attempts per phase
*   Confidence scores
*   Elapsed time
*   Reason for failure

A full macOS diagnostic test is provided in `test_screen.py`, generating debug screenshots for inspection.

---

## 🖥️ Platform Notes

### Automatic HiDPI/Retina Support

The library automatically detects the screen's scaling factor (DPI/Retina) and scales icon templates accordingly, so template matching works on modern displays (macOS Retina, Windows HiDPI, 4K monitors) without manual configuration or code changes.
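
One common way to derive such a scaling factor is to compare the pixel width of a captured screenshot with the logical screen width reported by the OS. The sketch below shows only that arithmetic. It is an assumption about the general approach, not a description of the library's internals, and `scale_factor` and `scaled_size` are hypothetical helpers.

```python
from typing import Tuple


def scale_factor(screenshot_width_px: int, logical_width: int) -> float:
    """Ratio of captured pixels to logical screen units."""
    if logical_width <= 0:
        raise ValueError("logical width must be positive")
    return screenshot_width_px / logical_width


def scaled_size(template_size: Tuple[int, int], factor: float) -> Tuple[int, int]:
    """Scale a (width, height) template size, keeping each side at least 1 pixel."""
    width, height = template_size
    return max(1, round(width * factor)), max(1, round(height * factor))


# A 2880-pixel-wide capture of a 1440-unit-wide screen gives a factor of 2.0,
# so a 24x24 template captured at 1x would be matched at 48x48.
factor = scale_factor(2880, 1440)
print(factor, scaled_size((24, 24), factor))  # prints: 2.0 (48, 48)
```

With `pyautogui`, the two widths could plausibly come from `pyautogui.screenshot().size` and `pyautogui.size()`, though whether they differ depends on the platform and its DPI settings.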

### macOS (Important)
*   Screen capture requires **Screen Recording permission**.
*   OCR requires RGB images (internally handled by Nano-Wait-Vision).
*   Fully tested on macOS Retina displays with automatic scaling.

### Windows & Linux
*   Works out of the box once Tesseract is installed.

---

## 🧪 Ideal Use Cases

Use Nano-Wait-Vision when dealing with:
*   **RPA** (Robotic Process Automation)
*   **GUI automation** and testing
*   **OCR-driven** workflows
*   **Visual regression** tests
*   Applications **without APIs**
*   **Screen-based** alternatives to traditional web drivers

---

## 🧩 Design Philosophy

*   **Deterministic:** Predictable behavior based on visual truth.
*   **Explainable:** Clear diagnostics for every action.
*   **No opaque ML:** Uses reliable computer vision techniques.
*   **System-aware:** Respects system resources via `nano-wait`.
*   **Debuggable by design:** Built-in tools for troubleshooting.

---

## 🧪 QA & Automation Adapters (Pytest & Generic Wait)

The library is **driver-agnostic** and provides dedicated tools for QA and automation workflows.

### Generic Visual Waits (`VisionWait`)

The `VisionWait` class provides a "Selenium-like" adapter for visual waiting that is independent of Selenium or any other web driver, so visual checks can be integrated into any automation framework.

```python
from nano_wait_vision import VisionWait

# VisionWait is a generic adapter; it does not require Selenium
wait = VisionWait(timeout=15)
wait.until_text("Dashboard")
wait.until_icon("ok.png")
```

### Pytest Fixtures (Plug-and-Play)

For immediate adoption in QA projects, the library provides ready-to-use pytest fixtures.

```python
# In your conftest.py or test file
# Fixtures 'vision' and 'wait' are automatically available

def test_homepage(vision, wait):
    # Use the global VisionMode instance; assert on `detected`,
    # since wait_text() returns a VisionState object
    assert vision.wait_text("Welcome").detected

    # Use the VisionWait adapter
    wait.until_icon("login_button.png")
```
*Fixtures are available via `nano_wait_vision.pytest_fixture`.*

---

## 📄 License

This project is licensed under the MIT License.
