Metadata-Version: 2.4
Name: pyvisionauto
Version: 0.1.4
Summary: PyVisionAuto: Cross-platform desktop automation toolkit with visual image matching, mouse/keyboard control, and screen recording
Author: PyVisionAuto contributors
License-Expression: LicenseRef-Proprietary
Project-URL: Homepage, https://pypi.org/project/pyvisionauto/
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Microsoft :: Windows
Classifier: Environment :: X11 Applications
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: license.dat
Requires-Dist: opencv-python>=4.8.0
Requires-Dist: mss>=9.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pyautogui>=0.9.54
Requires-Dist: pillow>=10.0.0
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == "test"
Dynamic: license-file

# PyVisionAuto

[![PyPI version](https://img.shields.io/pypi/v/pyvisionauto.svg)](https://pypi.org/project/pyvisionauto/)
![Python](https://img.shields.io/pypi/pyversions/pyvisionauto)
![Platform](https://img.shields.io/badge/platform-Linux%20%7C%20Windows-blue)

**PyVisionAuto** is an end-to-end desktop automation toolkit.
It is centered on visual image matching and also includes screen recording, mouse automation, and keyboard automation capabilities.

## Scope

- Linux (X11 session) and Windows
- Real physical display required

## Install

```bash
pip install pyvisionauto
```

## System dependencies

### Linux

- **python3-tk** — Required for border overlay highlight
- **xdotool** — Preferred for window activation
- **wmctrl** — Fallback for window activation
- **ffmpeg** — Required for screen recording; install via `sudo apt install ffmpeg`

### Windows

- **tkinter** — Bundled with most Python installations
- **ffmpeg** — Required for screen recording; download it from [ffmpeg.org](https://ffmpeg.org/download.html), extract the archive, and add its `bin` folder to the system `PATH`

### Verify ffmpeg installation

```bash
# Check if ffmpeg is installed and accessible
ffmpeg -version
```

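> **Note:** Screen recording (via the `Recorder` API) requires ffmpeg. On Linux it captures through the `x11grab` input device; on Windows it uses the `gdigrab` input device. These are capture devices, not codecs, and most prebuilt ffmpeg packages include the one for their platform.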
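
The same check can be made from Python before a script starts. The following is a minimal sketch that uses only the standard library; `missing_tools` and `LINUX_TOOLS` are illustrative names, not part of the PyVisionAuto API.

```python
import shutil


def missing_tools(tools, which=shutil.which):
    """Return the names in `tools` that cannot be found on PATH."""
    return [name for name in tools if which(name) is None]


# External tools this README lists for Linux; on Windows only ffmpeg applies.
LINUX_TOOLS = ["ffmpeg", "xdotool", "wmctrl"]

if __name__ == "__main__":
    missing = missing_tools(LINUX_TOOLS)
    print("missing:", ", ".join(missing) or "none")
```

Since `wmctrl` is only a fallback for `xdotool`, a missing `wmctrl` is not necessarily a problem when `xdotool` is present.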

## Quick start

### Basic usage: Find and click

```python
from pyvisionauto import Screen

screen = Screen()
# Wait for image to appear on screen, highlight it, then click
screen.wait("login_button.png", timeout=10).highlight().click()
```

### Advanced example: Record automation with screen capture

This example demonstrates screen recording combined with visual automation:

```python
from pyvisionauto import Screen, Recorder
from pathlib import Path

screen = Screen()
recorder = Recorder()

recorder.start_recording(output_path=Path("automation_demo.mp4"))
try:
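    # Bring the target window to the foreground before interacting with it
    screen.activate_window("Calculator")
    # Compute 1 + 5: click "1" and "+", type "5", then click "="
    screen.wait("button_1.png", timeout=10).highlight().click()
    screen.click("button_plus.png", timeout=5)
    screen.type_text("5")
    screen.wait("button_equals.png", timeout=5).highlight().click()
    # Confirm that the expected result (6) is displayed
    screen.wait("result_6.png", timeout=3).highlight()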
finally:
    recorder.stop_recording()
```

### Activate a window before matching

```python
screen.activate_window("Calculator")
screen.click("button.png")
```

## Runtime screenshot

Highlighted match region during runtime:

![PyVisionAuto runtime screenshot with highlighted region](screenshot.png)

## Platform differences

| Feature | Linux | Windows |
|---|---|---|
| Screen capture & template matching | Supported | Supported |
| Mouse / keyboard automation | Supported | Supported |
| Highlight overlay | Supported | Supported |
| Window activation | xdotool / wmctrl | pyautogui (pygetwindow) |
| Screen recording | ffmpeg + x11grab | ffmpeg + gdigrab |

> Screen recording requires ffmpeg installed and added to system PATH. Linux uses `x11grab`, Windows uses `gdigrab`.
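
To make the platform difference concrete, here is a rough sketch of how a full-screen capture command could be assembled for each platform. It is not necessarily how `Recorder` invokes ffmpeg internally; `build_capture_command`, the frame rate, and the default display `:0.0` are assumptions chosen for illustration. The ffmpeg options themselves (`-f x11grab`, `-f gdigrab`, `-framerate`, `-i desktop`, `-y`) are standard.

```python
import sys


def build_capture_command(output, platform=sys.platform, fps=30, display=":0.0"):
    """Build an ffmpeg argument list for full-screen capture (illustrative only)."""
    if platform.startswith("linux"):
        # x11grab reads frames from an X11 display such as ":0.0"
        source = ["-f", "x11grab", "-framerate", str(fps), "-i", display]
    elif platform.startswith("win"):
        # gdigrab captures the whole Windows desktop when the input is "desktop"
        source = ["-f", "gdigrab", "-framerate", str(fps), "-i", "desktop"]
    else:
        raise ValueError(f"unsupported platform: {platform}")
    # -y overwrites an existing output file without prompting
    return ["ffmpeg", "-y", *source, str(output)]


if __name__ == "__main__":
    print(" ".join(build_capture_command("demo.mp4")))
```

Running the printed command by hand is a quick way to confirm that the local ffmpeg build includes the capture device for the platform.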

## Window focus on Linux (X11)

pyautogui uses XTest synthetic events to move the mouse and click. On X11, **synthetic pointer events do not trigger focus changes** — the window manager only reassigns focus in response to real hardware events. This means:

- `click()` moves the cursor to the correct coordinates and clicks, but the keyboard focus stays wherever it was before.
- Any subsequent keyboard action (`press()`, `type_text()`, hotkeys) is delivered to whichever window currently has focus — which may not be the window you just clicked.

**Rule of thumb: always call `activate_window()` before any keyboard action, targeting the exact window that should receive it.**
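
One way to make the rule hard to forget is to wrap keyboard input in a helper that always activates the target window first. The sketch below uses only the `activate_window()` and `type_text()` calls shown in this README; `type_into` is a hypothetical helper, and `RecordingScreen` is a stand-in that records calls so the example runs without a display.

```python
class RecordingScreen:
    """Stand-in for pyvisionauto.Screen that only records calls (hypothetical)."""

    def __init__(self):
        self.calls = []

    def activate_window(self, title):
        self.calls.append(("activate_window", title))

    def type_text(self, text):
        self.calls.append(("type_text", text))


def type_into(screen, window_title, text):
    """Activate the named window, then send the text to it."""
    screen.activate_window(window_title)
    screen.type_text(text)


if __name__ == "__main__":
    demo = RecordingScreen()
    type_into(demo, "Calculator", "5")
    print(demo.calls)
```

With a real `Screen` instance the call would look the same: `type_into(screen, "Calculator", "5")`.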

Use `xdotool` to find the precise window name while the application is running:

```bash
xdotool search --name "" 2>/dev/null | while read id; do
    printf "ID=%-12s %s\n" "$id" "$(xdotool getwindowname "$id" 2>/dev/null)"
done
```

Pick the shortest substring that uniquely identifies the target window and use it in `activate_window()`.
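
Picking that substring can be automated once the window titles are listed. This is a small sketch under two assumptions: that matching is a plain case-sensitive substring test (the matching rules of `activate_window()` are not specified here), and that very short fragments are too brittle, hence the hypothetical `min_length` parameter.

```python
def shortest_unique_substring(target, others, min_length=4):
    """Return the shortest substring of `target` found in none of `others`.

    Returns None when no such substring of at least `min_length` exists.
    """
    for length in range(min_length, len(target) + 1):
        for start in range(len(target) - length + 1):
            candidate = target[start:start + length]
            # Skip fragments that begin or end with whitespace
            if candidate != candidate.strip():
                continue
            if all(candidate not in other for other in others):
                return candidate
    return None


if __name__ == "__main__":
    titles = ["My App 2026", "Terminal"]
    print(shortest_unique_substring("Open Project", titles))
```

A result of `None` means every fragment of the target title also occurs in another window's title, so the full title (or a window ID) is needed instead.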

### Main window vs. dialogs

When a modal dialog is open, activate the dialog directly — do not activate the main window and rely on the WM to forward focus:

```python
from pyvisionauto import Screen

screen = Screen()

# --- Dialog ---
# Activate the dialog BEFORE sending keyboard input to it.
# Without this, ESC/Enter goes to whichever window had focus before.
screen.wait("open_project_dialog.png", timeout=30).highlight().click()
screen.activate_window("Open Project")  # activate the dialog, not the main window
screen.input.press("esc")               # ESC now goes to the dialog

# --- Main-window controls ---
screen.activate_window("My App 2026")   # activate the main window
screen.wait("toolbar_button.png", timeout=10).click()
```

> **Why not just activate the main window?**
> On GNOME/Mutter, activating the main window does propagate focus to a modal child dialog — but this is WM-specific behaviour. Activating the dialog directly is explicit, portable, and not dependent on WM modal-focus rules.

## Notes

- Wayland-only and headless environments are not currently supported.
- On Windows with high-DPI scaling, coordinate accuracy may be affected.
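
A script can fail early with a clear message when it is started in an unsupported Linux session. The check below is a heuristic sketch, not part of PyVisionAuto: it relies on the `XDG_SESSION_TYPE` and `DISPLAY` environment variables, which most desktop sessions set but which are not guaranteed to be present. `unsupported_reason` is a hypothetical name.

```python
import os
import sys


def unsupported_reason(env=None, platform=sys.platform):
    """Return why the session looks unsupported, or None if no problem is seen."""
    env = os.environ if env is None else env
    if not platform.startswith("linux"):
        return None  # these checks only apply to Linux sessions
    if env.get("XDG_SESSION_TYPE", "").lower() == "wayland":
        return "Wayland session detected; an X11 session is required"
    if not env.get("DISPLAY"):
        return "DISPLAY is not set; no X11 display is available"
    return None


if __name__ == "__main__":
    reason = unsupported_reason()
    print(reason or "session looks usable")
```

The Wayland check comes first because XWayland can set `DISPLAY` inside a Wayland session, so `DISPLAY` alone does not prove that an X11 session is running.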
