Metadata-Version: 2.4
Name: pyvisionauto
Version: 0.1.5
Summary: PyVisionAuto: Cross-platform desktop automation toolkit with visual image matching, mouse/keyboard control, and screen recording
Author: PyVisionAuto contributors
License-Expression: LicenseRef-Proprietary
Project-URL: Homepage, https://pypi.org/project/pyvisionauto/
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Operating System :: POSIX :: Linux
Classifier: Operating System :: Microsoft :: Windows
Classifier: Environment :: X11 Applications
Classifier: Topic :: Software Development :: Testing
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
License-File: license.dat
Requires-Dist: opencv-python>=4.8.0
Requires-Dist: mss>=9.0.0
Requires-Dist: numpy>=1.24.0
Requires-Dist: pyautogui>=0.9.54
Requires-Dist: pillow>=10.0.0
Provides-Extra: test
Requires-Dist: pytest>=8.0; extra == "test"
Dynamic: license-file

# PyVisionAuto

[![PyPI version](https://img.shields.io/pypi/v/pyvisionauto.svg)](https://pypi.org/project/pyvisionauto/)
![Python](https://img.shields.io/pypi/pyversions/pyvisionauto)
![Platform](https://img.shields.io/badge/platform-Linux%20%7C%20Windows-blue)

**PyVisionAuto** is a desktop automation toolkit built around visual image matching.
It also provides screen recording, mouse automation, and keyboard automation.

## Scope

- Linux (X11 session) and Windows
- Real physical display required

## Install

```bash
pip install pyvisionauto
```

## System dependencies

### Linux

- **python3-tk** — Required for border overlay highlight
- **xdotool** — Preferred for window activation
- **wmctrl** — Fallback for window activation
- **ffmpeg** — Required for screen recording; on Debian/Ubuntu, install via `sudo apt install ffmpeg`

### Windows

- **tkinter** — Bundled with most Python installations
- **ffmpeg** — Required for screen recording; download it from [ffmpeg.org](https://ffmpeg.org/download.html), extract the archive, and add its `bin` folder to the system `PATH`

### Verify ffmpeg installation

```bash
# Check if ffmpeg is installed and accessible
ffmpeg -version
```

> **Note:** Screen recording (via the `Recorder` API) requires ffmpeg. On Linux it captures through the `x11grab` input device; on Windows it uses `gdigrab`. These are capture devices rather than codecs, and typical ffmpeg builds for each platform include them.

## Quick start

### Basic usage: Find and click

```python
from pyvisionauto import Screen

screen = Screen()
# Wait for the image to appear on screen, click it, then highlight the match
# (click before highlight; see "highlight() and focus" below)
screen.wait("login_button.png", timeout=10).click().highlight()
```

### Advanced example: Record automation with screen capture

This example demonstrates screen recording combined with visual automation:

```python
from pyvisionauto import Screen, Recorder
from pathlib import Path

screen = Screen()
recorder = Recorder()

recorder.start_recording(output_path=Path("automation_demo.mp4"))
try:
    screen.activate_window("Calculator")
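    screen.wait("button_1.png", timeout=10).click().highlight()
    screen.click("button_plus.png", timeout=5)
    # highlight() can take focus on some window managers; re-activate before typing
    screen.activate_window("Calculator")
    screen.type_text("5")
    screen.wait("button_equals.png", timeout=5).click().highlight()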
    # 1 + 5 = 6, so wait for the image showing the result 6
    screen.wait("result_6.png", timeout=3).highlight()
finally:
    recorder.stop_recording()
```
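
The `try`/`finally` pattern above can be wrapped in a context manager so that every recording is stopped even when a step fails. This is a minimal sketch, not something PyVisionAuto ships: `recording` is a hypothetical helper, and it assumes only the `start_recording(output_path=...)` and `stop_recording()` calls shown in the example.

```python
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def recording(recorder, output_path: Path):
    """Record for the duration of a `with` block, stopping even on errors."""
    recorder.start_recording(output_path=output_path)
    try:
        yield recorder
    finally:
        recorder.stop_recording()


# Usage with the objects from the example above:
# with recording(Recorder(), Path("automation_demo.mp4")):
#     screen.wait("button_1.png", timeout=10).click()
```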

### Activate a window before matching

```python
from pyvisionauto import Screen

screen = Screen()
screen.activate_window("Calculator")
screen.click("button.png")
```

## Runtime screenshot

Highlighted match region during runtime:

![PyVisionAuto runtime screenshot with highlighted region](screenshot.png)

## Platform differences

| Feature | Linux | Windows |
|---|---|---|
| Screen capture & template matching | Supported | Supported |
| Mouse / keyboard automation | Supported | Supported |
| Highlight overlay | Supported | Supported |
| Window activation | xdotool / wmctrl | pyautogui (pygetwindow) |
| Screen recording | ffmpeg + x11grab | ffmpeg + gdigrab |

> Screen recording requires ffmpeg installed and added to system PATH. Linux uses `x11grab`, Windows uses `gdigrab`.

## Window focus on Linux (X11)

On X11 systems, **mouse clicks alone do not automatically change keyboard focus**. The window manager only reassigns focus in response to real hardware events or explicit window activation requests. This means:

- `click()` moves the cursor to the correct coordinates and clicks, but the keyboard focus stays wherever it was before.
- Any subsequent keyboard action (`press()`, `type_text()`, hotkeys) is delivered to whichever window currently has focus — which may not be the window you just clicked.

**Rule of thumb: always call `activate_window()` before any keyboard action, targeting the exact window that should receive it.**
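
One way to make this rule hard to forget is to route keyboard input through a small wrapper that always activates the target window first. The helper below is a hypothetical sketch, not part of the library; it relies only on the `activate_window()` and `type_text()` calls used elsewhere in this README.

```python
def type_into(screen, window_title: str, text: str) -> None:
    """Activate `window_title`, then type `text` into it."""
    screen.activate_window(window_title)  # make keyboard focus explicit first
    screen.type_text(text)                # keystrokes now go to that window


# Usage, assuming `screen = Screen()`:
# type_into(screen, "Calculator", "5")
```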

Use `xdotool` to find the precise window name while the application is running:

```bash
xdotool search --name "" 2>/dev/null | while read id; do
    printf "ID=%-12s %s\n" "$id" "$(xdotool getwindowname "$id" 2>/dev/null)"
done
```

Pick the shortest substring that uniquely identifies the target window and use it in `activate_window()`.

### Main window vs. dialogs

When a modal dialog is open, activate the dialog directly — do not activate the main window and rely on the WM to forward focus:

```python
from pyvisionauto import Screen

screen = Screen()

# --- Interacting with a dialog ---
# 1. Wait for the dialog image to appear and click it
screen.wait("open_project_dialog.png", timeout=30).click()
# 2. Activate the dialog window so keyboard input goes to it
screen.activate_window("Open Project")   # activate the dialog, not the main window
# 3. Now keyboard actions are reliably delivered to the dialog
screen.input.press("esc")

# --- Interacting with the main window ---
screen.activate_window("My App 2026")
screen.wait("toolbar_button.png", timeout=10).click()
```

> **Why not just activate the main window?**
> On GNOME/Mutter, activating the main window does propagate focus to a modal child dialog — but this is WM-specific behaviour. Activating the dialog directly is explicit, portable, and not dependent on WM modal-focus rules.

### highlight() and focus

`highlight()` launches a temporary tkinter overlay window. On some window managers this overlay can briefly steal keyboard focus. To avoid side effects:

- Prefer `.click()` before `.highlight()`, not after — the API supports chaining in both directions.
- Do not rely on focus being intact after `.highlight()` returns; call `activate_window()` again if keyboard actions follow.

```python
# Safer pattern: click first, highlight after (for visual feedback only)
screen.wait("button.png", timeout=10).click().highlight()

# Risky pattern: highlight may steal focus, so the click can land on the wrong window
# screen.wait("button.png", timeout=10).highlight().click()  # avoid
```
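
Both recommendations can be combined in one helper that clicks, highlights for visual feedback, and then restores focus before any keyboard action. This is a hypothetical sketch rather than a library function; `click_with_feedback` is an illustrative name, and it uses only the calls already shown in this section.

```python
def click_with_feedback(
    screen, image: str, window_title: str, timeout: float = 10
) -> None:
    """Click the match, highlight it, then re-activate the target window."""
    # Click first so the highlight overlay cannot interfere with the click.
    screen.wait(image, timeout=timeout).click().highlight()
    # The overlay may have taken focus; restore it for later key presses.
    screen.activate_window(window_title)


# Usage, assuming `screen = Screen()`:
# click_with_feedback(screen, "button.png", "My App 2026")
# screen.type_text("hello")
```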

## Notes

- Wayland-only and headless environments are not currently supported.
- On Windows with high-DPI scaling, coordinate accuracy may be affected.

## Acknowledgments

This project is inspired by [SikuliX](http://sikulix.com/) and built with:

- [OpenCV](https://opencv.org/) — Computer vision library for template matching
- [mss](https://github.com/BoboTiG/python-mss) — Fast, efficient screen capture
- [PyAutoGUI](https://github.com/asweigart/pyautogui) — Cross-platform mouse and keyboard automation
- [ffmpeg](https://ffmpeg.org/) — Multimedia framework for screen recording
