Metadata-Version: 2.4
Name: windows-gui-mcp
Version: 0.1.0
Summary: Windows GUI Automation MCP server for AI coding agents
Project-URL: Homepage, https://github.com/dcl632/windows-gui-mcp
Project-URL: Repository, https://github.com/dcl632/windows-gui-mcp
Project-URL: Issues, https://github.com/dcl632/windows-gui-mcp/issues
Author: Liao Ding Chao
License: MIT
License-File: LICENSE
Classifier: Development Status :: 3 - Alpha
Classifier: Environment :: Win32 (MS Windows)
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: User Interfaces
Requires-Python: >=3.12
Requires-Dist: anyio>=4.3.0
Requires-Dist: mcp>=1.2.0
Requires-Dist: pillow>=10.2.0
Requires-Dist: psutil>=5.9.0
Requires-Dist: pydantic>=2.6.0
Provides-Extra: all
Requires-Dist: comtypes>=1.4.0; (sys_platform == 'win32') and extra == 'all'
Requires-Dist: easyocr>=1.7.1; (python_version < '3.13') and extra == 'all'
Requires-Dist: opencv-python>=4.9.0; extra == 'all'
Requires-Dist: pyautogui>=0.9.54; extra == 'all'
Requires-Dist: pytesseract>=0.3.10; extra == 'all'
Requires-Dist: pywinauto>=0.6.8; (sys_platform == 'win32') and extra == 'all'
Requires-Dist: winocr>=0.0.14; (sys_platform == 'win32') and extra == 'all'
Provides-Extra: dev
Requires-Dist: build>=1.2.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Requires-Dist: twine>=5.0.0; extra == 'dev'
Provides-Extra: ocr
Requires-Dist: easyocr>=1.7.1; (python_version < '3.13') and extra == 'ocr'
Requires-Dist: opencv-python>=4.9.0; extra == 'ocr'
Requires-Dist: pytesseract>=0.3.10; extra == 'ocr'
Provides-Extra: windows
Requires-Dist: comtypes>=1.4.0; (sys_platform == 'win32') and extra == 'windows'
Requires-Dist: pyautogui>=0.9.54; extra == 'windows'
Requires-Dist: pywinauto>=0.6.8; (sys_platform == 'win32') and extra == 'windows'
Requires-Dist: winocr>=0.0.14; (sys_platform == 'win32') and extra == 'windows'
Description-Content-Type: text/markdown

# windows-gui-mcp

Windows GUI Automation MCP server for AI coding agents.

`windows-gui-mcp` helps agents operate Windows desktop applications through
semantic UI Automation instead of brittle coordinate clicks. It is designed for
agent workflows that need to inspect a live Windows UI, act on stable
identifiers, verify every action, and turn successful sessions into reusable
scripts.

## Why this exists

AI agents can work reliably with web pages because browsers expose structured
DOM state. Windows desktop applications are harder: the visible UI is often
stateful, asynchronous, and easy to break with raw coordinates.

This project exposes a small MCP toolset that keeps the agent in a safer loop:

1. Discover visible windows.
2. Focus the target window.
3. Dump the UI Automation tree.
4. Find controls by stable identifiers.
5. Act with post-action verification.
6. Use OCR or image fallback only after semantic lookup fails.
7. Generate a pywinauto replay script from the trace.
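
Step 5 is the core of the loop. As a minimal sketch of what "act with post-action verification" can mean, the helper below runs an action once and then polls a post-condition until it holds or a timeout expires. The name `act_and_verify` and its parameters are illustrative assumptions, not part of this project's API, and a dictionary stands in for a real GUI.

```python
import time
from typing import Callable


def act_and_verify(
    action: Callable[[], None],
    postcondition: Callable[[], bool],
    timeout: float = 2.0,
    interval: float = 0.05,
) -> bool:
    """Run `action` once, then poll `postcondition` until it holds or time runs out."""
    action()
    deadline = time.monotonic() + timeout
    while True:
        if postcondition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


# Stand-in for a real GUI: the "click" flips a flag that the check then observes.
state = {"dialog_open": False}
ok = act_and_verify(
    action=lambda: state.update(dialog_open=True),
    postcondition=lambda: state["dialog_open"],
)
print(ok)
```

The action is deliberately not retried: a failed check returns `False` so the caller can re-inspect the UI before deciding what to do next.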

## Tooling model

```text
AI coding agent
      |
      | MCP stdio
      v
windows_gui_mcp.server
      |
      v
tools/dispatch + trace recorder
      |
      +-- window / element / input / verify / wait
      +-- screenshot / OCR / fallback / trace-to-script
      |
      v
Windows backend ladder
      |
      +-- pywinauto UIA      first choice
      +-- pywinauto win32    legacy fallback
      +-- pyautogui          image/coordinate last resort
```
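
The backend ladder in the diagram can be read as an ordered fallback chain. The sketch below shows that idea in plain Python with stub functions in place of real backends. `BackendError`, `run_on_ladder`, and the stub names are hypothetical and are not taken from the `windows_gui_mcp` source.

```python
from typing import Callable, Sequence


class BackendError(Exception):
    """Raised by a backend that cannot perform the requested action."""


def run_on_ladder(backends: Sequence[tuple[str, Callable[[], str]]]) -> tuple[str, str]:
    """Try each (name, attempt) pair in order and return the first success."""
    failures: list[str] = []
    for name, attempt in backends:
        try:
            return name, attempt()
        except BackendError as exc:
            failures.append(f"{name}: {exc}")
    raise BackendError("all backends failed: " + "; ".join(failures))


# Stub backends: the first rung fails, so the ladder moves on to the second.
def uia_click() -> str:
    raise BackendError("control not exposed through UIA")


def win32_click() -> str:
    return "clicked"


def image_click() -> str:
    return "clicked by image"


used, result = run_on_ladder(
    [("uia", uia_click), ("win32", win32_click), ("pyautogui", image_click)]
)
print(used, result)
```

Returning the name of the rung that succeeded lets a trace record how often the lower, less reliable rungs were needed.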

## MCP tools

| Tool | Purpose |
| --- | --- |
| `list_windows` | Enumerate visible top-level windows. |
| `focus_window` | Bring a title-matching window to the foreground and verify focus. |
| `dump_ui_tree` | Dump the UIA tree so the agent can choose stable identifiers. |
| `find_element` | Locate one control by `automation_id`, `name`, `control_type`, or `class_name`. |
| `click_element` | Click a semantically identified control and verify the post-condition. |
| `type_text` | Type into a target control and optionally verify the value. |
| `hotkey` | Send a pywinauto-style key chord such as `^s` or `%{F4}`. |
| `screenshot` | Capture the screen, a window, or a region. |
| `wait_until_element` | Wait for a control to exist, become visible, or become enabled. |
| `verify_text_exists` | Verify text through UIA first, and use OCR only when requested. |
| `fallback_click_by_image_or_ocr` | Last-resort click by image template or OCR anchor. |
| `generate_stable_script_from_trace` | Convert the current trace into a pywinauto replay script. |
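
The chord strings accepted by `hotkey` use pywinauto's modifier prefixes: `^` for Ctrl, `%` for Alt, and `+` for Shift, with named keys in braces. The helper below is a small illustrative decoder for simple chords of that shape. `describe_chord` is a hypothetical name, and the function covers only leading modifiers followed by one key, not the full pywinauto key syntax.

```python
MODIFIERS = {"^": "Ctrl", "%": "Alt", "+": "Shift"}


def describe_chord(chord: str) -> str:
    """Turn a simple chord such as '^s' or '%{F4}' into a readable label."""
    parts: list[str] = []
    i = 0
    # Collect leading modifier prefixes.
    while i < len(chord) and chord[i] in MODIFIERS:
        parts.append(MODIFIERS[chord[i]])
        i += 1
    key = chord[i:]
    # Named keys are wrapped in braces, e.g. {F4} or {ENTER}.
    if key.startswith("{") and key.endswith("}"):
        key = key[1:-1]
    if not key:
        raise ValueError(f"chord has no key: {chord!r}")
    parts.append(key.upper() if len(key) == 1 else key)
    return "+".join(parts)


print(describe_chord("^s"))     # Ctrl+S
print(describe_chord("%{F4}"))  # Alt+F4
```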

## Install

Python 3.12 or newer is required. The `easyocr` dependency of the `ocr` extra
is installed only on Python versions below 3.13.

For normal Windows agent use:

```powershell
py -3.12 -m venv .venv
.\.venv\Scripts\python -m pip install --upgrade pip
.\.venv\Scripts\python -m pip install "windows-gui-mcp[windows,ocr]"
```

For local development from this repository:

```bash
python -m venv .venv
./.venv/bin/python -m pip install --upgrade pip
./.venv/bin/python -m pip install -e ".[dev]"
```

On Windows, install the optional runtime extras when you want live GUI control:

```powershell
.\.venv\Scripts\python -m pip install -e ".[dev,windows,ocr]"
```

OCR support is optional. If you use Tesseract OCR, install the Windows package
separately and make sure `tesseract.exe` is on `PATH`.

## Run

Start the MCP server on the Windows machine that owns the desktop session:

```powershell
windows-gui-mcp
```

Check CLI metadata without starting the MCP stdio transport:

```powershell
windows-gui-mcp --help
windows-gui-mcp --version
```

Example local MCP client config:

```json
{
  "mcpServers": {
    "windows-gui": {
      "command": "windows-gui-mcp"
    }
  }
}
```
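
The server can also be driven from Python with the `mcp` SDK that this package already depends on. The sketch below starts the server over stdio and lists the tools it advertises. `ClientSession`, `StdioServerParameters`, and `stdio_client` come from the SDK, while `list_tool_names` is a hypothetical helper, and the sketch assumes the `windows-gui-mcp` command is on `PATH`.

```python
async def list_tool_names(command: str = "windows-gui-mcp") -> list[str]:
    """Start the server over stdio and return the names of the tools it advertises."""
    # Imported here so this module still loads where the `mcp` SDK is absent.
    from mcp import ClientSession, StdioServerParameters
    from mcp.client.stdio import stdio_client

    params = StdioServerParameters(command=command, args=[])
    async with stdio_client(params) as (read, write):
        async with ClientSession(read, write) as session:
            await session.initialize()
            listing = await session.list_tools()
            return [tool.name for tool in listing.tools]
```

Running `asyncio.run(list_tool_names())` on the Windows machine should return the tool names from the table above. The same session object exposes `call_tool` for invoking a tool by name with a dictionary of arguments.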

Example SSH-based config from another machine:

```json
{
  "mcpServers": {
    "windows-gui": {
      "command": "ssh",
      "args": [
        "user@windows-host",
        "C:\\path\\to\\windows-gui-mcp\\.venv\\Scripts\\windows-gui-mcp.exe"
      ]
    }
  }
}
```

## Example workflow

This is the intended agent loop for a Notepad or Calculator task:

```text
1. list_windows()
2. focus_window(title_regex="Notepad|Calculator")
3. dump_ui_tree(window_handle=...)
4. find_element(spec={"name": "Save", "control_type": "Button"})
5. click_element(
     spec={"name": "Save", "control_type": "Button"},
     expect_element_after={"class_name": "#32770"}
   )
6. type_text(
     spec={"automation_id": "1001"},
     text="agent-notes.txt",
     verify_value_contains="agent-notes.txt"
   )
7. hotkey("{ENTER}")
8. generate_stable_script_from_trace()
```

See [examples/notepad_calculator.md](examples/notepad_calculator.md) for a
longer walkthrough.

## Safety rules

- Prefer `automation_id`, then `name`, then `control_type`, then `class_name`.
- Do not start with screen coordinates.
- Verify every click or text entry with a concrete post-condition.
- Re-dump the UI tree after a failed verification instead of retrying blindly.
- Treat OCR and image matching as fallbacks, not the primary automation path.
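
The identifier preference in the first rule can be sketched as a small selection function. Given the attributes read from a UI tree dump, it returns a lookup spec built from the most preferred attribute that is present. `PREFERENCE` and `strongest_spec` are hypothetical names, and the attribute dictionary shape is an assumption based on the `spec` arguments shown in the example workflow.

```python
PREFERENCE = ("automation_id", "name", "control_type", "class_name")


def strongest_spec(attrs: dict[str, str]) -> dict[str, str]:
    """Build a lookup spec from the most preferred non-empty attribute available."""
    for key in PREFERENCE:
        value = attrs.get(key)
        if value:
            return {key: value}
    raise ValueError("element exposes no usable identifier")


# An empty automation_id is skipped, so the name is used instead.
print(strongest_spec({"automation_id": "", "name": "Save", "control_type": "Button"}))
```

A real spec may still combine the chosen attribute with weaker ones, as the workflow example does with `name` plus `control_type`, to narrow the match.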

## Development checks

```bash
python -m compileall -q src tests
python -m pytest -q
ruff check .
python -m build
twine check dist/*
```

## Contributing and security

See [CONTRIBUTING.md](CONTRIBUTING.md) for development workflow and automation
design rules. See [SECURITY.md](SECURITY.md) for vulnerability reporting and
desktop automation safety expectations.

## License

MIT
