Metadata-Version: 2.4
Name: agent_aid
Version: 1.4.1
Summary: Local Windows desktop control for AI agents — Python library and CLI.
Author: agent_aid
License: MIT
Project-URL: Documentation, https://github.com/local/agent_aid
Keywords: windows,automation,ai-agent,rpa,computer-use
Classifier: Development Status :: 4 - Beta
Classifier: Environment :: Console
Classifier: Operating System :: Microsoft :: Windows
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Description-Content-Type: text/markdown
Requires-Dist: mss>=10.1.0

# agent-aid

A small Python library and CLI to control the local Windows desktop from AI
agents. Open Interpreter, GPT, Claude, or your own agent — call `agent-aid` to
read the screen, drive mouse and keyboard, and manage windows. Single
dependency: `mss`.

```
pip install agent-aid
```

With `uv` (recommended — installs Python automatically if needed):

```
uv tool install agent-aid --python 3.11
```

## Quick look

```
agent-aid health
agent-aid state
agent-aid screenshot active_window=true save_path=captures/active.png include_base64=false
agent-aid click x=500 y=300
agent-aid type text="hello"
agent-aid press keys=ctrl+s
agent-aid focus_window title_fragment=Chrome
agent-aid open target=https://example.com
```

List every route:

```
agent-aid --list
```

Print the full AI-oriented usage spec to stdout (markdown):

```
agent-aid --readme
```

## Capabilities

- Screenshots: full desktop, single monitor, active window, specific `hwnd`, or rectangular region
- PNG `sha256` hash returned for every screenshot (use with `wait_screen_change`)
- Mouse: click, double-click, right-click, drag, move, scroll (vertical/horizontal), hold buttons
- Keyboard: short text, hotkeys (`ctrl+shift+a`), modifier hold/release
- Clipboard: `clipboard text=...` + `press keys=ctrl+v` for long pastes
- Windows: find, focus, minimize/maximize/restore/close, move/resize, hide/show
- System: list processes, open file/URL/`shell:` target, read pixel color, query state
- Verify: `wait_screen_change`, `wait_pixel`, `wait_window`
- `batch` for atomic multi-step flows in a single CLI call

## Targeting windows / coordinates

All coordinates are physical screen pixels. To work relative to a specific
window:

```
agent-aid click x=120 y=80 relative_to=active_window
agent-aid click x=120 y=80 hwnd=123456
```

## Use as a Python library

```python
from agent_aid import core

core.set_dpi_aware()
print(core.active_window())
core.click(800, 500)
core.type_text("hello")
```

Same capabilities as the CLI — just call the `core` module directly from your
Python code.

## Argument formats

```
# key=value (shortest)
agent-aid click x=500 y=300 button=left

# JSON (for nested fields)
agent-aid screenshot '{"region":{"left":0,"top":0,"width":800,"height":600},"save_path":"r.png"}'

# Pretty-print output
agent-aid --pretty state
```

## Practical AI agent flow

1. `agent-aid state` — see what you're looking at
2. `agent-aid screenshot active_window=true save_path=captures/now.png include_base64=false`
3. Inspect the image, choose target coordinates
4. Act: `click` / `type` / `press` / `clipboard`
5. Verify: another `screenshot` or `wait_screen_change`

## Safety notes

- Sends real mouse and keyboard input — types into the focused window.
- `clipboard` overwrites the user's clipboard.
- `open` launches a Windows target (same effect as a user double-click).
- `window/manage close` posts `WM_CLOSE` — apps with unsaved data may prompt.
- After any action, prefer to verify with `wait_*` or a fresh `screenshot`.

## License

MIT.
