Metadata-Version: 2.4
Name: mobile-parser
Version: 0.3.0
Summary: Mobile testing MCP server: OmniParser UI element detection + direct device control (mobilecli + WDA)
License: AGPL-3.0-or-later
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: easyocr>=1.7.0
Requires-Dist: einops>=0.8.0
Requires-Dist: huggingface-hub>=0.20.0
Requires-Dist: mcp[cli]>=1.3.0
Requires-Dist: numpy<2.0.0
Requires-Dist: opencv-python-headless>=4.8.0
Requires-Dist: pillow>=10.0.0
Requires-Dist: supervision>=0.18.0
Requires-Dist: timm>=1.0.0
Requires-Dist: torch>=2.1.0
Requires-Dist: torchvision>=0.16.0
Requires-Dist: transformers<4.50.0,>=4.45.0
Requires-Dist: ultralytics>=8.0.0
Provides-Extra: test
Requires-Dist: pytest-asyncio>=0.24; extra == 'test'
Requires-Dist: pytest>=8.0; extra == 'test'
Description-Content-Type: text/markdown

# mobile-parser

[![PyPI version](https://img.shields.io/pypi/v/mobile-parser?style=flat-square)](https://pypi.org/project/mobile-parser/)
[![Python](https://img.shields.io/pypi/pyversions/mobile-parser?style=flat-square)](https://pypi.org/project/mobile-parser/)
[![License: AGPL-3.0](https://img.shields.io/badge/license-AGPL--3.0-blue?style=flat-square)](LICENSE)

An MCP server for mobile app testing that combines [OmniParser](https://github.com/microsoft/OmniParser) vision-based UI element detection with direct device control.

Unlike tools that rely on the accessibility tree, OmniParser detects UI elements directly from screenshots. This lets it work with **Flutter, WebView, games, and other apps** regardless of UI framework, including apps that expose little or no accessibility information.

## Features

- **Vision-based element detection**: OmniParser (YOLO + Florence-2 + EasyOCR) finds UI elements in screenshots
- **Cross-platform**: iOS Simulator and Android (emulator and real device)
- **Zero-config coordinates**: `mobile_find_elements` returns tap-ready coordinates that can be passed directly to `mobile_tap()`
- **No Appium required**: talks directly to WDA (iOS) and adb (Android)
- **Auto-download on first use**: Python dependencies, the mobilecli binary, and OmniParser models are fetched when first needed

## Installation

### Claude Code

```bash
claude mcp add mobile-parser -- uvx mobile-parser
```

<details>
<summary>Claude Desktop / Cursor / Other MCP Clients</summary>

Add to your MCP config JSON:

```json
{
  "mcpServers": {
    "mobile-parser": {
      "command": "uvx",
      "args": ["mobile-parser"]
    }
  }
}
```

</details>

### Prerequisites

- **uv** (provides `uvx`, which installs and manages Python 3.10+ automatically)
- **Node.js / npm** (mobilecli is auto-downloaded via `npx`)

<details>
<summary>iOS</summary>

- **Xcode + iOS Simulator**
- **WebDriverAgent** installed on the simulator
  - See: [Setup for iOS Simulator](https://github.com/nicholasyan/mobile-mcp/wiki/Setup-for-iOS-Simulator)

</details>

<details>
<summary>Android</summary>

- **Android SDK** (`adb` in PATH or `ANDROID_HOME` set)
- **Emulator or device** connected via `adb`

</details>

### What gets auto-downloaded

| Component | When | Size |
|-----------|------|------|
| Python packages (torch, etc.) | First `uvx mobile-parser` run | ~2 GB |
| mobilecli binary | First device operation | ~20 MB |
| OmniParser models | First `mobile_find_elements` call | ~1.5 GB |
| Florence-2 processor | First icon captioning | ~500 MB |

## Usage

```
1. mobile_find_elements(device="...") → elements with tap coordinates
2. mobile_tap(device="...", x=tap_x, y=tap_y) → tap the element
```
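
As a rough illustration of this two-step loop, the sketch below picks a matching element from a `mobile_find_elements` result and builds the arguments for `mobile_tap`. The element shape (a list of dicts with `content`, `tap_x`, and `tap_y` keys), the device ID, and the helper name `tap_args_for` are all hypothetical; check the tool's actual output before relying on them.

```python
from typing import Any, Optional


def tap_args_for(
    elements: list[dict[str, Any]], label: str, device: str
) -> Optional[dict[str, Any]]:
    """Return mobile_tap arguments for the first element whose text contains `label`.

    Assumes each element is a dict with hypothetical keys "content", "tap_x", "tap_y".
    """
    needle = label.casefold()
    for element in elements:
        text = str(element.get("content", "")).casefold()
        if needle in text:
            # The coordinates are passed through unchanged: they are already tap-ready.
            return {"device": device, "x": element["tap_x"], "y": element["tap_y"]}
    return None  # no matching element on this screen


# Hypothetical result from mobile_find_elements
elements = [
    {"content": "Sign up", "tap_x": 196, "tap_y": 700},
    {"content": "Login", "tap_x": 196, "tap_y": 620},
]
print(tap_args_for(elements, "login", device="DEVICE_ID"))
# {'device': 'DEVICE_ID', 'x': 196, 'y': 620}
```

Matching is a case-insensitive substring test here; in practice the LLM usually does this selection itself from the returned list.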

`mobile_find_elements` handles the full pipeline:

1. Takes a screenshot of the device
2. Runs OmniParser to detect all UI elements (text + icons)
3. Converts pixel coordinates to logical screen coordinates

The returned `tap_x` / `tap_y` can be passed directly to `mobile_tap()`.
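
Step 3 is essentially a division by the display scale factor. A minimal sketch, assuming the detector yields a pixel-space bounding box `(x1, y1, x2, y2)` on the full-resolution screenshot and that the logical screen size is known (for example from `mobile_get_screen_size`); `bbox_to_tap` is a hypothetical name, not this package's API:

```python
def bbox_to_tap(
    bbox: tuple[float, float, float, float],
    screenshot_size: tuple[int, int],
    logical_size: tuple[int, int],
) -> tuple[int, int]:
    """Map a pixel-space bbox to a tap point in logical screen coordinates."""
    x1, y1, x2, y2 = bbox
    px_w, px_h = screenshot_size
    pt_w, pt_h = logical_size
    # Tap the center of the detected box.
    cx, cy = (x1 + x2) / 2, (y1 + y2) / 2
    # Scale each axis separately (pixels per logical point, e.g. 3.0 on many iPhones).
    return round(cx * pt_w / px_w), round(cy * pt_h / px_h)


# 1179x2556 px screenshot of a 393x852 pt screen (scale factor 3)
print(bbox_to_tap((300, 600, 480, 660), (1179, 2556), (393, 852)))  # (130, 210)
```

On Android, `adb shell input tap` works in physical pixels, so under this sketch the logical size would equal the screenshot size and the scale would be 1.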

### Example prompts

- *"Find and tap the Login button"*
- *"Scroll down and look for a search bar"*
- *"Launch the Settings app and navigate to Wi-Fi"*
- *"Take a screenshot and describe what's on screen"*

## Tools

<details open>
<summary><strong>Screen Analysis (OmniParser)</strong></summary>

| Tool | Description |
|------|-------------|
| `mobile_find_elements` | **Primary tool** — screenshot → OmniParser → tap coordinates |
| `mobile_screenshot` | Take a screenshot (downscaled for the LLM so the longest side is at most 1568 px) |
| `mobile_save_screenshot` | Save a screenshot to a file |
| `mobile_parse_image` | Parse an existing image file |
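
The 1568 px cap is a downscale that preserves aspect ratio. A sketch of the size computation, assuming the cap applies to the longest side and smaller images are left untouched (`llm_image_size` is a hypothetical helper; Pillow's `Image.thumbnail((1568, 1568))` performs a comparable in-place downscale):

```python
def llm_image_size(width: int, height: int, max_side: int = 1568) -> tuple[int, int]:
    """Return the (width, height) to downscale to so the longest side fits max_side."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height  # already small enough; never upscale
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(llm_image_size(1179, 2556))  # (723, 1568): portrait iPhone screenshot
print(llm_image_size(800, 600))    # (800, 600): unchanged
```

Coordinates read off such a downscaled image would need rescaling before tapping, which is one reason to prefer the converted coordinates from `mobile_find_elements`.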

</details>

<details open>
<summary><strong>Interaction</strong></summary>

| Tool | Description |
|------|-------------|
| `mobile_tap` | Tap at coordinates |
| `mobile_double_tap` | Double-tap at coordinates |
| `mobile_long_press` | Long press at coordinates |
| `mobile_swipe` | Swipe in a direction (up / down / left / right) |
| `mobile_type_text` | Type text into the focused element |
| `mobile_press_button` | Press a hardware button (home / back / etc.) |
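
`mobile_swipe` takes only a direction, so an implementation has to choose start and end points. One plausible scheme, shown purely as an assumption (the helper name, the 20%/80% span, and the convention that "up" means the finger moves toward the top of the screen are all hypothetical), is to swipe through the screen center:

```python
def swipe_points(direction: str, width: int, height: int) -> tuple[int, int, int, int]:
    """Return (x1, y1, x2, y2) for a swipe through the screen center.

    "up" moves the finger toward the top of the screen, revealing content further down.
    """
    cx, cy = width // 2, height // 2
    near_x, far_x = int(width * 0.2), int(width * 0.8)
    near_y, far_y = int(height * 0.2), int(height * 0.8)
    points = {
        "up": (cx, far_y, cx, near_y),
        "down": (cx, near_y, cx, far_y),
        "left": (far_x, cy, near_x, cy),
        "right": (near_x, cy, far_x, cy),
    }
    try:
        return points[direction]
    except KeyError:
        raise ValueError(f"unsupported direction: {direction!r}") from None


print(swipe_points("up", 393, 852))  # (196, 681, 196, 170)
```

Keeping the swipe away from the screen edges avoids triggering system gestures such as the iOS home indicator or Android back navigation.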

</details>

<details open>
<summary><strong>Device Management</strong></summary>

| Tool | Description |
|------|-------------|
| `mobile_list_devices` | List available devices and simulators |
| `mobile_get_screen_size` | Get device screen size |
| `mobile_list_apps` | List installed apps |
| `mobile_launch_app` | Launch an app by bundle ID (iOS) or package name (Android) |
| `mobile_terminate_app` | Terminate a running app |
| `mobile_open_url` | Open a URL in the default browser |

</details>

## Architecture

There is no dependency on the mobile-mcp server. Devices are controlled directly through platform-native tools:

| Platform | Device Discovery | Interactions | Screenshots | App Management |
|----------|-----------------|--------------|-------------|----------------|
| **iOS** | mobilecli (npx) | WebDriverAgent HTTP API | WDA `/screenshot` | `xcrun simctl` |
| **Android** | mobilecli (npx) | `adb shell input` | `adb exec-out screencap` | `adb shell am/pm` |

```
mobile-parser (MCP Server)
├── server.py          → FastMCP server with 16 tools
├── coordinator.py     → Screenshot → OmniParser → coordinate conversion
├── mobile_client.py   → Device control (iOS: WDA, Android: adb)
├── mobilecli.py       → mobilecli wrapper (npx auto-download)
├── wda.py             → WebDriverAgent HTTP client
└── parser.py          → OmniParser (YOLO + Florence-2 + EasyOCR)
```
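
On Android, the operations in the table map onto standard `adb` commands. The sketch below mostly just builds argument lists; the function names are hypothetical, but `input tap`, `input swipe`, `input text`, and `exec-out screencap -p` are stock adb / Android shell commands, and `-s <serial>` targets a single device.

```python
import subprocess


def adb(serial: str, *args: str) -> list[str]:
    """Prefix a command with `adb -s <serial>` so it targets one device."""
    return ["adb", "-s", serial, *args]


def tap_cmd(serial: str, x: int, y: int) -> list[str]:
    return adb(serial, "shell", "input", "tap", str(x), str(y))


def swipe_cmd(serial: str, x1: int, y1: int, x2: int, y2: int, ms: int = 300) -> list[str]:
    # The trailing argument is the swipe duration in milliseconds.
    return adb(serial, "shell", "input", "swipe", str(x1), str(y1), str(x2), str(y2), str(ms))


def text_cmd(serial: str, text: str) -> list[str]:
    # `input text` splits on spaces; "%s" is its escape for a space.
    # Other shell metacharacters would also need escaping; this sketch handles spaces only.
    return adb(serial, "shell", "input", "text", text.replace(" ", "%s"))


def screencap_cmd(serial: str) -> list[str]:
    # exec-out streams the raw PNG bytes to stdout without line-ending conversion.
    return adb(serial, "exec-out", "screencap", "-p")


def take_screenshot(serial: str) -> bytes:
    """Run screencap and return PNG bytes (requires adb and a connected device)."""
    return subprocess.run(screencap_cmd(serial), check=True, capture_output=True).stdout


print(tap_cmd("emulator-5554", 540, 1200))
# ['adb', '-s', 'emulator-5554', 'shell', 'input', 'tap', '540', '1200']
```

Passing argument lists (rather than a single shell string) to `subprocess.run` keeps coordinates and text from being reinterpreted by the local shell.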

## Configuration

| Environment Variable | Description | Default |
|----------------------|-------------|---------|
| `OMNIPARSER_WEIGHTS_DIR` | Model weights directory | `~/.cache/omniparser` |
| `OMNIPARSER_DEVICE` | Inference device (`cuda` / `mps` / `cpu`) | Auto-detect |
| `MOBILECLI_PATH` | mobilecli binary path | npx auto-download |
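
A sketch of how these variables might be resolved, assuming an explicit `OMNIPARSER_DEVICE` always wins and auto-detection prefers CUDA, then Apple MPS, then CPU (that order and all helper names are assumptions). `torch.cuda.is_available()` and `torch.backends.mps.is_available()` are the standard PyTorch checks; torch is imported lazily so the selection logic can be tested without it.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def weights_dir(env: Mapping[str, str] = os.environ) -> Path:
    return Path(env.get("OMNIPARSER_WEIGHTS_DIR", "~/.cache/omniparser")).expanduser()


def mobilecli_path(env: Mapping[str, str] = os.environ) -> Optional[str]:
    # None means "fall back to the npx auto-download".
    return env.get("MOBILECLI_PATH") or None


def pick_device(env: Mapping[str, str], cuda_ok: bool, mps_ok: bool) -> str:
    override = env.get("OMNIPARSER_DEVICE")
    if override:
        return override  # explicit setting beats auto-detection
    if cuda_ok:
        return "cuda"
    if mps_ok:
        return "mps"
    return "cpu"


def detect_device(env: Mapping[str, str] = os.environ) -> str:
    import torch  # lazy import: only needed for auto-detection

    return pick_device(env, torch.cuda.is_available(), torch.backends.mps.is_available())


print(pick_device({}, cuda_ok=False, mps_ok=True))  # mps
```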

## License

[AGPL-3.0](LICENSE) — due to the [ultralytics](https://github.com/ultralytics/ultralytics) (YOLOv8) dependency.
