Metadata-Version: 2.4
Name: mcp-vm-blackbox
Version: 0.3.1
Summary: MCP server for controlling VirtualBox VMs — screenshots, keyboard input, PowerShell, vagrant, WinRM, podman, and CI build pipelines
Project-URL: Homepage, https://github.com/bitflight-devops/vm-flightsimulator
Project-URL: Issues, https://github.com/bitflight-devops/vm-flightsimulator/issues
Project-URL: Repository, https://github.com/bitflight-devops/vm-flightsimulator
Author-email: Jamie Nelson <jamie@bitflight.io>
Maintainer-email: Jamie Nelson <jamie@bitflight.io>
License: MIT
License-File: LICENSE
Keywords: mcp,mcp-server,model-context-protocol,vagrant,virtualbox,winrm
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: POSIX :: Linux
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: <3.15,>=3.11
Requires-Dist: asyncssh>=2.22.0
Requires-Dist: fastmcp[tasks]>=3.1.0
Requires-Dist: gitpython>=3.1.46
Requires-Dist: libtmux>=0.53.1
Requires-Dist: podman>=5.7.0
Requires-Dist: pydantic>=2.12.5
Requires-Dist: python-gitlab>=4.0.0
Requires-Dist: python-vagrant>=1.0
Requires-Dist: pywinrm>=0.5.0
Requires-Dist: typer>=0.24.1
Description-Content-Type: text/markdown

# vm-flightsimulator

[![PyPI version](https://badge.fury.io/py/mcp-vm-blackbox.svg)](https://badge.fury.io/py/mcp-vm-blackbox)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/downloads/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Tests](https://github.com/bitflight-devops/vm-flightsimulator/actions/workflows/test.yml/badge.svg)](https://github.com/bitflight-devops/vm-flightsimulator/actions/workflows/test.yml)

A Claude Code plugin that gives AI agents a complete control surface for VM automation: screenshot-driven GUI interaction, structured background task orchestration, non-intrusive progress observation, and frame-accurate session recording. Backend-agnostic by design — VirtualBox is the current implementation.

---

## What It Does

You tell the agent "install this software on the VM" and the plugin handles everything: starting Vagrant, taking screenshots to verify state, typing into windows, running PowerShell over WinRM, polling scheduled tasks to completion, tailing logs, and returning a structured result you can act on.

The plugin enforces a clear separation of concerns:

```text
Skills  →  define approved loops and tooling the orchestrator follows
Agents  →  take actions (vm-pilot) or observe state (vm-pilot-inspector)
MCP     →  executes tool calls against the actual VM
```

Nothing is a black box. Every action goes through the loop. Every result comes back structured.

---

## Quick Start

### 1. Install the MCP server

```bash
# Run on demand
uvx mcp-vm-blackbox

# Or add persistently to Claude Code
claude mcp add vm-blackbox -- uvx mcp-vm-blackbox
```

### 2. Install the plugin

**Option A — Marketplace (Claude Code only):**

```bash
claude plugin marketplace add bitflight-devops/vm-flightsimulator
```

Then open `/plugins` in Claude Code and install `vm-flightsimulator`.

**Option B — `vm-blackbox-installer` (Claude, OpenCode, Gemini CLI, Codex):**

```bash
# Install for all platforms, globally (~/.claude, ~/.gemini, etc.)
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --global

# Or pick specific platforms
uvx --from mcp-vm-blackbox vm-blackbox-installer --claude --gemini --global

# Or install locally to the current project directory
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --local
```

The installer copies `skills/` and `agents/` to each platform's plugin directory and registers the `mcp-vm-blackbox` MCP server in the platform's config file. See [Installer Reference](#installer-reference) for full details.

### 3. Start working

```text
"Take a screenshot of my-vm and describe what's on screen"
"Run the installer on my-vm and tell me when it's done"
"Record the boot sequence of my-vm for the next 2 minutes"
```

The plugin automatically selects the right skill and agent for each task.

---

## Prerequisites

| Requirement                | Version                    |
| -------------------------- | -------------------------- |
| Python                     | 3.11+                      |
| uv                         | latest                     |
| VirtualBox                 | 7.1+                       |
| Vagrant                    | 2.3+                       |
| Packer (for VM builds)     | 1.10+                      |
| tmux (for detached builds) | any                        |
| WinRM on guest             | configured for Windows VMs |

---

## Architecture

The plugin uses a three-layer architecture. Each layer has a single job.

```text
┌─────────────────────────────────────────────────────────┐
│                      Skills                              │
│  vm-vision-control  vm-ground-control  vm-radio-control  │
│  vm-blackbox-record                                      │
│         (define approved loops and tooling)              │
└────────────────────┬────────────────────────────────────┘
                     │ dispatches
┌────────────────────▼────────────────────────────────────┐
│                      Agents                              │
│     vm-pilot               vm-pilot-inspector            │
│  (acts on the VM)         (observes VM state)            │
└────────────────────┬────────────────────────────────────┘
                     │ calls
┌────────────────────▼────────────────────────────────────┐
│                   MCP Server                             │
│   vm_screenshot  vm_powershell  vm_type  vm_key          │
│   vm_mouse_click  vagrant_*  ci_*  podman_*  build_*     │
│              (executes against real infrastructure)      │
└─────────────────────────────────────────────────────────┘
```

**Skills** are instruction sets: each one defines the approved loop and tooling the orchestrator follows. Skills do not take actions directly.

**Agents** take actions. `vm-pilot` drives the VM. `vm-pilot-inspector` reads state without touching anything.

**The MCP server** (`mcp-vm-blackbox` on PyPI) executes tool calls. It connects to VirtualBox via vboxapi (local XPCOM) and VBoxManage (local or SSH), to Windows guests via WinRM, and to remote CI hosts via SSH tunnels.

---

## Skills

### vm-vision-control — GUI Interaction Loop

The mandatory entry point for any task that touches a VM's desktop. Clicking, typing, reading the screen — all of it goes through this skill first.

The loop is strict:

```text
1. Screenshot     →  vm_screenshot
2. Read image     →  Read tool on the saved_to path
3. Decide         →  Analyse screen, determine next action
4. Act            →  vm_mouse_click / vm_type / vm_key / vm_powershell
5. Repeat         →  Return to step 1
```

Never act without a fresh screenshot. Never skip the read step.
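
The loop can be sketched in Python as follows. This is a minimal illustration, not the skill's implementation: `take_screenshot`, `read_image`, `decide`, and `act` are hypothetical callables standing in for the `vm_screenshot` tool, the Read tool, the agent's analysis, and the action tools. The `max_steps` guard is also an assumption added so the sketch cannot loop forever.

```python
from typing import Callable, Optional


def run_vision_loop(
    take_screenshot: Callable[[], str],
    read_image: Callable[[str], str],
    decide: Callable[[str], Optional[str]],
    act: Callable[[str], None],
    max_steps: int = 20,
) -> int:
    """Run screenshot -> read -> decide -> act until decide() returns None.

    Returns the number of actions performed.
    """
    actions = 0
    for _ in range(max_steps):
        path = take_screenshot()   # step 1: always start from a fresh screenshot
        screen = read_image(path)  # step 2: never skip the read
        action = decide(screen)    # step 3: None means the goal is reached
        if action is None:
            return actions
        act(action)                # step 4: exactly one action per screenshot
        actions += 1               # step 5: loop back to step 1
    raise RuntimeError(f"goal not reached after {max_steps} steps")
```

Because every iteration begins with `take_screenshot()`, no action can be issued against a stale view of the screen.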

**Natural language triggers:** "click on the VM", "type into the VM", "what's on the screen", "navigate the installer", "take a screenshot"

**Timing to observe between steps:**

| Operation                  | Wait before next screenshot |
| -------------------------- | --------------------------- |
| Click a button             | 0.5 – 1 s                   |
| Open an application        | 3 – 5 s                     |
| Launch an installer        | 10 – 15 s                   |
| Installer panel transition | 2 – 3 s                     |
| Installer completion       | 30 – 60 s                   |
| VM boot                    | 60 – 120 s                  |
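
One way to apply these waits in a script is a small lookup table. The sketch below copies the ranges from the table; the operation keys (`click_button`, `vm_boot`, and so on) and the helper names are hypothetical labels chosen for illustration.

```python
import time
from typing import Callable

# Wait ranges in seconds (low, high), copied from the table above.
WAIT_SECONDS = {
    "click_button": (0.5, 1.0),
    "open_application": (3.0, 5.0),
    "launch_installer": (10.0, 15.0),
    "installer_panel_transition": (2.0, 3.0),
    "installer_completion": (30.0, 60.0),
    "vm_boot": (60.0, 120.0),
}


def settle_delay(operation: str, conservative: bool = True) -> float:
    """Return the wait for an operation: the upper bound unless told otherwise."""
    low, high = WAIT_SECONDS[operation]
    return high if conservative else low


def wait_for(operation: str, sleep: Callable[[float], None] = time.sleep) -> float:
    """Sleep for the operation's settle delay and return the time waited."""
    delay = settle_delay(operation)
    sleep(delay)
    return delay
```

Defaulting to the upper bound trades a little speed for fewer screenshots taken mid-transition.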

Full reference: [docs/skills/vm-vision-control.md](docs/skills/vm-vision-control.md)

---

### vm-ground-control — Orchestrator Coordination

Use for any VM operation that will take more than ~30 seconds. Dispatches `vm-pilot` as a background Task and gives you a structured return block to parse.

```python
agent_id = Task(
    description="Run the installer",
    subagent_type="vm-pilot",
    prompt="""
GOAL:
Run the silent installer via scheduled task and report whether it succeeded.

STEPS:
1. Invoke vm-vision-control skill.
2. Test-NetConnection <HOST> -Port <PORT> -InformationLevel Quiet
3. Register-ScheduledTask ...
4. Poll every 30 s until State = Ready or 15 min elapsed
5. Read the install log
6. Take a screenshot; describe what is on screen.

EVIDENCE TO COLLECT:
- Log file full contents
- Artefact path existence (yes/no)
- Final VM screen description

RETURN FORMAT:
STATUS: SUCCESS | FAILED | PARTIAL | TIMEOUT
SUMMARY: <2-4 sentences>
FILES_READ:
  install.log: <contents or "not found">
SCREEN_STATE: <description>
ISSUES: <or "none">
NEXT_STEP: <recommended action>
""",
    run_in_background=True,
)
```

**Store the agent ID.** You need it for progress checks and resumption.

The pilot owns the VM for the duration. The orchestrator does not call `vm_screenshot` or `vm_powershell` while a pilot task is running.
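
The return block is plain `KEY: value` text, so it can be parsed with a few lines of Python. The sketch below is an assumption about how an orchestrator might do this, not code shipped with the plugin; `parse_return_block` is a hypothetical helper. It treats any unindented line beginning with an upper-case key as a new field, so an unindented log line such as `ERROR: ...` inside `FILES_READ` would be misread as a field.

```python
import re

STATUSES = {"SUCCESS", "FAILED", "PARTIAL", "TIMEOUT"}
FIELD_RE = re.compile(r"^([A-Z_]+):\s*(.*)$")


def parse_return_block(text: str) -> dict[str, str]:
    """Parse the pilot's RETURN FORMAT block into a dict of field -> value."""
    fields: dict[str, str] = {}
    current = None
    for line in text.splitlines():
        match = FIELD_RE.match(line)
        if match:
            # A new top-level field such as STATUS or SUMMARY.
            current = match.group(1)
            fields[current] = match.group(2).strip()
        elif current is not None and line.strip():
            # Indented continuation line: append it to the current field.
            fields[current] = (fields[current] + "\n" + line.strip()).strip()
    if fields.get("STATUS") not in STATUSES:
        raise ValueError(f"missing or unknown STATUS: {fields.get('STATUS')!r}")
    return fields
```

Rejecting an unknown `STATUS` early keeps a malformed pilot response from being routed as if it were a valid result.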

**Routing on STATUS:**

| STATUS    | Meaning                            | Action                                   |
| --------- | ---------------------------------- | ---------------------------------------- |
| `SUCCESS` | Task completed, artefact confirmed | Proceed                                  |
| `FAILED`  | Task failed with known cause       | Check `ISSUES`, fix and re-dispatch      |
| `PARTIAL` | Evidence incomplete                | Resume pilot to collect missing evidence |
| `TIMEOUT` | Poll limit reached                 | Check `SCREEN_STATE` + `FILES_READ`      |
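
The routing table maps directly onto an enum lookup. This is a sketch only: the `Status` enum and `route` function are hypothetical names, and the action strings paraphrase the table rather than calling anything.

```python
from enum import Enum


class Status(str, Enum):
    SUCCESS = "SUCCESS"
    FAILED = "FAILED"
    PARTIAL = "PARTIAL"
    TIMEOUT = "TIMEOUT"


# Next action per status, paraphrased from the table above.
NEXT_ACTION = {
    Status.SUCCESS: "proceed",
    Status.FAILED: "check ISSUES, fix and re-dispatch",
    Status.PARTIAL: "resume pilot to collect missing evidence",
    Status.TIMEOUT: "check SCREEN_STATE and FILES_READ",
}


def route(status: str) -> str:
    """Return the next action for a STATUS value; raises ValueError if unknown."""
    return NEXT_ACTION[Status(status.strip().upper())]
```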

**Built-in templates:** installer via scheduled task, task poll, config file read, network connectivity check.

Full reference: [docs/skills/vm-ground-control.md](docs/skills/vm-ground-control.md)

---

### vm-radio-control — Progress Observer

Check what a running pilot is doing without interrupting it. Dispatches `vm-pilot-inspector` as a foreground Task that reads the pilot's transcript and queries VM state independently.

```python
Task(
    description="Check installer progress",
    subagent_type="vm-pilot-inspector",
    prompt="""
output_type: progress
pilot_agent_id: <agent-id-from-ground-control>
vm_name: <vm-name>
project_path: /absolute/path/to/project
""",
    run_in_background=False,
)
```

**Output types:**

| `output_type` | Collects                             | Use when                           |
| ------------- | ------------------------------------ | ---------------------------------- |
| `quick`       | STATUS + SCREEN_STATE only           | Fast pulse check, context is tight |
| `progress`    | Full 6-step report **(default)**     | Normal progress check              |
| `screenshot`  | Full report + UI element coordinates | Need to verify exact screen state  |
| `transcript`  | Full report + last 10 pilot turns    | Pilot appears stuck                |

**Structured report fields:** `STATUS`, `TASK_STATE`, `LOG_TAIL`, `PATH_EXISTS`, `SCREEN_STATE`, `PILOT_PROGRESS`, `ELAPSED_ESTIMATE`, `ISSUES`

Route on `STATUS` only — not on `SCREEN_STATE` or `PILOT_PROGRESS`.

Full reference: [docs/skills/vm-radio-control.md](docs/skills/vm-radio-control.md)

---

### vm-blackbox-record — Session Recording

Record VM screen sessions as WebM/VP8 video and extract frames at specific timestamps. Recording runs entirely on the host via VBoxManage — no guest changes required.

```bash
# Start recording
uv run skills/vm-blackbox-record/scripts/vm_capture.py record start "my-vm"
# → Recording: scratch/recordings/my-vm-20260305-143022-screen0.webm

# Run your operation (VM is live while recording)

# Stop recording
uv run skills/vm-blackbox-record/scripts/vm_capture.py record stop "my-vm"

# Extract frames for inspection
uv run skills/vm-blackbox-record/scripts/extract_frames.py \
  scratch/recordings/my-vm-20260305-143022-screen0.webm \
  --interval 30 \
  --outdir /tmp/frames
```

Frame extraction uses PyAV directly — no `ffmpeg` binary required.

**Recording parameters are locked once recording starts** (VirtualBox 7.1 constraint). Configure resolution, bitrate, and frame rate before enabling. `vm_capture.py` handles the correct sequence automatically.

**Backend table:**

| Backend      | Status    | Notes                                         |
| ------------ | --------- | --------------------------------------------- |
| `vboxmanage` | **Ready** | VirtualBox 7.1+. Default.                     |
| `ffmpeg`     | Planned   | v4l2/X11 capture                              |
| `mcp`        | Planned   | MCP screenshot sequences assembled into video |
| `winrm`      | Planned   | PowerShell-based capture over WinRM           |

Full reference: [docs/skills/vm-blackbox-record.md](docs/skills/vm-blackbox-record.md)

---

## Agents

### vm-pilot

Hands-and-eyes agent. Takes screenshots, runs PowerShell via WinRM, sends keystrokes, and returns structured results. Dispatched by `vm-ground-control`.

**Exactly five tools:**

| Tool            | Does                                             |
| --------------- | ------------------------------------------------ |
| `vm_screenshot` | Capture screen, return `base64_png` + `saved_to` |
| `vm_powershell` | Run PowerShell; return stdout/stderr/exit_code   |
| `vm_type`       | Type text (256-char limit per call)              |
| `vm_key`        | Send enter/tab/escape/space/backspace            |
| `vm_info`       | Return VM hardware metadata and state            |

The pilot acts and observes — it never analyses, plans, or recommends. When it cannot proceed, it populates `ISSUES` in the return block and returns `STATUS: PARTIAL` or `STATUS: FAILED`. The orchestrator decides next steps.

Full reference: [docs/agents/vm-pilot.md](docs/agents/vm-pilot.md)

---

### vm-pilot-inspector

Observer agent. Reads the pilot's transcript, queries VM state via WinRM, takes a screenshot or extracts a recording frame, and returns one structured report. Dispatched by `vm-radio-control`.

**Six-step workflow:**

1. Read pilot transcript (`~/.claude/projects/<encoded_path>/<agent-id>.jsonl`)
2. Check process or scheduled task state via WinRM
3. Tail the 3 most recent log files (last 10 lines each)
4. Check sentinel path existence
5. Take a live screenshot or extract a recording frame
6. Compose and return the structured report

**Never takes control actions.** No `vm_type`, no `vm_key`, no process invocation.
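
Step 3 of the workflow (tail the 3 most recent log files, last 10 lines each) can be illustrated on a local filesystem. The inspector performs this on the guest over WinRM, so the sketch below is only a host-side analogue under that assumption; `tail_recent_logs` and the `*.log` pattern are hypothetical.

```python
from collections import deque
from pathlib import Path


def tail_recent_logs(log_dir: str, pattern: str = "*.log",
                     files: int = 3, lines: int = 10) -> dict[str, list[str]]:
    """Return the last `lines` lines of the `files` most recently modified logs."""
    candidates = sorted(
        (p for p in Path(log_dir).glob(pattern) if p.is_file()),
        key=lambda p: p.stat().st_mtime,
        reverse=True,  # newest first
    )[:files]
    tails: dict[str, list[str]] = {}
    for path in candidates:
        with path.open("r", encoding="utf-8", errors="replace") as handle:
            # deque(maxlen=...) keeps only the final lines while streaming the file.
            tails[path.name] = [line.rstrip("\n") for line in deque(handle, maxlen=lines)]
    return tails
```

This is read-only by construction, which matches the inspector's rule of never taking control actions.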

Full reference: [docs/agents/vm-pilot-inspector.md](docs/agents/vm-pilot-inspector.md)

---

## MCP Tools

The `vm-blackbox` MCP server exposes 25 tools across six domains.

### VM Inspection (4 tools)

| Tool                | Description                                       |
| ------------------- | ------------------------------------------------- |
| `vm_list`           | List all VMs with running state                   |
| `vm_info`           | Return memory, CPU, VRAM, Guest Additions, state  |
| `vm_screenshot`     | Capture screen; image embedded inline in response |
| `vm_screenshot_api` | Capture via vboxapi (no subprocess, local only)   |

### VM Interaction (4 tools)

| Tool             | Description                                            |
| ---------------- | ------------------------------------------------------ |
| `vm_powershell`  | Run PowerShell via WinRM; SSH tunnel for remotes       |
| `vm_type`        | Type text; 256-char limit per call; chunk longer text  |
| `vm_key`         | Send enter/tab/escape/space/backspace                  |
| `vm_mouse_click` | Click at absolute coordinates via vboxapi (local only) |

### Vagrant (5 tools)

| Tool                | Description                                     |
| ------------------- | ----------------------------------------------- |
| `vagrant_status`    | Show Vagrantfile VM states                      |
| `vagrant_up`        | Start a VM                                      |
| `vagrant_provision` | Run provisioners                                |
| `vagrant_destroy`   | Destroy a VM (optionally in tmux)               |
| `vagrant_winrm`     | Run a command via WinRM on a Windows Vagrant VM |

### Build Orchestration (3 tools)

| Tool           | Description                                        |
| -------------- | -------------------------------------------------- |
| `build_start`  | Start a background build (Packer, tmux session)    |
| `build_watch`  | Tail build log until pattern matches or timeout    |
| `build_status` | Check if build is running; return last N log lines |
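
The "tail until pattern matches or timeout" behaviour described for `build_watch` can be sketched as a polling loop. This is not the server's implementation; `watch_log` and its parameters are hypothetical, and the injectable `sleep` and `clock` exist only to make the sketch testable.

```python
import re
import time
from pathlib import Path
from typing import Callable, Optional


def watch_log(
    path: str,
    pattern: str,
    timeout: float = 60.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> Optional[str]:
    """Return the first complete log line matching `pattern`, or None on timeout."""
    regex = re.compile(pattern)
    deadline = clock() + timeout
    offset = 0  # bytes of the log already scanned
    log = Path(path)
    while True:
        if log.exists():
            with log.open("rb") as handle:
                handle.seek(offset)
                chunk = handle.read()
            last_newline = chunk.rfind(b"\n")
            if last_newline >= 0:
                # Only consume complete lines; a partial last line is re-read later.
                complete = chunk[: last_newline + 1]
                offset += len(complete)
                for line in complete.decode("utf-8", errors="replace").splitlines():
                    if regex.search(line):
                        return line
        if clock() >= deadline:
            return None
        sleep(poll_interval)
```

Tracking a byte offset means each poll scans only new output, so long build logs are not re-read from the start.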

### CI Tools (4 tools)

| Tool                 | Description                                    |
| -------------------- | ---------------------------------------------- |
| `ci_check`           | SSH connectivity + host stats                  |
| `ci_run`             | Run a shell command on the CI host             |
| `ci_pipeline_status` | Get GitLab pipeline status for a project       |
| `ci_preflight`       | Verify required tools are installed on CI host |

### Podman / Containers (5 tools)

| Tool                    | Description                                       |
| ----------------------- | ------------------------------------------------- |
| `podman_ps`             | List containers                                   |
| `podman_exec`           | Run a command inside a container                  |
| `podman_logs`           | Fetch last N log lines                            |
| `podman_restart`        | Restart a container                               |
| `podman_service_status` | Check systemd service status for a Podman service |

### Target Parameter

All tools accept a `target` parameter; the vboxapi-backed tools `vm_mouse_click` and `vm_screenshot_api` work only with `"local"`:

| Value              | Connects to                       |
| ------------------ | --------------------------------- |
| `"local"`          | Local host (default)              |
| `"ci"`             | Named target from server config   |
| `"user@host"`      | Raw SSH string                    |
| `"user@host:port"` | Raw SSH string with explicit port |
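
The four forms can be told apart with simple string handling. The sketch below is an assumption about how such a value might be classified, not the server's parser; the `Target` dataclass and `parse_target` are hypothetical, a missing port is left as `None` rather than guessed, and IPv6 literals are not handled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Target:
    kind: str  # "local", "named", or "ssh"
    name: Optional[str] = None
    user: Optional[str] = None
    host: Optional[str] = None
    port: Optional[int] = None


def parse_target(value: str = "local") -> Target:
    """Classify a target string as local, a named config entry, or raw SSH."""
    if value == "local":
        return Target(kind="local")
    if "@" not in value:
        # No user part: treat as a named target from server config, e.g. "ci".
        return Target(kind="named", name=value)
    user, _, rest = value.partition("@")
    host, sep, port_text = rest.rpartition(":")
    if sep:
        port: Optional[int] = int(port_text)  # ValueError if not numeric
    else:
        host, port = rest, None
    if not user or not host:
        raise ValueError(f"invalid SSH target: {value!r}")
    return Target(kind="ssh", user=user, host=host, port=port)
```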

Full signatures: [docs/mcp-tools.md](docs/mcp-tools.md)

---

## Skill-to-Task Decision Guide

| You want to...                                    | Use                                                  |
| ------------------------------------------------- | ---------------------------------------------------- |
| Click a button / type text / read the screen      | `vm-vision-control`                                  |
| Run a multi-step operation (>30 seconds)          | `vm-ground-control` → dispatches `vm-pilot`          |
| Check on a running background task                | `vm-radio-control` → dispatches `vm-pilot-inspector` |
| Continue a completed pilot with more work         | `vm-ground-control` with `resume=agent_id`           |
| Record an operation as video                      | `vm-blackbox-record`                                 |
| Extract frames from a recording                   | `vm-blackbox-record` extract_frames.py               |
| Get the current screen without interrupting pilot | `vm-radio-control` with `output_type: screenshot`    |

---

## Hard Constraints

These are not suggestions — they are required by the plugin architecture:

1. **`vm-vision-control` is mandatory before any GUI interaction.** Do not call `vm_screenshot`, `vm_mouse_click`, `vm_type`, or `vm_key` directly from the orchestrator.

2. **MCP is the only approved path.** Do not use raw Bash for vagrant, VBoxManage, podman, or WinRM — always go through `mcp__vm-blackbox__*` tools.

3. **The orchestrator does not call VM tools while a pilot is running.** The pilot owns the VM. Interrupt only by resuming the agent.

4. **Skills are scoped.** Each skill has a single responsibility. Do not combine vision-control and recording in one invocation.

---

## Conventions

- `vm_type` has a hard 256-character limit per call. Chunk long text across multiple calls.
- Password fields may double-type. Clear with Ctrl+A → backspace before typing.
- `vm_mouse_click` and `vm_screenshot_api` require `target="local"` (vboxapi uses local XPCOM only).
- Recording parameters lock when recording is enabled — configure before starting, not after.
- The pilot's transcript lives at `~/.claude/projects/<encoded_project_path>/<agent-id>.jsonl`.
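
Chunking for the `vm_type` limit is a plain fixed-width split. A minimal sketch, assuming the limit counts characters; `chunk_text` is a hypothetical helper, and each returned chunk would go to its own `vm_type` call.

```python
VM_TYPE_LIMIT = 256  # per-call limit stated above


def chunk_text(text: str, limit: int = VM_TYPE_LIMIT) -> list[str]:
    """Split text into consecutive pieces of at most `limit` characters."""
    if limit < 1:
        raise ValueError("limit must be at least 1")
    return [text[i : i + limit] for i in range(0, len(text), limit)]
```

Joining the chunks in order reproduces the original text, so nothing is dropped or reordered at chunk boundaries.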

---

## Installer Reference

`vm-blackbox-installer` installs skills, agents, and MCP server registration across AI coding tools in one command.

### Supported Platforms

| Flag         | Tool        | Config directory                        |
| ------------ | ----------- | --------------------------------------- |
| `--claude`   | Claude Code | `~/.claude/` or `./.claude/`            |
| `--opencode` | OpenCode    | `~/.config/opencode/` or `./.opencode/` |
| `--gemini`   | Gemini CLI  | `~/.gemini/` or `./.gemini/`            |
| `--codex`    | Codex       | `~/.codex/` or `./.codex/`              |

### Usage

```bash
# Install for all platforms globally
uvx --from mcp-vm-blackbox vm-blackbox-installer --all --global

# Install for specific platforms
uvx --from mcp-vm-blackbox vm-blackbox-installer --claude --gemini --global

# Install locally (current directory instead of home)
uvx --from mcp-vm-blackbox vm-blackbox-installer --claude --local
```

`--global` and `--local` are mutually exclusive. `--global` is the default when neither is specified.

### What the Installer Does

For each selected platform:

1. Copies `skills/` → `<config>/plugins/vm-flightsimulator/skills/`
2. Copies `agents/` → `<config>/plugins/vm-flightsimulator/agents/`
3. Registers `{ "command": "uvx", "args": ["mcp-vm-blackbox"] }` in the platform's MCP config file

**MCP config files written:**

| Platform | Config file                                         |
| -------- | --------------------------------------------------- |
| Claude   | `~/.claude.json` (global) or `.claude.json` (local) |
| OpenCode | `~/.config/opencode/opencode.json`                  |
| Gemini   | `~/.gemini/settings.json`                           |
| Codex    | `~/.codex/config.toml`                              |
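
For the JSON-based config files, registration amounts to merging one entry into an existing document without disturbing other settings. The sketch below shows that idea only; it is not the installer's code. The `mcpServers` key is an assumption that may differ per platform, the server name `vm-blackbox` follows the `claude mcp add` example earlier, and the TOML file used by Codex is not covered.

```python
import json
from pathlib import Path


def register_mcp_server(config_path: str, name: str = "vm-blackbox",
                        key: str = "mcpServers") -> dict:
    """Add or overwrite one MCP server entry in a JSON config file."""
    path = Path(config_path)
    # Start from the existing config so unrelated settings are preserved.
    config = json.loads(path.read_text(encoding="utf-8")) if path.exists() else {}
    servers = config.setdefault(key, {})
    servers[name] = {"command": "uvx", "args": ["mcp-vm-blackbox"]}
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(config, indent=2) + "\n", encoding="utf-8")
    return config
```

Running it twice leaves a single entry, since the second call overwrites the same key.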

**Gemini frontmatter transformation:** The installer converts agent `.md` frontmatter for Gemini's schema — removes the `color:` field and converts comma-separated `tools:` and `skills:` strings to YAML list format.
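
The transformation can be illustrated with line-based string handling. This sketch is an assumption about the behaviour described, not the installer's code: `transform_frontmatter` is a hypothetical name, the block-style list output is one possible YAML list format, and only simple single-line `key: value` frontmatter is handled.

```python
def transform_frontmatter(markdown: str) -> str:
    """Drop `color:` and expand comma-separated `tools:`/`skills:` into YAML lists."""
    lines = markdown.split("\n")
    if not lines or lines[0].strip() != "---":
        return markdown  # no frontmatter
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return markdown  # unterminated frontmatter: leave untouched
    out = []
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if sep and key == "color":
            continue  # field is removed
        is_csv = value.strip() and not value.strip().startswith("[")
        if sep and key in ("tools", "skills") and is_csv:
            out.append(f"{key}:")
            out.extend(f"  - {item.strip()}" for item in value.split(",") if item.strip())
        else:
            out.append(line)
    return "\n".join(["---", *out, "---", *lines[end + 1 :]])
```

The document body after the closing `---` passes through unchanged.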

---

## Local Development

```bash
# Install dependencies
uv sync

# Run tests
uv run pytest

# Run a single test file
uv run pytest packages/mcp_vm_blackbox/tests/test_vm_interaction.py

# Format
uv run ruff format

# Lint
uv run ruff check

# Type check
uv run ty check packages/

# Test the plugin locally in Claude Code
claude --plugin-dir ./
```

Coverage threshold: 60%. Modules requiring live VMs (WinRM, SSH tunnel, VBoxManage backends) are excluded from CI coverage.

---

## Installation Reference

```bash
# Marketplace
claude plugin marketplace add bitflight-devops/vm-flightsimulator

# MCP server only (PyPI)
uvx mcp-vm-blackbox

# Persistent MCP registration
claude mcp add vm-blackbox -- uvx mcp-vm-blackbox
```

---

## License

MIT — see [LICENSE](LICENSE)
