Metadata-Version: 2.4
Name: openadapt-capture
Version: 0.3.0
Summary: GUI interaction capture - platform-agnostic event streams with time-aligned media
Project-URL: Homepage, https://github.com/OpenAdaptAI/openadapt-capture
Project-URL: Repository, https://github.com/OpenAdaptAI/openadapt-capture
Author-email: "MLDSAI Inc." <richard@openadapt.ai>
License-Expression: MIT
Keywords: automation,capture,events,gui,rpa
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Requires-Dist: av>=10.0.0
Requires-Dist: fire>=0.7.1
Requires-Dist: mss>=6.0.0
Requires-Dist: openai>=2.11.0
Requires-Dist: pillow>=9.0.0
Requires-Dist: pydantic-settings>=2.12.0
Requires-Dist: pydantic>=2.0.0
Requires-Dist: pynput>=1.7.0
Requires-Dist: sounddevice>=0.5.3
Requires-Dist: soundfile>=0.13.1
Requires-Dist: websockets>=12.0
Provides-Extra: all
Requires-Dist: faster-whisper>=1.1.0; extra == 'all'
Requires-Dist: magic-wormhole>=0.17.0; extra == 'all'
Requires-Dist: openadapt-privacy>=0.1.0; extra == 'all'
Requires-Dist: openai-whisper>=20230314; extra == 'all'
Provides-Extra: dev
Requires-Dist: matplotlib>=3.5.0; extra == 'dev'
Requires-Dist: numpy>=1.21.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23.0; extra == 'dev'
Requires-Dist: pytest-cov>=4.0.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: privacy
Requires-Dist: openadapt-privacy>=0.1.0; extra == 'privacy'
Provides-Extra: share
Requires-Dist: magic-wormhole>=0.17.0; extra == 'share'
Provides-Extra: transcribe
Requires-Dist: openai-whisper>=20230314; extra == 'transcribe'
Provides-Extra: transcribe-fast
Requires-Dist: faster-whisper>=1.1.0; extra == 'transcribe-fast'
Description-Content-Type: text/markdown

# OpenAdapt Capture

[![Build Status](https://github.com/OpenAdaptAI/openadapt-capture/actions/workflows/test.yml/badge.svg)](https://github.com/OpenAdaptAI/openadapt-capture/actions/workflows/test.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)
[![Python 3.10+](https://img.shields.io/badge/python-3.10%2B-blue)](https://www.python.org/downloads/)

<!-- PyPI badges (uncomment once package is published)
[![PyPI version](https://img.shields.io/pypi/v/openadapt-capture.svg)](https://pypi.org/project/openadapt-capture/)
[![Downloads](https://img.shields.io/pypi/dm/openadapt-capture.svg)](https://pypi.org/project/openadapt-capture/)
-->

**OpenAdapt Capture** is the data collection component of the [OpenAdapt](https://github.com/OpenAdaptAI) GUI automation ecosystem.

Capture platform-agnostic GUI interaction streams with time-aligned screenshots and audio for training ML models or replaying workflows.

> **Status:** Pre-alpha. See [docs/DESIGN.md](docs/DESIGN.md) for architecture discussion.

---

## The OpenAdapt Ecosystem

```
                          OpenAdapt GUI Automation Pipeline
                          =================================

    +-----------------+          +------------------+          +------------------+
    |                 |          |                  |          |                  |
    | openadapt-      |  ------> | openadapt-ml     |  ------> |    Deploy        |
    | capture         |  Convert | (Train & Eval)   |  Export  |    (Inference)   |
    |                 |          |                  |          |                  |
    +-----------------+          +------------------+          +------------------+
          |                             |                             |
          v                             v                             v
    - Record GUI                  - Fine-tune VLMs              - Run trained
      interactions                - Evaluate on                   agent on new
    - Mouse, keyboard,              benchmarks (WAA)              tasks
      screen, audio               - Compare models              - Real-time
    - Privacy scrubbing           - Cloud GPU training            automation

```

| Component | Purpose | Repository |
|-----------|---------|------------|
| **openadapt-capture** | Record human demonstrations | [GitHub](https://github.com/OpenAdaptAI/openadapt-capture) |
| **openadapt-ml** | Train and evaluate GUI automation models | [GitHub](https://github.com/OpenAdaptAI/openadapt-ml) |
| **openadapt-privacy** | PII scrubbing for recordings | Coming soon |

---

## Installation

```bash
uv add openadapt-capture
```

This includes everything needed to capture and replay GUI interactions (mouse, keyboard, screen recording).

For audio capture with Whisper transcription (large download):

```bash
uv add "openadapt-capture[audio]"
```

## Quick Start

### Capture

```python
from openadapt_capture import Recorder

# Record GUI interactions
with Recorder("./my_capture", task_description="Demo task") as recorder:
    # Captures mouse, keyboard, and screen until context exits
    input("Press Enter to stop recording...")

print(f"Captured {recorder.event_count} events")
```

### Replay / Analysis

```python
from openadapt_capture import Capture

# Load and iterate over time-aligned events
capture = Capture.load("./my_capture")

for action in capture.actions():
    # Each action has an associated screenshot
    print(f"{action.timestamp}: {action.type} at ({action.x}, {action.y})")
    screenshot = action.screenshot  # PIL Image at time of action
```

### Low-Level API

```python
from openadapt_capture import (
    create_capture, process_events,
    MouseDownEvent, MouseButton,
)

# Create storage (platform and screen size auto-detected)
capture, storage = create_capture("./my_capture")

# Write raw events
storage.write_event(MouseDownEvent(timestamp=1.0, x=100, y=200, button=MouseButton.LEFT))

# Query and process
raw_events = storage.get_events()
actions = process_events(raw_events)  # Merges clicks, drags, typed text
```

## Event Types

**Raw events** (captured):
- `mouse.move`, `mouse.down`, `mouse.up`, `mouse.scroll`
- `key.down`, `key.up`
- `screen.frame`, `audio.chunk`

**Actions** (processed):
- `mouse.singleclick`, `mouse.doubleclick`, `mouse.drag`
- `key.type` (merged keystrokes → text)

## Architecture

```
capture_directory/
├── capture.db      # SQLite: events, metadata
├── video.mp4       # Screen recording
└── audio.flac      # Audio (optional)
```

## Performance Statistics

Track event write latency and analyze capture performance:

```python
from openadapt_capture import Recorder

with Recorder("./my_capture") as recorder:
    input("Press Enter to stop...")

# Access performance statistics
summary = recorder.stats.summary()
print(f"Mean latency: {summary['mean_latency_ms']:.1f}ms")

# Generate performance plot
recorder.stats.plot(output_path="performance.png")
```

![Performance Statistics](docs/images/performance_stats.png)

## Frame Extraction Verification

Compare extracted video frames against original images to verify lossless capture:

```python
from openadapt_capture import compare_video_to_images, plot_comparison

# Compare frames
report = compare_video_to_images(
    "capture/video.mp4",
    [(timestamp, image) for timestamp, image in captured_frames],
)

print(f"Mean diff: {report.mean_diff_overall:.2f}")
print(f"Lossless: {report.is_lossless}")

# Visualize comparison
plot_comparison(report, output_path="comparison.png")
```

![Frame Comparison](docs/images/frame_comparison.png)

## Visualization

Generate animated demos and interactive viewers from recordings:

### Animated GIF Demo

```python
from openadapt_capture import Capture, create_demo

capture = Capture.load("./my_capture")
create_demo(capture, output="demo.gif", fps=10, max_duration=15)
```

### Interactive HTML Viewer

```python
from openadapt_capture import Capture, create_html

capture = Capture.load("./my_capture")
create_html(capture, output="viewer.html", include_audio=True)
```

The HTML viewer includes:
- Timeline scrubber with event markers
- Frame-by-frame navigation
- Synchronized audio playback
- Event list with details panel
- Keyboard shortcuts (Space, arrows, Home/End)

![Capture Viewer](docs/images/viewer_screenshot.png)

### Generate Demo from Command Line

```bash
uv run python scripts/generate_readme_demo.py --duration 10
```

## Optional Extras

| Extra | Features |
|-------|----------|
| `audio` | Audio capture + Whisper transcription |
| `privacy` | PII scrubbing (openadapt-privacy) |
| `all` | Everything |

---

## Training with OpenAdapt-ML

Captured recordings can be used to train vision-language models with [openadapt-ml](https://github.com/OpenAdaptAI/openadapt-ml).

### End-to-End Workflow

```bash
# 1. Capture a workflow demonstration
uv run python -c "
from openadapt_capture import Recorder

with Recorder('./my_capture', task_description='Turn off Night Shift') as recorder:
    input('Perform the task, then press Enter to stop...')
"

# 2. Train a model on the capture (requires openadapt-ml)
uv pip install openadapt-ml
uv run python -m openadapt_ml.cloud.local train \
  --capture ./my_capture \
  --open  # Opens training dashboard

# 3. Compare human vs model predictions
uv run python -m openadapt_ml.scripts.compare \
  --capture ./my_capture \
  --checkpoint checkpoints/model \
  --open
```

### Cloud GPU Training

For faster training with cloud GPUs:

```bash
# Train on Lambda Labs A10 (~$0.75/hr)
uv run python -m openadapt_ml.cloud.lambda_labs train \
  --capture ./my_capture \
  --goal "Turn off Night Shift"
```

See the [openadapt-ml documentation](https://github.com/OpenAdaptAI/openadapt-ml#6-cloud-gpu-training) for cloud setup.

### Data Format

OpenAdapt-ML converts captures to its Episode format automatically:

```python
from openadapt_ml.ingest.capture import capture_to_episode

episode = capture_to_episode("./my_capture")
print(f"Loaded {len(episode.steps)} steps")
print(f"Instruction: {episode.instruction}")
```

The conversion maps capture event types to ML action types:
- `mouse.singleclick` / `mouse.click` -> `CLICK`
- `mouse.doubleclick` -> `DOUBLE_CLICK`
- `mouse.drag` -> `DRAG`
- `mouse.scroll` -> `SCROLL`
- `key.type` -> `TYPE`

---

## Development

```bash
uv sync --dev
uv run pytest
```

## Related Projects

- [openadapt-ml](https://github.com/OpenAdaptAI/openadapt-ml) - Train and evaluate GUI automation models
- [Windows Agent Arena](https://github.com/microsoft/WindowsAgentArena) - Benchmark for Windows GUI agents

## License

MIT
