Metadata-Version: 2.4
Name: harness-browser
Version: 0.1.2
Summary: AI-friendly browser automation via CDP with profile-based login persistence
License: MIT
Keywords: ai,automation,browser,cdp,devtools
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Requires-Dist: aiohttp>=3.9
Requires-Dist: mcp>=1.0
Requires-Dist: pydantic>=2.0
Requires-Dist: websockets>=12.0
Provides-Extra: dev
Requires-Dist: coverage[toml]>=7.5; extra == 'dev'
Requires-Dist: mypy>=1.9; extra == 'dev'
Requires-Dist: pre-commit>=3.7; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

# harness-browser

AI-friendly browser automation via Chrome DevTools Protocol (CDP).

[![PyPI](https://img.shields.io/pypi/v/harness-browser)](https://pypi.org/project/harness-browser/)
[![CI](https://github.com/harness/harness-browser/actions/workflows/ci.yml/badge.svg)](https://github.com/harness/harness-browser/actions)
[![Python 3.11+](https://img.shields.io/badge/python-3.11+-blue.svg)](https://www.python.org/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](LICENSE)

**English** · [中文](README.zh.md)

> An agent-first browser runtime built on pure CDP. Predictable DOM snapshots,
> persistent profile sessions, and a typed Python API designed for LLM
> tool-calling — no Playwright, no driver layer in between.

## Why harness-browser

| Concern | How we address it |
|---------|-------------------|
| **Token cost** | 4-level DOM with `interactive` mode (~200–500 tokens) returns only clickable/typeable elements with stable refs — not raw HTML |
| **Stable element targeting** | Refs (`btn_2`, `inp_search`) survive layout reflows and are auto-invalidated on navigation, so the agent never points at a stale node |
| **Login persistence** | One Chrome user-data-dir per profile under `~/.harness-browser/profiles/<name>/` — log in once, every subsequent run reuses cookies and storage |
| **No Playwright tax** | Pure CDP over WebSocket (`websockets>=12.0`); no browser binaries shipped, no patched Chromium, no driver layer |
| **Observability** | Every action emits `ActionMetrics` (`duration_ms`, `dom_nodes_scanned`, `estimated_tokens`, `screenshot_size_kb`) plus `before_action` / `after_action` / `action_error` / `page_navigated` hooks |
| **Configuration** | Seven `BROWSER_USE_*` env vars cover paths, ports, timeouts, and remote/Docker Chrome via `BROWSER_USE_CDP_WS_URL` — no code changes between dev, CI, and prod |
| **Agent integrations** | Stateless `browser_tool(action=..., profile=...)` for any framework, MCP server (`python -m harness_browser.mcp_server`), and a ready-to-copy Claude Code skill in `skills/` |

## Features

- **Pure CDP** — direct WebSocket connection, no Playwright dependency
- **Profile-based login persistence** — Chrome user-data-dir per profile, cookies/sessions reused across runs
- **4-level DOM output** — `minimal` (~50 tokens), `interactive` (~200–500 tokens), `full` (~1000–3000 tokens), `structured` (JSON)
- **Ref system** — stable element references across actions, invalidated on navigation
- **Hook system** — `before_action`, `after_action`, `action_error`, `page_navigated`
- **Per-action metrics** — `duration_ms`, `estimated_tokens`, `screenshot_size_kb`
- **Environment-variable configuration** — all paths, ports, and timeouts configurable without code changes
- **Remote/Docker Chrome support** — bypass launcher via `BROWSER_USE_CDP_WS_URL`
- **MCP Server** — expose actions as MCP tools for Claude Code and other MCP clients
- **Drop-in Claude Code skill** — copy `skills/harness-browser/` into any agent project
- **Strict typing** — mypy strict, ruff clean, 34 unit tests covering DOM, refs, hooks, settings, and CDP framing

## Requirements

- Python 3.11+
- Chrome or Chromium

```bash
# Ubuntu/Debian
sudo apt install chromium-browser

# macOS
brew install --cask google-chrome
```

## Installation

```bash
pip install harness-browser
```

## Quick Start

### Python API

```python
import asyncio
from harness_browser import BrowserSession

async def main():
    async with await BrowserSession.create(profile="default") as sess:
        await sess.navigate("https://example.com")
        result = await sess.dom_tree(level="interactive")
        print(result.content)
        # → [ref=inp_1] input[text] placeholder="Search"
        # → [ref=btn_2] button "Go"
        await sess.click(ref="btn_2")

asyncio.run(main())
```

### AI Framework Usage (stateless)

```python
from harness_browser import browser_tool

# All calls route to the same session by profile name
result = await browser_tool(action="navigate", url="https://github.com", profile="work")
result = await browser_tool(action="dom_tree", level="interactive", profile="work")
result = await browser_tool(action="click", ref="btn_search", profile="work")
result = await browser_tool(action="type", text="harness", profile="work")
```

## DOM Levels

| Level | Tokens | Use case |
|-------|--------|----------|
| `minimal` | ~50 | Confirm page loaded, check title/URL |
| `interactive` | ~200–500 | Find clickable/typeable elements (default) |
| `full` | ~1000–3000 | Read page content |
| `structured` | varies | JSON for programmatic processing |

## Login State Reuse

Profiles persist Chrome sessions in `~/.harness-browser/profiles/<name>/`:

```python
# First run: navigate to login page, log in manually
await browser_tool(action="navigate", url="https://github.com/login", profile="github")

# All future runs: login state reused automatically
await browser_tool(action="navigate", url="https://github.com/settings", profile="github")
```

## Hook System

```python
async with await BrowserSession.create(profile="work") as sess:
    @sess.on("before_action")
    async def log_action(event):
        print(f"[{event['action']}] starting")

    @sess.on("after_action")
    async def log_metrics(metrics):
        print(f"  done in {metrics.duration_ms}ms (~{metrics.estimated_tokens} tokens)")

    await sess.navigate("https://example.com")
```

## MCP Server

```bash
python -m harness_browser.mcp_server
```

Add to Claude Code `settings.json`:

```json
{
  "mcpServers": {
    "harness-browser": {
      "command": "python",
      "args": ["-m", "harness_browser.mcp_server"],
      "env": {
        "BROWSER_USE_MODE": "auto",
        "BROWSER_USE_PROFILES_DIR": "/data/browser-profiles"
      }
    }
  }
}
```

All `BROWSER_USE_*` environment variables can be passed through the MCP `env` block —
this is the recommended way to configure mode, profile location, and remote CDP
endpoints for an MCP-hosted browser.

Available MCP tools: `browser_navigate`, `browser_dom_tree`, `browser_screenshot`,
`browser_click`, `browser_type`, `browser_eval_js`.

## Screenshots

`screenshot` writes a PNG to disk and returns its path — never raw base64.
That keeps token usage flat regardless of image size and lets dashboards
preview the file directly.

```python
# default: timestamped file in BROWSER_USE_SCREENSHOTS_DIR
result = await sess.screenshot()
print(result.content)
# → /home/user/.harness-browser/screenshots/harness-1779462725763.png

# full scrollable page (uses Page.getLayoutMetrics + captureBeyondViewport)
await sess.screenshot(full_page=True)

# crop to a single element discovered via dom_tree
await sess.screenshot(element_ref="btn_2")

# pin the file path — every call overwrites the same file
await sess.screenshot(path="/tmp/latest.png")
```

`result.metadata` carries the page `url` / `title` / `width` / `height` /
`size_kb` / `full_page` so callers can render context without an extra
`Runtime.evaluate`.

## Claude Code Skill

A ready-to-use skill ships under `skills/`:

```bash
# Copy into another agent project as a Claude Code skill
cp -r skills/harness-browser /path/to/other-project/.codebuddy/skills/
# or the Chinese variant
cp -r skills/harness-browser-zh /path/to/other-project/.codebuddy/skills/
```

The skill teaches the agent the standard navigate → `dom_tree` → click/type loop
and the ref discipline (always re-fetch DOM after navigation).

## Actions Reference

| Action | Required | Optional |
|--------|----------|----------|
| `navigate` | `url` | |
| `dom_tree` | | `level` (default: `interactive`) |
| `screenshot` | | `element_ref`, `full_page`, `path` |
| `click` | one of: `ref`, `selector`, `x`+`y` | |
| `type` | `text` | `ref` |
| `scroll` | | `direction`, `amount` |
| `hover` | `ref` | |
| `eval_js` | `expression` | |
| `go_back` | | |
| `go_forward` | | |
| `reload` | | |
| `list_tabs` | | |
| `new_tab` | | `url` |
| `switch_tab` | `tab_id` | |
| `close_tab` | | `tab_id` |
| `close_session` | | |

## Configuration

All settings can be configured via environment variables. No code changes required.

| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `BROWSER_USE_PROFILES_DIR` | `~/.harness-browser/profiles` | Root directory for Chrome user-data-dirs |
| `BROWSER_USE_SCREENSHOTS_DIR` | `~/.harness-browser/screenshots` | Directory where the `screenshot` action writes PNG files |
| `BROWSER_USE_CDP_HOST` | `localhost` | Host or IP serving Chrome's CDP HTTP/WebSocket endpoint |
| `BROWSER_USE_CDP_PORT_START` | `9222` | First CDP debug port assigned to profiles |
| `BROWSER_USE_MODE` | `auto` | Launch mode: `auto` / `headed` / `headless`. `auto` picks headed when `DISPLAY`/`WAYLAND_DISPLAY` is set (or on macOS/Windows), else headless |
| `BROWSER_USE_CHROME_BIN` | auto-detect | Absolute path to Chrome/Chromium executable |
| `BROWSER_USE_CDP_TIMEOUT` | `30.0` | Seconds to wait for a CDP command response |
| `BROWSER_USE_LAUNCH_RETRIES` | `20` | Times to poll Chrome after launch |
| `BROWSER_USE_LAUNCH_DELAY` | `0.25` | Seconds between launch poll attempts |
| `BROWSER_USE_CDP_WS_URL` | — | **Direct connect**: bypass launcher, connect to this WebSocket URL |

### Common scenarios

**Custom profile storage:**

```bash
export BROWSER_USE_PROFILES_DIR=/data/browser-profiles
```

**Force headed or headless mode** (default is `auto`, which picks based on `DISPLAY`):

```bash
export BROWSER_USE_MODE=headless   # always headless (CI, containers)
export BROWSER_USE_MODE=headed     # always headed (force a window even without DISPLAY)
# unset / "auto" → headed when a desktop is detected, headless otherwise
```

**Non-standard Chrome path:**

```bash
export BROWSER_USE_CHROME_BIN=/opt/google/chrome/chrome
```

**Connect to a remote or Docker Chrome** (bypasses launcher entirely):

```bash
# Start Chrome with --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0
export BROWSER_USE_CDP_WS_URL="ws://remote-host:9222/devtools/browser/xxxxxxxx"
```

**Talk to Chrome on another host or container** (keeps the attach/launcher logic, just changes the host):

```bash
# Chrome already running with --remote-debugging-port=9222 --remote-debugging-address=0.0.0.0
export BROWSER_USE_CDP_HOST=10.0.0.42
# harness will hit http://10.0.0.42:9222/json/version and use that page's WS URL
```

**Override settings in code** (useful for testing or multi-instance setups):

```python
from harness_browser import BrowserSession, HarnessSettings

cfg = HarnessSettings(
    cdp_port_start=9300,
    cdp_timeout=60.0,
    profiles_dir="/data/profiles",
)
sess = await BrowserSession.create(profile="work", settings=cfg)
```

## Development

```bash
# Clone
git clone https://git.woa.com/orcakit/browser-use.git
cd browser-use

# Install with dev extras
uv sync --extra dev

# Install pre-commit hooks
pre-commit install

# Run tests
make test

# Lint + type check
make lint

# Format
make format

# Build wheel
make build
```

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md).

## License

MIT
