Metadata-Version: 2.4
Name: harness-browser
Version: 0.1.3
Summary: AI-friendly browser automation via CDP with profile-based login persistence
License: MIT
Keywords: ai,automation,browser,cdp,devtools
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Requires-Python: >=3.11
Requires-Dist: aiohttp>=3.9
Requires-Dist: mcp>=1.0
Requires-Dist: pydantic>=2.0
Requires-Dist: websockets>=12.0
Provides-Extra: dev
Requires-Dist: coverage[toml]>=7.5; extra == 'dev'
Requires-Dist: mypy>=1.9; extra == 'dev'
Requires-Dist: pre-commit>=3.7; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: ruff>=0.4; extra == 'dev'
Description-Content-Type: text/markdown

<p align="center">
  <img src="assets/images/banner.jpeg" alt="Harness Browser Banner" width="100%" />
</p>

<h1 align="center">Harness Browser</h1>

<p align="center">
  <strong>AI-friendly browser automation via Chrome DevTools Protocol (CDP).</strong>
</p>

<p align="center">
  <a href="https://pypi.org/project/harness-browser/"><img src="https://img.shields.io/pypi/v/harness-browser" alt="PyPI" /></a>
  <a href="https://github.com/orcakit/harness-browser/actions/workflows/ci.yml"><img src="https://github.com/orcakit/harness-browser/actions/workflows/ci.yml/badge.svg" alt="CI" /></a>
  <a href="https://www.python.org/"><img src="https://img.shields.io/badge/python-3.11+-blue.svg" alt="Python 3.11+" /></a>
  <a href="LICENSE"><img src="https://img.shields.io/badge/License-MIT-yellow.svg" alt="License: MIT" /></a>
</p>

<p align="center">
  <b>English</b> · <a href="README_CN.md">中文</a>
</p>

---

## Highlights

- **Pure CDP, Zero Playwright** — Direct WebSocket connection to Chrome DevTools, no browser binaries shipped, no driver layer
- **Token-Efficient DOM** — 4-level output from ~50 to ~3000 tokens; `interactive` mode returns only clickable/typeable elements with stable refs
- **Login Persistence** — One Chrome user-data-dir per profile; log in once, every subsequent run reuses cookies and storage
- **Agent-First API** — Stateless `browser_tool()` function, MCP server, and Claude Code skill ready to drop into any agent project
- **Full Observability** — Per-action metrics (`duration_ms`, `estimated_tokens`, `screenshot_size_kb`) plus lifecycle hooks

---

## Overview

Harness Browser is an agent-first browser runtime built on pure CDP. It provides predictable DOM snapshots, persistent profile sessions, and a typed Python API designed for LLM tool-calling. No Playwright, no Selenium — just a direct WebSocket connection to Chrome.

The library solves three problems for AI agents:
1. **Token cost** — Multi-level DOM output keeps context windows lean
2. **Element stability** — Ref-based targeting survives layout reflows
3. **Authentication** — Profile persistence eliminates repeated logins

---

## Core Technology

| Component | Technology | Purpose |
|-----------|-----------|---------|
| Transport | WebSocket (`websockets>=12.0`) | Direct CDP communication with Chrome |
| DOM Engine | Custom tree walker | Multi-level DOM serialization with ref assignment |
| Configuration | Pydantic models | Typed settings with env-var override |
| MCP Server | `mcp>=1.0` | Expose browser actions as MCP tools |
| Profiles | Chrome user-data-dir | Persistent login state per named profile |

---

## Features

- **Pure CDP** — Direct WebSocket connection, no Playwright dependency
- **Profile-based login persistence** — Chrome user-data-dir per profile, cookies/sessions reused across runs
- **4-level DOM output** — `minimal` (~50 tokens), `interactive` (~200–500 tokens), `full` (~1000–3000 tokens), `structured` (JSON)
- **Ref system** — Stable element references across actions, invalidated on navigation
- **Hook system** — `before_action`, `after_action`, `action_error`, `page_navigated`
- **Per-action metrics** — `duration_ms`, `estimated_tokens`, `screenshot_size_kb`
- **Environment-variable configuration** — All paths, ports, and timeouts configurable without code changes
- **Remote/Docker Chrome support** — Bypass launcher via `BROWSER_USE_CDP_WS_URL`
- **MCP Server** — Expose actions as MCP tools for Claude Code and other MCP clients
- **Drop-in Claude Code skill** — Copy `skills/harness-browser/` into any agent project
- **Strict typing** — mypy strict, ruff clean, comprehensive test coverage

---

## Quick Start

### Requirements

- Python 3.11+
- Chrome or Chromium

```bash
# Ubuntu (on Debian the package is named `chromium`)
sudo apt install chromium-browser

# macOS
brew install --cask google-chrome
```

### Installation

```bash
pip install harness-browser

# Optional: download a Playwright-managed Chromium into the standard
# cache (~/.cache/ms-playwright/...) when you don't have a system Chrome.
# This installs `playwright` itself if it isn't already present, then
# fetches the browser binary. Set HARNESS_SKIP_PLAYWRIGHT_PIP=1 to skip
# the pip step (e.g. in pre-baked images).
harness-browser install-browser
```

### Python API

```python
import asyncio
from harness_browser import BrowserSession

async def main():
    async with await BrowserSession.create(profile="default") as sess:
        await sess.navigate("https://example.com")
        result = await sess.dom_tree(level="interactive")
        print(result.content)
        # → [ref=inp_1] input[text] placeholder="Search"
        # → [ref=btn_2] button "Go"
        await sess.click(ref="btn_2")

asyncio.run(main())
```
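
The `interactive` output shown in the comments above is line-oriented, so an agent can pull refs out of it with a small parser. This is a sketch under one assumption: every element line starts with `[ref=<id>]` followed by a description, as in the two sample lines. The real output may carry more fields, and `parse_refs` is a hypothetical helper, not a library function.

```python
import re

# Assumed line shape, taken from the sample output: "[ref=<id>] <description>"
REF_LINE = re.compile(r"^\[ref=(?P<ref>[^\]]+)\]\s+(?P<desc>.+)$")


def parse_refs(content: str) -> dict[str, str]:
    """Map each ref id to the description that follows it."""
    refs: dict[str, str] = {}
    for line in content.splitlines():
        match = REF_LINE.match(line.strip())
        if match:
            refs[match["ref"]] = match["desc"]
    return refs


sample = '[ref=inp_1] input[text] placeholder="Search"\n[ref=btn_2] button "Go"'
buttons = [ref for ref, desc in parse_refs(sample).items() if desc.startswith("button")]
print(buttons)
```

Lines that do not match the assumed shape are skipped rather than raising, which keeps the helper tolerant of headers or blank lines in the output.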

### Stateless Tool Interface (for AI Frameworks)

```python
import asyncio
from harness_browser import browser_tool

async def main():
    # All calls route to the same session by profile name
    result = await browser_tool(action="navigate", url="https://github.com", profile="work")
    result = await browser_tool(action="dom_tree", level="interactive", profile="work")
    result = await browser_tool(action="click", ref="btn_search", profile="work")
    result = await browser_tool(action="type", text="harness", profile="work")

asyncio.run(main())
```

### CLI

Every action is also a shell command. Sequential calls on the same
`--profile` attach to the same Chrome process, so refs and login state
carry across invocations:

```bash
# Drive the browser
harness-browser navigate "https://example.com" --profile work
harness-browser dom-tree --profile work
# → [ref=inp_1] input[text] placeholder="Search"
# → [ref=btn_2] button "Go"
harness-browser click --ref inp_1 --profile work
harness-browser type "harness" --profile work
harness-browser click --ref btn_2 --profile work
harness-browser screenshot --path /tmp/result.png --profile work

# Tear down (Chrome itself stays running for later attach)
harness-browser close-session --profile work
```

Add `--json` to any command for the full structured `ToolResult` (useful in
scripts), and `--auto` / `--headed` / `--headless` to override the launch
mode on the first call per profile. Run `harness-browser --help` for the
full list of subcommands.

### MCP Server

```bash
python -m harness_browser.mcp_server
```

Add to Claude Code `settings.json`:

```json
{
  "mcpServers": {
    "harness-browser": {
      "command": "python",
      "args": ["-m", "harness_browser.mcp_server"],
      "env": {
        "BROWSER_USE_MODE": "auto",
        "BROWSER_USE_PROFILES_DIR": "/data/browser-profiles"
      }
    }
  }
}
```

Available MCP tools: `browser_navigate`, `browser_dom_tree`, `browser_screenshot`, `browser_click`, `browser_type`, `browser_eval_js`, `install_browser`.

---

## DOM Levels

| Level | Tokens | Use Case |
|-------|--------|----------|
| `minimal` | ~50 | Confirm page loaded, check title/URL |
| `interactive` | ~200–500 | Find clickable/typeable elements (default) |
| `full` | ~1000–3000 | Read page content |
| `structured` | varies | JSON for programmatic processing |
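
One way to use the table is to choose a level from the token budget left in the agent's context. The sketch below is an assumption-laden illustration: `pick_dom_level` is a hypothetical helper, and the ceilings are simply the upper ends of the estimates in the table, not guarantees about any particular page.

```python
# Upper-bound token estimates copied from the table; "structured" is omitted
# because its size varies.
LEVEL_CEILINGS = [("full", 3000), ("interactive", 500), ("minimal", 50)]


def pick_dom_level(token_budget: int) -> str:
    """Return the richest level whose estimated ceiling fits the budget."""
    for level, ceiling in LEVEL_CEILINGS:
        if token_budget >= ceiling:
            return level
    return "minimal"  # Smallest level is the fallback even for tiny budgets.


print(pick_dom_level(800))
```

The returned string could then be passed as the `level` argument of `dom_tree`.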

---

## Login State Reuse

Profiles persist Chrome sessions in `~/.harness-browser/profiles/<name>/`:

```python
# First run: navigate to login page, log in manually
await browser_tool(action="navigate", url="https://github.com/login", profile="github")

# All future runs: login state reused automatically
await browser_tool(action="navigate", url="https://github.com/settings", profile="github")
```
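
Because each profile is a directory under the profiles root, a script can check ahead of time whether a profile has been used. This is a sketch under assumptions: `profile_path` and `has_saved_state` are hypothetical helpers rather than library functions, and the rule that a name must be a single path component is a guess at sensible validation.

```python
from pathlib import Path

DEFAULT_ROOT = "~/.harness-browser/profiles"  # Default root documented in this README.


def profile_path(name: str, root: str = DEFAULT_ROOT) -> Path:
    """Resolve <root>/<name>, rejecting names that would escape the root."""
    if not name or name in {".", ".."} or Path(name).name != name:
        raise ValueError(f"invalid profile name: {name!r}")
    return Path(root).expanduser() / name


def has_saved_state(name: str, root: str = DEFAULT_ROOT) -> bool:
    """True if the profile directory exists and is non-empty."""
    path = profile_path(name, root)
    return path.is_dir() and any(path.iterdir())


print(profile_path("github", root="/data/browser-profiles"))
```

A non-empty directory only suggests that Chrome has run with the profile; it does not prove that a given site's login is still valid.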

---

## Hook System

```python
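import asyncio
from harness_browser import BrowserSession

async def main():
    async with await BrowserSession.create(profile="work") as sess:
        # Runs before every action
        @sess.on("before_action")
        async def log_action(event):
            print(f"[{event['action']}] starting")

        # Runs after every action, with that action's metrics
        @sess.on("after_action")
        async def log_metrics(metrics):
            print(f"  done in {metrics.duration_ms}ms (~{metrics.estimated_tokens} tokens)")

        await sess.navigate("https://example.com")

asyncio.run(main())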
```
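
To make the decorator pattern above concrete, here is a minimal sketch of how an event registry of this shape could work. It is not the library's implementation: `HookRegistry` and `emit` are hypothetical names, and running handlers sequentially in registration order is an assumption.

```python
import asyncio
from collections import defaultdict
from typing import Any, Awaitable, Callable

Handler = Callable[[Any], Awaitable[None]]


class HookRegistry:
    """Hypothetical stand-in for the `sess.on(...)` decorator shown above."""

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = defaultdict(list)

    def on(self, event: str) -> Callable[[Handler], Handler]:
        def register(handler: Handler) -> Handler:
            self._handlers[event].append(handler)
            return handler  # Returned unchanged so the function stays usable.

        return register

    async def emit(self, event: str, payload: Any) -> None:
        # Await handlers one at a time, in registration order.
        for handler in self._handlers[event]:
            await handler(payload)


async def demo() -> list[str]:
    hooks = HookRegistry()
    seen: list[str] = []

    @hooks.on("before_action")
    async def record(event: dict[str, str]) -> None:
        seen.append(event["action"])

    await hooks.emit("before_action", {"action": "navigate"})
    await hooks.emit("page_navigated", {"url": "https://example.com"})  # No handlers registered.
    return seen


print(asyncio.run(demo()))
```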

---

## Screenshots

`screenshot` writes a PNG to disk and returns its path — never raw base64. That keeps token usage flat regardless of image size.

```python
# Assumes an open session `sess`, as in the Python API example
# Default: timestamped file in BROWSER_USE_SCREENSHOTS_DIR
result = await sess.screenshot()
# → /home/user/.harness-browser/screenshots/harness-1779462725763.png

# Full scrollable page
await sess.screenshot(full_page=True)

# Crop to a single element
await sess.screenshot(element_ref="btn_2")

# Pin the file path
await sess.screenshot(path="/tmp/latest.png")
```

---

## Configuration

All settings can be configured via environment variables — no code changes required.

| Environment Variable | Default | Description |
|---------------------|---------|-------------|
| `BROWSER_USE_PROFILES_DIR` | `~/.harness-browser/profiles` | Root directory for Chrome user-data-dirs |
| `BROWSER_USE_SCREENSHOTS_DIR` | `~/.harness-browser/screenshots` | Directory for PNG screenshots |
| `BROWSER_USE_CDP_HOST` | `localhost` | Host serving Chrome's CDP endpoint |
| `BROWSER_USE_CDP_PORT_START` | `9222` | First CDP debug port |
| `BROWSER_USE_MODE` | `auto` | Launch mode: `auto` / `headed` / `headless` |
| `BROWSER_USE_CHROME_BIN` | auto-detect | Absolute path to Chrome/Chromium |
| `BROWSER_USE_CDP_TIMEOUT` | `30.0` | CDP command timeout (seconds) |
| `BROWSER_USE_CDP_WS_URL` | — | Direct WebSocket URL (bypasses launcher) |

---

## Actions Reference

| Action | Required | Optional |
|--------|----------|----------|
| `navigate` | `url` | |
| `dom_tree` | | `level` (default: `interactive`) |
| `screenshot` | | `element_ref`, `full_page`, `path` |
| `click` | one of: `ref`, `selector`, `x`+`y` | |
| `type` | `text` | `ref` |
| `scroll` | | `direction`, `amount` |
| `hover` | `ref` | |
| `eval_js` | `expression` | |
| `go_back` | | |
| `go_forward` | | |
| `reload` | | |
| `list_tabs` | | |
| `new_tab` | | `url` |
| `switch_tab` | `tab_id` | |
| `close_tab` | | `tab_id` |
| `close_session` | | |
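
An agent framework may want to reject malformed calls before they reach the browser. The sketch below transcribes the Required column of the table into a small checker. `validate_call` is a hypothetical helper, and the library may validate differently or report errors in another shape.

```python
# Required parameters per action, transcribed from the table above.
REQUIRED: dict[str, tuple[str, ...]] = {
    "navigate": ("url",),
    "type": ("text",),
    "hover": ("ref",),
    "eval_js": ("expression",),
    "switch_tab": ("tab_id",),
}
# Actions whose Required column is empty.
NO_REQUIRED = {
    "dom_tree", "screenshot", "scroll", "go_back", "go_forward",
    "reload", "list_tabs", "new_tab", "close_tab", "close_session",
}


def validate_call(action: str, **params: object) -> list[str]:
    """Return a list of problems; an empty list means the call looks well-formed."""
    if action == "click":
        # click accepts any one of: ref, selector, or an x/y pair.
        has_target = "ref" in params or "selector" in params or {"x", "y"}.issubset(params)
        return [] if has_target else ["click needs ref, selector, or x and y"]
    if action in REQUIRED:
        return [f"{action} needs {name}" for name in REQUIRED[action] if name not in params]
    if action in NO_REQUIRED:
        return []
    return [f"unknown action: {action}"]


print(validate_call("click", x=10))
```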

---

## Development

```bash
# Clone
git clone https://github.com/orcakit/harness-browser.git
cd harness-browser

# Install with dev extras
uv sync --extra dev

# Install pre-commit hooks
pre-commit install

# Run tests
make test

# Lint + type check
make lint

# Format
make format

# Build wheel
make build
```

---

## Related Projects

| Project | Description |
|---------|-------------|
| [harness-agent](https://github.com/orcakit/harness-agent) | Production-grade AI agent platform built on LangChain Deep Agents |
| [harness-memory](https://github.com/orcakit/harness-memory) | Pluggable memory system with hierarchical recall and FTS |
| [harness-im-bridge](https://github.com/orcakit/harness-im-bridge) | Multi-platform IM channel bridge for AI agents |

---

## Contributing

See [CONTRIBUTING.md](CONTRIBUTING.md).

## License

[MIT](LICENSE)
