Metadata-Version: 2.4
Name: browser-agent-bridge
Version: 0.2.4
Summary: Browser Bridge server and CLI for controlling a Chrome extension over WebSocket
License-Expression: MIT
Keywords: browser,bridge,automation,websocket,fastapi,cli
Classifier: Development Status :: 3 - Alpha
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Framework :: FastAPI
Classifier: Topic :: Internet :: WWW/HTTP :: WSGI :: Server
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: fastapi>=0.116.1
Requires-Dist: PyJWT>=2.10.1
Requires-Dist: uvicorn[standard]>=0.35.0
Requires-Dist: websockets>=13.1
Provides-Extra: dev
Requires-Dist: build>=1.2.2; extra == "dev"
Requires-Dist: httpx>=0.28.1; extra == "dev"
Requires-Dist: pytest>=8.4.2; extra == "dev"
Requires-Dist: twine>=5.1.1; extra == "dev"
Dynamic: license-file

# Browser-Agent Bridge - Ultra-Fast Browser Control for Agents

A WebSocket-only, HTML-first browser bridge for remotely controlling a local Chrome extension. It is built as a faster alternative to traditional vision-based browser control systems.

## Why This Exists

Traditional browser relays often rely on LLM vision to understand web pages at each step. In practice, that approach is:

1. Expensive: it consumes many tokens to repeatedly analyze visual page state.
2. Slow: repeated visual analysis adds latency at every interaction step.
3. Error-prone: visual perception picks up noise, while structured HTML gives more deterministic targets for control.

This project exists as an HTML-first relay: the browser-side extension exposes structured observations and preprocessed HTML, so remote agents can interact with websites with lower cost, lower latency, and more reliable control.

## Architecture (WS-only)

```text
Operator CLI (remote/local)
    |
    |  ws(s)://.../ws/operator   (auth)
    v
Bridge Server
    ^
    |  ws(s)://.../ws/client     (auth)
    |
Chrome Extension (local browser)
    |
    +-- content script commands: observe/click/type/get_html/ping_tab/etc.
```

The extension connects outbound to the server. The operator sends commands through the server to a specific `(instance_id, client_id)` pair.

## Protocol

### Client -> Server
- `auth`: `{kind, instance_id, client_id, token}`
- `result`: `{kind, command_id, ok, result|error}`
- `ping`

### Server -> Client
- `auth_ok` / `auth_error`
- `command`: `{kind, command_id, type, payload, request_id, sent_at}`
- `pong`

### Operator -> Server
- `auth`: `{kind, token}`
- `list_clients`
- `connect_status`: `{kind, instance_id, client_id}`
- `send_command`: `{kind, instance_id, client_id, type, payload, timeout_s, request_id}`
- `ping`

### Server -> Operator
- `auth_ok` / `auth_error`
- `clients`
- `connect_status`
- `command_result`
- `pong`
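
As a rough illustration of the operator side of the protocol, the sketch below builds the `auth`, `list_clients`, and `send_command` frames. It assumes that messages are JSON objects and that `kind` carries the message name listed above; neither detail is confirmed here, and the helper names are hypothetical and not part of the package.

```python
import json
import uuid


def operator_auth(token: str) -> str:
    """Build the operator `auth` frame: {kind, token}."""
    return json.dumps({"kind": "auth", "token": token})


def list_clients() -> str:
    """Build the `list_clients` frame (assumed to carry only `kind`)."""
    return json.dumps({"kind": "list_clients"})


def send_command(instance_id, client_id, command_type,
                 payload=None, timeout_s=15.0, request_id=None) -> str:
    """Build a `send_command` frame addressed to one (instance_id, client_id)."""
    return json.dumps({
        "kind": "send_command",
        "instance_id": instance_id,
        "client_id": client_id,
        "type": command_type,
        "payload": payload or {},
        # 15.0 is an illustrative default, not the server's documented value.
        "timeout_s": timeout_s,
        # A fresh request_id per call; reuse one only for a deliberate retry.
        "request_id": request_id or uuid.uuid4().hex,
    })
```

Each string would be sent as one WebSocket text frame to `/ws/operator`, with the `auth` frame first.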

## Auth Modes

Set `BRIDGE_AUTH_MODE`:

- `static` (default): compare token against `BRIDGE_SHARED_TOKEN` (for clients) and `BRIDGE_OPERATOR_TOKEN` (for operator; defaults to shared token).
  - `BRIDGE_OPERATOR_TOKEN` must be at least 16 chars and include lowercase, uppercase, digit, and symbol.
- `jwt`: validate JWT with `BRIDGE_JWT_SECRET`/`BRIDGE_JWT_ALG`.
  - Client JWT should include matching `instance_id` and `client_id` claims.
  - Operator JWT should include `role=operator`.
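
To make the JWT claims above concrete, here is a minimal sketch that mints client and operator tokens. It assumes `BRIDGE_JWT_ALG` is `HS256` and that no further claims (such as `exp`) are required, which this document does not state. The signing is done with the standard library so the example stays dependency-free; with PyJWT, `jwt.encode(claims, secret, algorithm="HS256")` should give an equivalent token. The function names are hypothetical.

```python
import base64
import hashlib
import hmac
import json


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def sign_hs256(claims: dict, secret: str) -> str:
    """Return a compact JWT signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, claims)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def client_token(secret: str, instance_id: str, client_id: str) -> str:
    """Client JWT carrying the matching instance_id and client_id claims."""
    return sign_hs256({"instance_id": instance_id, "client_id": client_id}, secret)


def operator_token(secret: str) -> str:
    """Operator JWT carrying role=operator."""
    return sign_hs256({"role": "operator"}, secret)
```

The client token goes into the extension popup's `Auth Token / JWT` field, and the operator token is passed to the CLI via `--token`.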

### Production safety

- `BRIDGE_ENV=production` enforces strong auth config:
  - static mode: `BRIDGE_SHARED_TOKEN` must not be empty/dev default.
  - jwt mode: `BRIDGE_JWT_SECRET` must not be default.

## Install (pipx recommended)

```bash
python3 -m pip install --user pipx
python3 -m pipx ensurepath
pipx install browser-agent-bridge
```

## Quick Start

### 1) (Optional) Generate local JWT secret file

```bash
browser-bridge setup-secret
```

If `BRIDGE_AUTH_MODE=jwt` and `BRIDGE_JWT_SECRET` is still the default, server startup automatically loads or creates a local secret file (`~/.browser_bridge/jwt_secret` or `BRIDGE_JWT_SECRET_FILE`).

### 2) Start server

```bash
# static mode example
export BRIDGE_AUTH_MODE=static
export BRIDGE_SHARED_TOKEN='change-me-strong-token'
export BRIDGE_OPERATOR_TOKEN='Str0ng!Operator#42'
browser-bridge-server
```

### 3) Load extension

1. Open `chrome://extensions`
2. Enable Developer mode
3. Load unpacked `extension/`
4. In the popup, fill in:
   - `Bridge Server WS URL`: `ws://127.0.0.1:8765/ws/client` (or `wss://.../ws/client`)
   - `Instance ID`: e.g. `local-instance`
   - `Client ID`: e.g. `chrome-main`
   - `Auth Token / JWT`: client token
5. Save + Connect

Connected tab preview:

![Connected tab preview](docs/images/tab-preview.png)

### 4) Operator CLI usage

```bash
browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token 'Str0ng!Operator#42' list-clients
browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token 'Str0ng!Operator#42' connect-status --instance-id local-instance --client-id chrome-main
browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token 'Str0ng!Operator#42' ping-tab --instance-id local-instance --client-id chrome-main
browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token 'Str0ng!Operator#42' observe --instance-id local-instance --client-id chrome-main
```

`observe` now returns stable references per node:
- `ref`: stable element reference for follow-up actions
- `click_ref`: reference biased toward a clickable ancestor (row/link/button)
- `clickable_selector`: selector for the chosen clickable ancestor

You can pass these back to `click` in the `send-command` payload using `ref`/`click_ref`, along with optional guardrails:
- `prefer`: `control` (default), `row`, or `link`
- `avoid_roles`: e.g. `["checkbox", "menuitem"]`
- `avoid_tags`: e.g. `["input"]`
- `avoid_input_types`: e.g. `["checkbox", "radio"]`
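
The guardrails above can be assembled programmatically. The following is a sketch only: it assumes `ref`, `click_ref`, and the guardrail fields are top-level keys of the `click` payload, and `click_payload` is a hypothetical helper rather than part of the package.

```python
ALLOWED_PREFER = {"control", "row", "link"}


def click_payload(node, prefer="control", avoid_roles=None,
                  avoid_tags=None, avoid_input_types=None):
    """Build a `click` payload from one node of an `observe` result."""
    if prefer not in ALLOWED_PREFER:
        raise ValueError(f"prefer must be one of {sorted(ALLOWED_PREFER)}")
    # Favor click_ref (biased toward a clickable ancestor), then fall back to ref.
    if node.get("click_ref"):
        payload = {"click_ref": node["click_ref"]}
    elif node.get("ref"):
        payload = {"ref": node["ref"]}
    else:
        raise ValueError("node has neither click_ref nor ref")
    payload["prefer"] = prefer
    # Only include guardrails that were actually supplied.
    for key, value in (
        ("avoid_roles", avoid_roles),
        ("avoid_tags", avoid_tags),
        ("avoid_input_types", avoid_input_types),
    ):
        if value:
            payload[key] = list(value)
    return payload
```

Serializing the returned dict with `json.dumps` gives a value suitable for `--payload` or `--payload-file`.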

Raw command:

```bash
browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token '...' \
  send-command --instance-id local-instance --client-id chrome-main \
  --type get_html --payload '{"max_chars":40000}'
```

You can also avoid shell JSON escaping with `--payload-file`:

```bash
cat > /tmp/cmd.json <<'JSON'
{"selector":"input[name=\"q\"]","text":"openclaw"}
JSON

browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token '...' \
  send-command --instance-id local-instance --client-id chrome-main \
  --type type --payload-file /tmp/cmd.json
```

`get_html` result includes:
- `html`: captured DOM text (possibly truncated)
- `truncated`: whether output was cut to `payload.max_chars`
- `notes`: actionable recommendations (for example, increase `max_chars` when truncated, or set `preprocess=false` for rawer DOM)
- `preprocess` and `removed_nodes`: preprocessing mode and removed-node count
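
One way to act on the `truncated` flag is to retry with a larger `max_chars`. The sketch below assumes the result is a dict with the fields listed above; the doubling strategy, the `ceiling` value, and the function name are illustrative choices, not documented behavior.

```python
def next_get_html_payload(result, payload, ceiling=400_000):
    """Return a payload for a retry, or None when no retry makes sense.

    `payload` is the payload of the request that produced `result` and must
    contain the `max_chars` value that was used.
    """
    if not result.get("truncated"):
        return None  # full DOM text was returned
    current = int(payload["max_chars"])
    if current >= ceiling:
        return None  # already at the self-imposed ceiling; stop growing
    # Double the budget, but never exceed the ceiling.
    return {**payload, "max_chars": min(current * 2, ceiling)}
```

A caller would resend `get_html` with the returned payload until the function yields `None`.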

Adaptive load wait (`navigate`, `click`, `type`):

- The extension waits for the tab to finish loading before replying, up to a 10s cap. The wait is adaptive: it returns immediately if the tab is already `complete`.
- Override per command payload:
  - `wait_for_load` (default `true`)
  - `wait_for_load_ms` (default `10000`, capped at `10000`)
- Command result includes `load_wait` diagnostics: `waited_ms`, `completed`, `timed_out`, `final_status`, `enabled`, `max_wait_ms`.
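
The `load_wait` diagnostics can be turned into a short status line for logs. This sketch assumes `load_wait` is a nested object in the command result, that `enabled`, `completed`, and `timed_out` are booleans, and that `waited_ms` and `max_wait_ms` are numbers; the document does not spell out these types, and the helper name is hypothetical.

```python
def describe_load_wait(result: dict) -> str:
    """Summarize the `load_wait` diagnostics of a command result."""
    info = result.get("load_wait")
    if not info:
        return "no load_wait diagnostics in result"
    if not info.get("enabled"):
        return "load wait disabled (wait_for_load=false)"
    waited = info.get("waited_ms")
    if info.get("timed_out"):
        return (
            f"timed out after {waited} ms "
            f"(limit {info.get('max_wait_ms')} ms, "
            f"tab status {info.get('final_status')!r})"
        )
    if info.get("completed"):
        return f"loaded after {waited} ms"
    return f"load state unclear after {waited} ms"
```

A timed-out wait does not necessarily mean the command failed, so a caller might follow up with `observe` rather than treat it as an error.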

Example:

```bash
browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token '...' \
  send-command --instance-id local-instance --client-id chrome-main \
  --type navigate --payload '{"url":"https://example.com","wait_for_load_ms":4000}'
```

Human-like typing (`type`):

- `type` now simulates typing character-by-character by default to better match human input behavior.
- Optional payload fields:
  - `human_like` (default `true`)
  - `clear_first` (default `true`)
  - `keystroke_delay_ms` (default `45`)
  - `keystroke_jitter_ms` (default `30`)

Example:

```bash
browser-bridge --server-ws-url ws://127.0.0.1:8765/ws/operator --token '...' \
  send-command --instance-id local-instance --client-id chrome-main \
  --type type --payload '{"selector":"input[name=\"q\"]","text":"hello world","keystroke_delay_ms":70,"keystroke_jitter_ms":45}'
```

## Security Hardening

- Use TLS in non-local deployments (`wss://`).
- Use strong static tokens or a strong JWT secret. The operator static token must be at least 16 characters and include lowercase, uppercase, digit, and symbol characters.
- Optional command allowlist: `BRIDGE_COMMAND_ALLOWLIST=observe,ping_tab,get_html`.
- Optional client allowlist in static mode: `BRIDGE_ALLOWED_CLIENTS=instance1:client1,instance2:client2`.
- Request idempotency and replay protection are enforced through a `request_id` deduplication window.
- Max payload limit is enforced by `BRIDGE_MAX_MESSAGE_BYTES`.
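
The operator-token rule above can be checked before starting the server. This is a sketch of that rule, not the server's own validator: it assumes "symbol" means ASCII punctuation, which may differ from what the server accepts, and the function name is hypothetical.

```python
import string


def is_strong_operator_token(token: str) -> bool:
    """Check for 16+ chars with lowercase, uppercase, digit, and symbol."""
    return (
        len(token) >= 16
        and any(c.islower() for c in token)
        and any(c.isupper() for c in token)
        and any(c.isdigit() for c in token)
        # Assumption: a "symbol" is any ASCII punctuation character.
        and any(c in string.punctuation for c in token)
    )
```

Under this check, the quick-start value `Str0ng!Operator#42` passes, while a lowercase-only phrase such as `change-me-strong-token` does not.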

## Testing

```bash
pytest -v
```

Coverage includes WS auth success/failure, command routing, disconnect handling, wrong target routing, CLI failure paths, and reconnect replacement behavior.

## Contributing

Contributions are very welcome.

If you want to help, great places to start are:
- bug fixes and reliability improvements
- new command handlers and protocol hardening
- better docs and examples
- tests for real-world edge cases

Quick contributor workflow:
1. Fork the repo and create a focused branch.
2. Run tests locally (`pytest -v`).
3. Open a PR with a clear description, motivation, and test notes.

For detailed guidelines, see [CONTRIBUTING.md](CONTRIBUTING.md).

If you have ideas but no patch yet, opening an issue/discussion is also appreciated.

## License

MIT (see `LICENSE`).

---

Created by the creator of [openclaw-setup.me](https://openclaw-setup.me/).
