# WebTap Browser Debugging Guide

WebTap is a Chrome DevTools Protocol (CDP) client for debugging browsers from a REPL or over MCP. It provides native event storage, browser-level multiplexing, and a daemon-based architecture.

## Quick Start

```python
# Chrome must be running with --remote-debugging-port=9222
targets()                                    # List all Chrome targets
watch(["9222:abc123"])                       # Watch target(s) by ID
network()                                    # View network requests (filtered)
network(url="*api*")                         # Filter by URL pattern
request(123, "9222:abc123", ["response.content"])  # Get response body
console()                                    # View console messages
unwatch()                                    # Unwatch all targets
```

## Architecture

WebTap runs a daemon that multiplexes CDP sessions over browser-level WebSockets:
- **Daemon** - Background process that manages CDP connections and DuckDB storage per target
- **BrowserSession** - Single WebSocket per Chrome port; multiplexes sessions via `flatten: True`
- **Watch model** - Declarative watch/unwatch; targets auto-reattach on navigation, reload, and service worker restart
- **REPL/MCP** - Clients communicate via JSON-RPC 2.0 (single `/rpc` endpoint)

```
REPL/MCP Client → RPCClient.call() → POST /rpc → RPCFramework → Services → DuckDB
                                                      ↓                        ↓
                                              ConnectionManager       BrowserSession → Chrome
```
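
To make the `POST /rpc` step concrete, here is a minimal sketch of a JSON-RPC 2.0 request envelope built with the standard library. The helper `build_rpc_request` is hypothetical, and the method name `"network"` and its parameters are assumptions for illustration; this guide does not show the method names WebTap actually registers.

```python
import itertools
import json

# Hypothetical helper: WebTap's real RPCClient is not shown in this guide.
_ids = itertools.count(1)

def build_rpc_request(method, params=None):
    """Return a JSON-RPC 2.0 request object as a plain dict."""
    request = {"jsonrpc": "2.0", "id": next(_ids), "method": method}
    if params is not None:
        request["params"] = params
    return request

# "network" and its params are assumed names, used only for illustration.
payload = build_rpc_request("network", {"url": "*api*", "limit": 50})
body = json.dumps(payload)  # the string a client would POST to /rpc
```

The `jsonrpc`, `id`, `method`, and `params` members come from the JSON-RPC 2.0 specification; everything else here is illustrative.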

## Core Commands

### Target Discovery & Watching

```python
targets()                        # List all Chrome targets (pages, SWs, workers)
watch(["9222:abc123"])           # Watch targets (auto-attaches and enables CDP)
watch(["9222:abc123", "9222:def456"])  # Watch multiple
watching()                       # Show currently watched targets with state
unwatch(["9222:abc123"])         # Unwatch specific target
unwatch()                        # Unwatch all

navigate("https://...", "9222:abc123")  # Navigate target to URL
reload("9222:abc123")                   # Reload target
back("9222:abc123")                     # Go back in history
forward("9222:abc123")                  # Go forward in history
```
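
The target strings in these examples look like `<port>:<id>`. The sketch below splits one into its parts, assuming the prefix is the Chrome debugging port and the suffix is the target ID as listed by `targets()`. `TargetRef` and `parse_target` are hypothetical names, not part of WebTap.

```python
from typing import NamedTuple

class TargetRef(NamedTuple):
    """Hypothetical container for the two halves of a target string."""
    port: int
    target_id: str

def parse_target(ref: str) -> TargetRef:
    """Split '9222:abc123' into (9222, 'abc123'); assumes a '<port>:<id>' layout."""
    port_text, sep, target_id = ref.partition(":")
    if not sep or not port_text.isdigit() or not target_id:
        raise ValueError(f"expected '<port>:<id>', got {ref!r}")
    return TargetRef(int(port_text), target_id)

ref = parse_target("9222:abc123")
```

A helper like this can be handy when scripting against several Chrome instances and grouping targets by port.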

### Network Monitoring

```python
network()                              # Filtered requests (default)
network(show_all=True)                 # Bypass filters, show everything
network(status=404)                    # Filter by HTTP status
network(method="POST")                 # Filter by method
network(resource_type="xhr")           # Filter by resource type
network(url="*api*")                   # Filter by URL pattern
network(status=200, url="*graphql*")   # Combine filters
network(limit=50)                      # Show more results
network(target="9222:abc123")          # Filter to specific target
```
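
As a rough mental model for the filters above, the sketch below applies a status filter and a URL glob together, using `fnmatch` from the standard library. It assumes combined filters are AND-ed and that `*` matches any run of characters; WebTap's own matching rules may differ. The `matches` helper and the sample rows are made up for illustration.

```python
from fnmatch import fnmatchcase

def matches(row, status=None, method=None, url=None):
    """Return True when the row passes every filter that was supplied."""
    if status is not None and row["status"] != status:
        return False
    if method is not None and row["method"] != method:
        return False
    if url is not None and not fnmatchcase(row["url"], url):
        return False
    return True

# Made-up rows standing in for network() output.
rows = [
    {"status": 200, "method": "POST", "url": "https://example.com/graphql"},
    {"status": 200, "method": "GET", "url": "https://example.com/logo.png"},
    {"status": 404, "method": "GET", "url": "https://example.com/api/users"},
]
hits = [r for r in rows if matches(r, status=200, url="*graphql*")]
```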

### Request Inspection

```python
# Get HAR request details by row ID and target from network() output
request(123, "9222:abc123")                           # Minimal view (method, url, status)
request(123, "9222:abc123", ["*"])                    # Full HAR entry
request(123, "9222:abc123", ["request.headers.*"])    # Request headers only
request(123, "9222:abc123", ["response.content"])     # Fetch response body
request(123, "9222:abc123", ["request.postData"])     # Request body (POST/PUT)
request(123, "9222:abc123", ["request.postData", "response.content"])  # Both bodies

# With Python expression evaluation (libraries pre-imported)
request(123, "9222:abc123", ["response.content"], expr="json.loads(data['response']['content']['text'])")
request(123, "9222:abc123", ["response.content"], expr="BeautifulSoup(data['response']['content']['text'], 'html.parser').title")

# WebSocket entries: response.content automatically returns frames
request(123, "9222:abc123", ["response.content"])     # Returns frames for WS entries
```
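
The field patterns above read as dotted paths into the HAR entry. The following sketch shows one way such patterns could be resolved against a nested dict; it is an approximation for intuition, not WebTap's matcher. It handles only a trailing `*`, and it does not model the body fetch that `response.content` triggers. `select_fields` and the sample entry are hypothetical.

```python
import copy

_MISSING = object()

def select_fields(entry, patterns):
    """Copy only the dotted paths named in patterns; a trailing '*' keeps the subtree."""
    result = {}
    for pattern in patterns:
        parts = pattern.split(".")
        if parts[-1] == "*":
            parts = parts[:-1]
        if not parts:  # the pattern was just "*": keep everything
            return copy.deepcopy(entry)
        node = entry
        for key in parts:  # walk down to the requested node
            node = node.get(key, _MISSING) if isinstance(node, dict) else _MISSING
            if node is _MISSING:
                break
        if node is _MISSING:
            continue  # path not present in this entry
        dest = result
        for key in parts[:-1]:  # rebuild the parent dicts in the result
            dest = dest.setdefault(key, {})
        dest[parts[-1]] = copy.deepcopy(node)
    return result

# Made-up HAR-like entry for illustration.
entry = {
    "request": {"method": "GET", "url": "https://example.com/api", "headers": {"accept": "*/*"}},
    "response": {"status": 200, "content": {"text": "{}"}},
}
headers_only = select_fields(entry, ["request.headers.*"])
both = select_fields(entry, ["request.headers.*", "response.content"])
```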

### Code Generation

```python
# Generate Pydantic models from responses
to_model(123, "models/user.py", "User", "9222:abc123")
to_model(123, "models/user.py", "User", "9222:abc123", json_path="data[0]")
to_model(123, "models/form.py", "Form", "9222:abc123", field="request.postData")

# Generate TypeScript/Go/Rust/etc types
quicktype(123, "types/user.ts", "User", "9222:abc123")
quicktype(123, "api.go", "ApiResponse", "9222:abc123")
quicktype(123, "schema.json", "Schema", "9222:abc123")
```

### Console & JavaScript

```python
console()                           # View console messages
console(level="error")              # Filter by level
console(limit=100)                  # Show more messages

# Console entry details (requires target)
entry(5, "9222:abc123")             # Minimal view
entry(5, "9222:abc123", ["*"])      # Full CDP event
entry(5, "9222:abc123", ["stackTrace"])  # Stack trace only

# JavaScript execution (fresh scope by default)
js("document.title", "9222:abc123")
js("[...document.links].map(a => a.href)", "9222:abc123")
js("fetch('/api').then(r=>r.json())", "9222:abc123", await_promise=True)

# Persistent scope for multi-step operations
js("var data = null", "9222:abc123", persist=True)
js("fetch('/api').then(r => r.json()).then(d => data = d)", "9222:abc123", persist=True, await_promise=True)
js("data.users.length", "9222:abc123", persist=True)

# With browser-selected element
js("element.offsetWidth", "9222:abc123", selection=1)
```

### Request Interception

```python
fetch()                                    # Show capture state + rules
fetch({"capture": False})                  # Disable body capture globally
fetch({"capture": True})                   # Re-enable capture
fetch({"mock": {"*api*": '{"ok":1}'}, "target": "9222:abc123"})  # Mock for target
fetch({"block": ["*tracking*"], "target": "9222:abc123"})        # Block for target
fetch({"target": "9222:abc123"})           # Clear rules for target
```
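
Mock bodies in the example above are JSON written as string literals, which gets error-prone for larger payloads. A small sketch, assuming the `{"mock": {...}, "target": ...}` shape shown above, builds the same kind of dict with `json.dumps`. `mock_config` is a hypothetical helper, not a WebTap function.

```python
import json

def mock_config(target, bodies):
    """Build a fetch()-style mock config, serializing each body to a JSON string."""
    return {
        "mock": {pattern: json.dumps(body) for pattern, body in bodies.items()},
        "target": target,
    }

config = mock_config("9222:abc123", {"*api*": {"ok": 1, "items": []}})
# In the WebTap REPL the result could then be passed along as fetch(config).
```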

### Filter Management

```python
filters()                                           # Show all filter groups
filters(add="myfilter", hide={"urls": ["*ads*"]})  # Create filter
filters(add="apionly", hide={"types": ["Image", "Font", "Stylesheet"]})
filters(enable="myfilter")                          # Enable group
filters(disable="myfilter")                         # Disable group
filters(remove="myfilter")                          # Delete group
```

### Browser Element Selection

```python
# Use Chrome extension to select elements, then:
selections()                                    # View all selections
selections(expr="data['selections']['1']")     # Get element #1 data
selections(expr="data['selections']['1']['selector']")  # CSS selector
selections(expr="data['selections']['1']['outerHTML']") # Element HTML

# Use with JavaScript
js("element.offsetWidth", "9222:abc123", selection=1)
```

### Clearing Data

```python
clear()                             # Clear events (default)
clear(console=True)                 # Clear browser console
clear(events=True, console=True)    # Clear everything
```

## Common Workflows

### Analyze API Responses

```python
targets()
watch(["9222:abc123"])
network(url="*api*")
# Note the ID and Target columns
request(42, "9222:abc123", ["response.content"])
request(42, "9222:abc123", ["response.content"], expr="json.loads(data['response']['content']['text'])")
to_model(42, "models/response.py", "ApiResponse", "9222:abc123")
```

### Debug Failed Requests

```python
network(status=404)
network(status=500)
request(123, "9222:abc123", ["*"])
request(123, "9222:abc123", ["response.content"])
```

### Parse HTML Responses

```python
network(resource_type="document")
request(123, "9222:abc123", ["response.content"], expr="""
BeautifulSoup(data['response']['content']['text'], 'html.parser').find_all('a', href=True)
""")
```

### Extract Data with Expressions

```python
# Libraries available: json, re, bs4/BeautifulSoup, lxml, jwt, yaml, httpx, urllib, datetime, etc.

# Parse JSON (base64 auto-decoded)
request(123, "9222:abc123", ["response.content"], expr="json.loads(data['response']['content']['text'])['users']")

# Decode JWT
request(123, "9222:abc123", ["response.content"], expr="jwt.decode(data['response']['content']['text'], options={'verify_signature': False})")

# Parse form data
request(123, "9222:abc123", ["request.postData"], expr="dict(urllib.parse.parse_qsl(data['request']['postData']['text']))")
```
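
Expressions like these can be prototyped offline against a saved entry before running them in WebTap. The sketch below evaluates two of the expressions above with plain `eval` and a small namespace. This is an assumption about how to test them locally, not a description of WebTap's evaluator, and `run_expr` and the sample data are made up.

```python
import json
import urllib.parse

def run_expr(expr, data):
    """Evaluate an expression string with `data` and a few modules in scope."""
    namespace = {"json": json, "urllib": urllib, "data": data}
    return eval(expr, namespace)  # only use with expressions you wrote yourself

# Made-up entry shaped like the `data` dict used in the examples above.
sample = {
    "request": {"postData": {"text": "user=alice&role=admin"}},
    "response": {"content": {"text": '{"users": [{"id": 1}, {"id": 2}]}'}},
}
users = run_expr("json.loads(data['response']['content']['text'])['users']", sample)
form = run_expr("dict(urllib.parse.parse_qsl(data['request']['postData']['text']))", sample)
```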

## Daemon Management

```bash
webtap                    # Start REPL (auto-starts daemon)
webtap daemon start       # Start daemon explicitly
webtap daemon status      # Show daemon status
webtap daemon stop        # Stop daemon
webtap status             # Show daemon and connection status
```

MCP mode is auto-detected when stdin is piped (no flags needed).
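
One common way to detect piped stdin is `isatty()`. The sketch below shows the idea; it is an assumption about the general technique, since this guide does not show WebTap's actual check. `detect_mode` is a hypothetical name.

```python
import io
import sys

def detect_mode(stream=None):
    """Return 'repl' for an interactive terminal, 'mcp' when input is piped."""
    stream = sys.stdin if stream is None else stream
    return "repl" if stream.isatty() else "mcp"

# An in-memory stream is never a TTY, so it behaves like piped input.
piped_mode = detect_mode(io.StringIO(""))
```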

## Tips

1. **Chrome must run with debugging port** - `google-chrome --remote-debugging-port=9222`
2. **Row IDs and targets come from `network()`** - Use these with `request()`, `to_model()`, `quicktype()`
3. **Field selection patterns** - `["*"]` for all, `["request.*"]` for request only, `["response.content"]` for body
4. **Base64 auto-decoded** - Response bodies are automatically decoded to UTF-8; binary content stays base64
5. **Filters reduce noise** - Default filters remove ads, tracking, analytics
6. **Expression evaluation** - Pre-imported: json, re, bs4, lxml, jwt, yaml, httpx, urllib, datetime
7. **Fresh JS scope** - Default prevents redeclaration errors; use `persist=True` for multi-step
8. **Auto-watch children** - Popups/new tabs opened by watched targets are automatically watched
9. **MCP integration** - All commands work as MCP tools for Claude/LLMs
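
To illustrate tip 4, here is a minimal sketch of the decode-or-keep behavior for a HAR-style `content` dict, where `encoding: "base64"` marks an encoded body. It assumes that "binary" means "does not decode as UTF-8"; WebTap may decide this differently, for example by MIME type. `decode_body` is a hypothetical helper.

```python
import base64

def decode_body(content):
    """Return text for a HAR-style content dict, leaving binary bodies as base64."""
    text = content.get("text", "")
    if content.get("encoding") != "base64":
        return text  # already plain text
    try:
        return base64.b64decode(text).decode("utf-8")
    except ValueError:
        # Covers both malformed base64 and bytes that are not valid UTF-8.
        return text

json_body = {"text": base64.b64encode(b'{"ok": true}').decode("ascii"), "encoding": "base64"}
png_body = {"text": base64.b64encode(b"\x89PNG\r\n\x1a\n").decode("ascii"), "encoding": "base64"}
decoded = decode_body(json_body)
kept = decode_body(png_body)
```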
