# WebTap Browser Debugging Guide

WebTap is a Chrome DevTools Protocol (CDP) client for browser debugging from a REPL or over MCP. It stores browser events natively and runs on a daemon-based architecture.

## Quick Start

```python
# Connect to Chrome (must be running with --remote-debugging-port=9222)
pages()                              # List available tabs
connect(0)                           # Connect by index
network()                            # View network requests (filtered)
network(url="*api*")                 # Filter by URL pattern
request(123, ["response.content"])   # Get response body by row ID
console()                            # View console messages
disconnect()                         # Disconnect from Chrome
```

## Architecture

WebTap uses a daemon-based architecture:
- **Daemon** - Background process that manages the CDP WebSocket connection and DuckDB storage
- **REPL/MCP** - Clients communicate with the daemon over HTTP (localhost:8765)
- **HAR Views** - Pre-aggregated network request data for fast queries

```
REPL/MCP Client → HTTP → Daemon → WebSocket → Chrome
                           ↓
                    DuckDB + HAR Views
```
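On the `Daemon → WebSocket → Chrome` hop, messages follow the CDP wire format: each command is a JSON object with an integer `id`, a `method` such as `Page.navigate`, and optional `params`. Chrome answers with a message carrying the same `id`, while unsolicited events (for example `Network.loadingFinished`) carry a `method` but no `id`. The sketch below only builds and classifies those messages without opening a socket; `CdpCommandBuilder` is a hypothetical helper, not part of WebTap.

```python
import itertools
import json


class CdpCommandBuilder:
    """Hypothetical helper: serializes CDP commands and matches replies by id."""

    def __init__(self) -> None:
        self._ids = itertools.count(1)      # ids only need to be unique per connection
        self.pending: dict = {}             # id -> method, still awaiting a reply

    def command(self, method: str, **params) -> str:
        msg_id = next(self._ids)
        self.pending[msg_id] = method
        return json.dumps({"id": msg_id, "method": method, "params": params})

    def handle(self, raw: str) -> tuple:
        """Classify an incoming message as a command reply or an event."""
        msg = json.loads(raw)
        if "id" in msg:
            method = self.pending.pop(msg["id"])
            return ("reply:" + method, msg.get("result", msg.get("error", {})))
        # No id: this is an event pushed by Chrome
        return ("event:" + msg["method"], msg.get("params", {}))


if __name__ == "__main__":
    builder = CdpCommandBuilder()
    print(builder.command("Page.navigate", url="https://example.com"))
    print(builder.handle('{"id": 1, "result": {"frameId": "F1"}}'))
    print(builder.handle('{"method": "Network.loadingFinished", "params": {"requestId": "42.1"}}'))
```

Matching replies by `id` is what lets a single WebSocket carry many in-flight commands while events stream in between them.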

## Core Commands

### Connection & Navigation

```python
pages()                      # List Chrome pages
connect(0)                   # Connect by index
connect(page_id="xyz")       # Connect by page ID
disconnect()                 # Disconnect
status()                     # Show daemon/connection status

navigate("https://...")      # Go to URL
reload(ignore_cache=True)    # Hard refresh
back()                       # Go back in history
forward()                    # Go forward in history
page()                       # Current page info
```
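Chrome started with `--remote-debugging-port=9222` lists its debuggable targets as JSON at `http://localhost:9222/json`, including `id`, `type`, `title`, `url`, and `webSocketDebuggerUrl` for each one. The sketch below shows how a CDP client could turn that list into something like `pages()` and pick a tab the way `connect(0)` or `connect(page_id=...)` do. Whether WebTap uses this exact endpoint is an assumption, and `list_pages`/`pick_page` are hypothetical names; the sample data stands in for a live browser.

```python
import json
import urllib.request
from typing import Optional

DEVTOOLS_LIST_URL = "http://localhost:9222/json"  # Chrome's target-list endpoint


def fetch_targets(url: str = DEVTOOLS_LIST_URL) -> list:
    """Fetch the live target list (needs Chrome running with --remote-debugging-port=9222)."""
    with urllib.request.urlopen(url, timeout=2) as resp:
        return json.load(resp)


def list_pages(targets: list) -> list:
    """Keep only tabs; the list can also include service workers, iframes, extensions."""
    return [t for t in targets if t.get("type") == "page"]


def pick_page(pages: list, index: Optional[int] = None, page_id: Optional[str] = None) -> dict:
    """Select a tab by position (like connect(0)) or by target id (like connect(page_id=...))."""
    if page_id is not None:
        match = next((p for p in pages if p["id"] == page_id), None)
        if match is None:
            raise LookupError(f"no page with id {page_id!r}")
        return match
    if index is not None:
        return pages[index]
    raise ValueError("pass index or page_id")


sample = [
    {"id": "A1", "type": "page", "title": "Docs", "url": "https://example.com/docs",
     "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/A1"},
    {"id": "W9", "type": "service_worker", "title": "sw.js", "url": "https://example.com/sw.js"},
    {"id": "B2", "type": "page", "title": "App", "url": "https://example.com/app",
     "webSocketDebuggerUrl": "ws://localhost:9222/devtools/page/B2"},
]

if __name__ == "__main__":
    tabs = list_pages(sample)  # swap in fetch_targets() against a real browser
    for i, tab in enumerate(tabs):
        print(i, tab["title"], tab["url"])
    print(pick_page(tabs, index=1)["webSocketDebuggerUrl"])
```

Because non-page targets are filtered out first, the index passed to `connect()` refers to a position among tabs only, which is why indices can shift when tabs open or close.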

### Network Monitoring

```python
network()                              # Filtered requests (default)
network(all=True)                      # Bypass filters, show everything
network(status=404)                    # Filter by HTTP status
network(method="POST")                 # Filter by method
network(type="xhr")                    # Filter by resource type
network(url="*api*")                   # Filter by URL pattern
network(status=200, url="*graphql*")   # Combine filters
network(limit=50)                      # Show more results
```
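One plausible reading of the filters above is that every supplied argument must match (a logical AND), URL patterns are shell-style globs, and resource types compare case-insensitively (the guide uses both `"xhr"` and `"Image"`). The sketch below re-implements that reading with `fnmatch` over hypothetical row dicts; it is not WebTap's code, and the `limit` behavior (keep the first N matches) is an assumption.

```python
from fnmatch import fnmatchcase


def matches(row: dict, status=None, method=None, rtype=None, url=None) -> bool:
    """Hypothetical: a row passes only if every supplied filter matches."""
    if status is not None and row["status"] != status:
        return False
    if method is not None and row["method"].upper() != method.upper():
        return False
    if rtype is not None and row["type"].lower() != rtype.lower():  # mirrors type=
        return False
    if url is not None and not fnmatchcase(row["url"], url):
        return False
    return True


def network_query(rows: list, limit: int = 20, **filters) -> list:
    return [r for r in rows if matches(r, **filters)][:limit]


rows = [
    {"id": 1, "method": "GET", "status": 200, "type": "Document", "url": "https://example.com/"},
    {"id": 2, "method": "POST", "status": 200, "type": "Fetch", "url": "https://example.com/graphql"},
    {"id": 3, "method": "GET", "status": 404, "type": "XHR", "url": "https://example.com/api/users"},
]

if __name__ == "__main__":
    print([r["id"] for r in network_query(rows, status=200, url="*graphql*")])  # [2]
    print([r["id"] for r in network_query(rows, rtype="xhr")])                  # [3]
```

Note that in `fnmatch` a `*` also matches `/`, so `"*api*"` matches `api` anywhere in the URL, including the host name.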

### Request Inspection

```python
# Get HAR request details by row ID from network() output
request(123)                           # Minimal view (method, url, status)
request(123, ["*"])                    # Full HAR entry
request(123, ["request.headers.*"])    # Request headers only
request(123, ["response.headers.*"])   # Response headers only
request(123, ["response.content"])     # Fetch response body
request(123, ["request.postData"])     # Request body (POST/PUT)
request(123, ["request.postData", "response.content"])  # Both bodies

# With Python expression evaluation (libraries pre-imported)
request(123, ["response.content"], expr="json.loads(data['response']['content']['text'])")
request(123, ["response.content"], expr="BeautifulSoup(data['response']['content']['text'], 'html.parser').title")
request(123, ["response.content"], expr="jwt.decode(data['response']['content']['text'], options={'verify_signature': False})")
```
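The `expr` examples index into `data['response']['content']['text']`, which suggests the selected fields come back as a nested dict that keeps the HAR paths. The sketch below assumes exactly that: `select_fields` (a hypothetical name) copies each dotted path into a nested result, treats a trailing `.*` as "the whole subtree", and returns everything for `["*"]`. Missing paths are skipped rather than raising, which is also an assumption.

```python
import copy
import json

_MISSING = object()


def _lookup(entry: dict, parts: list):
    node = entry
    for key in parts:
        if not isinstance(node, dict) or key not in node:
            return _MISSING
        node = node[key]
    return node


def select_fields(entry: dict, patterns: list) -> dict:
    """Hypothetical: copy only the requested dotted paths into a nested dict."""
    if "*" in patterns:
        return copy.deepcopy(entry)
    out: dict = {}
    for pattern in patterns:
        parts = pattern.split(".")
        if parts[-1] == "*":           # "request.headers.*" selects the whole subtree
            parts = parts[:-1]
        value = _lookup(entry, parts)
        if value is _MISSING:          # e.g. request.postData on a GET
            continue
        dst = out
        for key in parts[:-1]:
            dst = dst.setdefault(key, {})
        dst[parts[-1]] = copy.deepcopy(value)
    return out


entry = {
    "request": {"method": "POST", "url": "https://example.com/api",
                "headers": [{"name": "Content-Type", "value": "application/json"}],
                "postData": {"mimeType": "application/json", "text": '{"q": 1}'}},
    "response": {"status": 200,
                 "headers": [{"name": "Content-Type", "value": "application/json"}],
                 "content": {"mimeType": "application/json", "text": '{"users": []}'}},
}

if __name__ == "__main__":
    data = select_fields(entry, ["request.postData", "response.content"])
    print(json.loads(data["response"]["content"]["text"]))
```

Keeping the full path in the result means an `expr` written against one selection keeps working when you add more fields to the list.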

### Code Generation

```python
# Generate Pydantic models from responses
to_model(123, "models/user.py", "User")
to_model(123, "models/user.py", "User", json_path="data[0]")  # Nested extraction
to_model(123, "models/form.py", "Form", field="request.postData")  # From request body

# Generate TypeScript/Go/Rust/etc types
quicktype(123, "types/user.ts", "User")
quicktype(123, "api.go", "ApiResponse")
quicktype(123, "schema.json", "Schema")
```
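`json_path="data[0]"` narrows the response to the object the model is generated from. The sketch below shows one way such a path could be resolved, supporting dotted keys and `[n]` indices; `extract` is a hypothetical helper, and WebTap's actual path syntax may accept more or less than this.

```python
import json
import re

# Either a bare key (no dots or brackets) or a bracketed integer index
_TOKEN = re.compile(r"([^.\[\]]+)|\[(\d+)\]")


def extract(payload, json_path: str):
    """Hypothetical: walk 'data[0]' or 'data.items[2].user' into parsed JSON."""
    node = payload
    for key, index in _TOKEN.findall(json_path):
        node = node[int(index)] if index else node[key]
    return node


body = json.loads('{"data": [{"id": 7, "name": "Ada"}, {"id": 8, "name": "Lin"}]}')

if __name__ == "__main__":
    print(extract(body, "data[0]"))  # {'id': 7, 'name': 'Ada'}
```

Only the selected object is sampled, so fields that appear only in later array elements would be missed unless the generator merges several samples.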

### Console & JavaScript

```python
console()                           # View console messages
console(level="error")              # Filter by level
console(limit=100)                  # Show more messages

# JavaScript execution (fresh scope by default)
js("document.title")                           # Get value
js("[...document.links].map(a => a.href)")    # Get all links
js("fetch('/api').then(r=>r.json())", await_promise=True)  # Async
js("document.querySelectorAll('.ad').forEach(e => e.remove())", wait_return=False)  # No return

# Multi-statement code (use persist=True for global scope)
js("var data = null", persist=True)
js("fetch('/api').then(r => r.json()).then(d => data = d)", persist=True, await_promise=True)
js("data.users.length", persist=True)

# With browser-selected element
js("element.offsetWidth", selection=1)  # Use element #1 from selections()
```
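In CDP, JavaScript is evaluated with `Runtime.evaluate`, whose `expression`, `awaitPromise`, and `returnByValue` parameters line up with the `js()` options above. The sketch below assumes `await_promise` maps to `awaitPromise` and approximates a "fresh scope" by wrapping the code in a block, so `let`/`const` declarations stay local and the block's last expression is still the result. Both the mapping and the block-wrapping technique are assumptions for illustration, not WebTap's confirmed implementation; `build_evaluate` is a hypothetical name.

```python
import json


def build_evaluate(code: str, msg_id: int, await_promise: bool = False, persist: bool = False) -> str:
    """Hypothetical mapping of js(...) onto a CDP Runtime.evaluate command."""
    # Assumed fresh-scope trick: a block keeps let/const/class declarations out of
    # the page's global scope. `var` declarations still escape a block.
    expression = code if persist else "{\n" + code + "\n}"
    return json.dumps({
        "id": msg_id,
        "method": "Runtime.evaluate",
        "params": {
            "expression": expression,
            "awaitPromise": await_promise,  # wait for a returned promise to settle
            "returnByValue": True,          # serialize the result instead of returning an object handle
        },
    })


if __name__ == "__main__":
    print(build_evaluate("document.title", 1))
    print(build_evaluate("fetch('/api').then(r => r.json())", 2, await_promise=True))
```

Without some form of scoping, running `const x = 1` twice in the same page fails with a redeclaration error, which is the problem the fresh-scope default avoids.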

### Request Interception

```python
fetch("status")                     # Check interception status
fetch("enable")                     # Enable request interception
fetch("enable", {"response": True}) # Intercept responses too
fetch("disable")                    # Disable interception

requests()                          # Show paused requests
resume(123)                         # Continue request
resume(123, modifications={"url": "https://..."})  # Modify and continue
resume(123, modifications={"method": "POST"})      # Change method
fail(123)                           # Block request
fail(123, reason="AccessDenied")    # Block with specific reason
```
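Request interception in CDP lives in the `Fetch` domain: `Fetch.enable` takes URL patterns with a `requestStage` of `"Request"` or `"Response"`, paused requests arrive as `Fetch.requestPaused` events, and they are released with `Fetch.continueRequest` or `Fetch.failRequest`. The sketch below builds those parameter objects; how WebTap maps its own row IDs onto CDP `requestId` strings, and its default failure reason, are assumptions, and the helper names are hypothetical.

```python
import base64

# Values of CDP's Network.ErrorReason, accepted by Fetch.failRequest
ERROR_REASONS = {
    "Failed", "Aborted", "TimedOut", "AccessDenied", "ConnectionClosed",
    "ConnectionReset", "ConnectionRefused", "ConnectionAborted", "ConnectionFailed",
    "NameNotResolved", "InternetDisconnected", "AddressUnreachable",
    "BlockedByClient", "BlockedByResponse",
}


def enable_params(response: bool = False) -> dict:
    """Params for Fetch.enable: pause all requests, and responses too if asked."""
    patterns = [{"urlPattern": "*", "requestStage": "Request"}]
    if response:
        patterns.append({"urlPattern": "*", "requestStage": "Response"})
    return {"patterns": patterns}


def continue_params(request_id: str, url=None, method=None, post_data=None) -> dict:
    """Params for Fetch.continueRequest; only supplied fields override the original."""
    params = {"requestId": request_id}
    if url is not None:
        params["url"] = url
    if method is not None:
        params["method"] = method
    if post_data is not None:
        # CDP expects the overriding body base64-encoded
        params["postData"] = base64.b64encode(post_data.encode()).decode()
    return params


def fail_params(request_id: str, reason: str = "Failed") -> dict:
    """Params for Fetch.failRequest; the "Failed" default is an assumption."""
    if reason not in ERROR_REASONS:
        raise ValueError(f"unknown Network.ErrorReason: {reason}")
    return {"requestId": request_id, "errorReason": reason}


if __name__ == "__main__":
    print(enable_params(response=True))
    print(continue_params("interception-1", url="https://api.example.com/v2"))
    print(fail_params("interception-1", "AccessDenied"))
```

Paused requests hold the page until they are continued or failed, which is why disabling interception when you are done matters.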

### Filter Management

```python
filters()                                           # Show all filter groups
filters(add="myfilter", hide={"urls": ["*ads*"]})  # Create filter
filters(add="apionly", hide={"types": ["Image", "Font", "Stylesheet"]})
filters(enable="myfilter")                          # Enable group
filters(disable="myfilter")                         # Disable group
filters(remove="myfilter")                          # Delete group

# Built-in groups: ads, tracking, analytics, telemetry, cdn, fonts, images
```
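Filter groups act as hide rules: a request disappears from `network()` if any enabled group matches its URL or resource type, and `all=True` skips them entirely. The sketch below models that behavior with a hypothetical `groups` dict using the same `hide={"urls": ..., "types": ...}` shape as `filters(add=...)`; the patterns shown are illustrative, not WebTap's built-in lists.

```python
from fnmatch import fnmatchcase

# Hypothetical group store mirroring filters(add=..., hide=...)
groups = {
    "myfilter": {"enabled": True, "hide": {"urls": ["*ads*"]}},
    "apionly": {"enabled": False, "hide": {"types": ["Image", "Font", "Stylesheet"]}},
}


def is_hidden(row: dict, groups: dict) -> bool:
    """A row is hidden if any *enabled* group matches its URL or type."""
    for group in groups.values():
        if not group["enabled"]:
            continue
        hide = group["hide"]
        if any(fnmatchcase(row["url"], p) for p in hide.get("urls", [])):
            return True
        if row["type"] in hide.get("types", []):
            return True
    return False


def visible(rows: list, groups: dict, show_all: bool = False) -> list:
    """show_all mirrors network(all=True): bypass every group."""
    return list(rows) if show_all else [r for r in rows if not is_hidden(r, groups)]


sample_rows = [
    {"url": "https://ads.example.net/pixel", "type": "Image"},
    {"url": "https://example.com/logo.png", "type": "Image"},
    {"url": "https://example.com/api/users", "type": "Fetch"},
]

if __name__ == "__main__":
    print(len(visible(sample_rows, groups)))   # ad hidden, image kept
    groups["apionly"]["enabled"] = True        # like filters(enable="apionly")
    print(len(visible(sample_rows, groups)))   # image hidden too
```

Broad globs hide more than intended: `"*ads*"` would also hide `/uploads/` and `/downloads/`, so a surprising gap in `network()` output is worth checking with `all=True`.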

### Browser Element Selection

```python
# Use Chrome extension to select elements, then:
selections()                                    # View all selections
selections(expr="data['selections']['1']")     # Get element #1 data
selections(expr="data['selections']['1']['selector']")  # CSS selector
selections(expr="data['selections']['1']['outerHTML']") # Element HTML

# Use with JavaScript
js("element.offsetWidth", selection=1)          # Run JS on element #1
```

### Clearing Data

```python
clear()                             # Clear events (default)
clear(console=True)                 # Clear browser console
clear(events=True, console=True)    # Clear everything
```

## Common Workflows

### Analyze API Responses

```python
pages()
connect(0)
network(url="*api*")                 # Find API calls
# Note the ID column
request(3264, ["response.content"])  # Get response body
request(3264, ["response.content"], expr="json.loads(data['response']['content']['text'])")
to_model(3264, "models/response.py", "ApiResponse")  # Generate model
```
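The `json.loads(...['text'])` step above assumes the body is plain text. In the HAR 1.2 format, a `content` object may instead carry `"encoding": "base64"`, which is typical for binary or compressed bodies. The sketch below is a defensive decoder for that case; whether WebTap ever returns base64-encoded bodies is an assumption, and `body_text`/`body_json` are hypothetical helpers.

```python
import base64
import json


def body_text(content: dict) -> str:
    """Return a HAR content object's body as text, decoding base64 when flagged."""
    text = content.get("text", "")
    if content.get("encoding") == "base64":
        return base64.b64decode(text).decode("utf-8", errors="replace")
    return text


def body_json(content: dict):
    return json.loads(body_text(content))


if __name__ == "__main__":
    encoded = {"mimeType": "application/json", "encoding": "base64",
               "text": base64.b64encode(b'{"ok": true}').decode()}
    print(body_json(encoded))
```

`errors="replace"` keeps a partly binary body printable instead of raising, at the cost of replacing undecodable bytes with U+FFFD.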

### Debug Failed Requests

```python
network(status=404)                  # Find 404s
network(status=500)                  # Find 500s
request(123, ["*"])                  # Full details
request(123, ["response.content"])   # Error response body
```

### Intercept and Modify Traffic

```python
fetch("enable")
# Trigger a request in the browser; it pauses
requests()                           # See paused requests
request(47, ["request.*"])           # Examine request
resume(47, modifications={"url": "https://api.example.com/v2"})  # Modify URL
# Or block it
fail(47)
fetch("disable")
```

### Parse HTML Responses

```python
network(type="document")
request(123, ["response.content"], expr="""
BeautifulSoup(data['response']['content']['text'], 'html.parser').find_all('a', href=True)
""")
```
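For comparison, the standard library's `html.parser` can collect the same links without BeautifulSoup. Note that `find_all('a', href=True)` returns tag objects, so this stdlib sketch corresponds more closely to `[a['href'] for a in soup.find_all('a', href=True)]`; `LinkCollector` is a hypothetical name, and edge cases (such as a valueless `href`) may differ from bs4.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect href values from <a> tags (tag and attribute names arrive lowercased)."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: list = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href is not None:  # skip anchors without a usable href
                self.hrefs.append(href)


def extract_links(html: str) -> list:
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.hrefs


if __name__ == "__main__":
    print(extract_links('<a href="/a">A</a><a name="top">no link</a>'))
```
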

### Extract Data with Expressions

```python
# Libraries available: json, re, bs4/BeautifulSoup, lxml, jwt, yaml, httpx, etc.

# Parse JSON
request(123, ["response.content"], expr="json.loads(data['response']['content']['text'])['users']")

# Decode JWT
request(123, ["response.content"], expr="jwt.decode(data['response']['content']['text'], options={'verify_signature': False})")

# Parse form data
request(123, ["request.postData"], expr="dict(urllib.parse.parse_qsl(data['request']['postData']['text']))")

# Extract with regex
request(123, ["response.content"], expr="re.findall(r'api_key=([^&]+)', data['response']['content']['text'])")
```
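Each `expr` above is a Python expression evaluated with the selected fields bound to `data` and some modules already imported. The sketch below imitates that with `eval` and a small namespace, then runs two of the expressions from this section against sample data. The namespace contents and the `evaluate_expr` name are assumptions; WebTap pre-imports more libraries than shown here.

```python
import json
import re
import urllib.parse


def evaluate_expr(expr: str, data: dict):
    """Hypothetical: evaluate expr with `data` and a few pre-imported modules in scope."""
    namespace = {"json": json, "re": re, "urllib": urllib, "data": data}
    return eval(expr, namespace)  # runs arbitrary Python, so only evaluate your own input


data = {
    "request": {"postData": {"text": "user=ada&role=admin"}},
    "response": {"content": {"text": "redirect?api_key=abc123&x=1"}},
}

if __name__ == "__main__":
    print(evaluate_expr("dict(urllib.parse.parse_qsl(data['request']['postData']['text']))", data))
    print(evaluate_expr("re.findall(r'api_key=([^&]+)', data['response']['content']['text'])", data))
```

Because `expr` must be a single expression, multi-step logic needs comprehensions or chained calls rather than statements.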

## Daemon Management

```bash
webtap                    # Start REPL (auto-starts daemon)
webtap --mcp              # Start as MCP server
webtap --daemon           # Start daemon in foreground
webtap --daemon status    # Show daemon status
webtap --daemon stop      # Stop daemon
```
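When a client cannot reach the daemon, a quick first check is whether anything is listening on the port the guide gives (localhost:8765). The sketch below only tests that a TCP connection succeeds; it does not confirm the listener is WebTap, so `webtap --daemon status` remains the authoritative check. `port_open` is a hypothetical helper.

```python
import socket


def port_open(host: str = "localhost", port: int = 8765, timeout: float = 0.5) -> bool:
    """True if something accepts TCP connections on host:port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or host not resolvable
        return False


if __name__ == "__main__":
    print("daemon port open:", port_open())
    print("chrome debug port open:", port_open(port=9222))
```
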

## Tips

1. **Chrome must run with a remote debugging port**:
   ```bash
   google-chrome --remote-debugging-port=9222
   ```

2. **Row IDs from network()** - Use these with `request()`, `to_model()`, `quicktype()`

3. **Field selection patterns** - `["*"]` for all, `["request.*"]` for request only, `["response.content"]` for body

4. **Filters reduce noise** - Default filters remove ads, tracking, analytics

5. **Expression evaluation** - Pre-imported libraries: json, re, bs4, lxml, jwt, yaml, httpx, urllib, datetime, etc.

6. **Fresh JS scope** - The default fresh scope prevents redeclaration errors; use `persist=True` to keep variables across multi-step operations

7. **MCP integration** - All commands work as MCP tools for Claude/LLMs
