Transforms raw browser data into a compact summary before the LLM ever sees it. The element cache means the same element is never read twice. Differential updates send only what changed since the previous snapshot.
Every page passes through this layer before reaching the LLM. It strips hidden elements (invisible text that could hijack your agent), detects prompt injection patterns, and scopes context to only the relevant part of the page.
Injection guard · Hidden element filter · Context scoping
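A minimal sketch of the differential update idea, assuming each element is cached under a stable id. The `ElementSummary` shape and the `diff_snapshots` helper are hypothetical names for illustration, not the actual implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ElementSummary:
    """Compact description of one page element (hypothetical shape)."""
    role: str
    text: str


def diff_snapshots(cache, current):
    """Compare the cached snapshot with the current one and return only the delta."""
    # Elements that appear in the new snapshot but were never cached.
    added = {k: v for k, v in current.items() if k not in cache}
    # Elements that were cached but whose summary differs now.
    changed = {k: v for k, v in current.items() if k in cache and cache[k] != v}
    # Ids that were cached but no longer exist on the page.
    removed = sorted(k for k in cache if k not in current)
    return {"added": added, "changed": changed, "removed": removed}
```

Only the returned delta would need to be sent to the LLM. Unchanged elements stay in the cache and are not read or sent again.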
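A rough sketch of how the hidden element filter and injection guard could fit together. The `Node` type, the `visible` flag (assumed to be computed upstream from the rendered page), and the two patterns are illustrative assumptions. A real guard would need a far broader pattern set.

```python
import re
from dataclasses import dataclass


@dataclass
class Node:
    text: str
    visible: bool  # assumption: visibility is resolved before this step


# Illustrative patterns only, not an exhaustive list.
INJECTION_PATTERNS = [
    re.compile(r"ignore (all |any )?(previous|prior) instructions", re.IGNORECASE),
    re.compile(r"disregard (the )?system prompt", re.IGNORECASE),
]


def sanitize(nodes):
    """Drop hidden nodes, then split visible text into safe and flagged lists."""
    safe, flagged = [], []
    for node in nodes:
        if not node.visible:
            continue  # hidden text never reaches the LLM
        if any(p.search(node.text) for p in INJECTION_PATTERNS):
            flagged.append(node.text)
        else:
            safe.append(node.text)
    return safe, flagged
```

Flagged text is kept separate rather than silently dropped, so a later step can decide whether to warn, redact, or abort.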
↓
Layer 3
Structured Error Recovery
Instead of returning a vague error string, this layer classifies exactly what went wrong and suggests how to recover. The LLM gets a structured object: error type, confidence, and recommended next action.
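One way the structured object could look, as a sketch. The field names follow the description above (error type, confidence, recommended next action), but the `RecoveryHint` class, the rule table, and the action names are all hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class RecoveryHint:
    error_type: str
    confidence: float
    next_action: str


# Hypothetical rules: (substring of the raw error, type, confidence, action).
RULES = [
    ("timeout", "navigation_timeout", 0.9, "retry_with_longer_timeout"),
    ("not found", "element_missing", 0.8, "refresh_snapshot_and_relocate"),
    ("intercept", "element_obscured", 0.7, "dismiss_overlay_then_retry"),
]


def classify(raw_error: str) -> RecoveryHint:
    """Map a raw error string to a structured hint; the first matching rule wins."""
    lowered = raw_error.lower()
    for needle, error_type, confidence, action in RULES:
        if needle in lowered:
            return RecoveryHint(error_type, confidence, action)
    # Nothing matched: say so explicitly instead of guessing.
    return RecoveryHint("unknown", 0.0, "report_to_user")
```

Calling `asdict(classify("Element #submit not found"))` gives a plain dict that can be serialized and handed to the LLM in place of the raw string.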
Direct WebSocket connection to Chrome. Full Shadow DOM access. Surgical data extraction: pulls only what we need. Vision fallback for canvas apps (Google Sheets, Figma). Zero intermediary overhead.
Full Shadow DOM · Vision fallback · Surgical extraction · No overhead
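For a sense of what travels over that WebSocket, here is a sketch that builds a Chrome DevTools Protocol command as JSON. It assumes the connection speaks CDP, which the description implies but does not state, and the `cdp_command` helper is a hypothetical name. `DOM.getDocument` with `pierce` set to true is the standard CDP way to ask for a tree that includes shadow roots. Whether this layer uses that exact call is an assumption.

```python
import itertools
import json

# CDP matches responses to requests by id, so each command gets a fresh integer.
_ids = itertools.count(1)


def cdp_command(method, params=None):
    """Serialize one CDP command as the JSON text sent over the WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params or {}})


# depth=-1 requests the whole subtree; pierce=True traverses iframes and shadow roots.
message = cdp_command("DOM.getDocument", {"depth": -1, "pierce": True})
```

Sending `message` and reading the reply needs a WebSocket client and a running Chrome instance, both left out here.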
↓
Runs Here
Chrome / Chromium (local or cloud)
Runs across all layers
Observability Trace
Every action is recorded: what was seen, what was decided, what was executed, and what the page looked like after. The trace is machine-queryable, feeds automatic retry classification, and can be exported for LLM fine-tuning.
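A minimal sketch of a queryable trace, assuming one record per action with the four fields listed above. The `TraceEntry` and `Trace` names, the `ok` flag, and the JSON Lines export format are assumptions for illustration.

```python
import json
from dataclasses import dataclass, field, asdict


@dataclass
class TraceEntry:
    seen: str        # what was observed before acting
    decided: str     # what was decided
    executed: str    # the concrete action that ran
    page_after: str  # what the page looked like afterwards
    ok: bool = True  # assumption: each action records success or failure


@dataclass
class Trace:
    entries: list = field(default_factory=list)

    def record(self, entry: TraceEntry) -> None:
        self.entries.append(entry)

    def failures(self):
        """Query used by retry classification: every action that did not succeed."""
        return [e for e in self.entries if not e.ok]

    def to_jsonl(self) -> str:
        """Export one JSON object per line, for example as fine-tuning data."""
        return "\n".join(json.dumps(asdict(e)) for e in self.entries)
```

Keeping entries as plain dataclasses means the same records can serve both the in-process failure query and the export.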
Does this architecture look right? Let me know in the terminal and we'll move to naming + core features.