How should we architect this?

Three approaches, each with a different trade-off between build time and quality. Pick the one that best fits your timeline and risk tolerance.

A

Build on Playwright + smart compression layer

Use Playwright (a widely used browser automation library) underneath, and add a layer on top that does three things: compresses pages before they reach the LLM, caches elements so we don't re-read them, and sanitizes content against prompt injection.

Token savings come from how we describe the page to the LLM — not raw HTML, but a compact semantic summary. Playwright handles the actual browser mechanics.
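To make "compact semantic summary" concrete, here is a minimal sketch that assumes the summary keeps only interactive elements and gives each one a short numeric reference. The `PageSummarizer` class, the `summarize` function, the tag set, and the output format are all hypothetical; a real layer would more likely read the accessibility tree from the browser than re-parse HTML.

```python
from html.parser import HTMLParser

# Tags treated as interactive; this set is an illustrative assumption.
INTERACTIVE = {"a", "button", "input", "select", "textarea"}
VOID = {"input"}  # interactive elements that have no closing tag


class PageSummarizer(HTMLParser):
    """Collects interactive elements and ignores all other markup."""

    def __init__(self):
        super().__init__()
        self.lines = []    # finished summary lines
        self._open = None  # (tag, attrs) of the element currently being read
        self._text = []    # text fragments collected inside that element

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE:
            return
        attrs = dict(attrs)
        if tag in VOID:
            self._emit(tag, attrs, "")
        else:
            self._open, self._text = (tag, attrs), []

    def handle_data(self, data):
        if self._open is not None:
            self._text.append(data.strip())

    def handle_endtag(self, tag):
        if self._open is not None and tag == self._open[0]:
            text = " ".join(t for t in self._text if t)
            self._emit(self._open[0], self._open[1], text)
            self._open = None

    def _emit(self, tag, attrs, text):
        # Prefer visible text, then fall back to common labelling attributes.
        label = (text or attrs.get("aria-label")
                 or attrs.get("placeholder") or attrs.get("name") or "")
        ref = len(self.lines) + 1
        self.lines.append(f'[{ref}] {tag} "{label}"')


def summarize(html: str) -> str:
    """Return one short line per interactive element in the page."""
    parser = PageSummarizer()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)
```

The LLM would then refer to elements by number (for example, "click 3"), and the layer maps that number back to a real element. Wrapper markup, class names, and styling never reach the prompt, which is where the savings would come from under this assumption.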

Pros
Fastest to build (4–6 weeks)
Proven browser control
Lowest risk
Cons
Partially blind to shadow DOM
Can't go fully CDP-native later without a rewrite
B

CDP-native core (maximum control)

Talk directly to Chrome over the Chrome DevTools Protocol (CDP), the lowest level available, with no intermediary libraries. This gives full access to shadow DOM, precise element targeting, and maximum token efficiency because we control exactly what data flows to the LLM.

This is the approach Stagehand v3 took (and got 44% faster as a result). More engineering work upfront, but the cleanest architecture long-term.
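For a sense of what "talking CDP directly" looks like, here is a minimal sketch of building one command. CDP commands are JSON messages with an `id`, a `method`, and `params`; `DOM.getDocument` accepts `depth` and `pierce`, and `pierce` is what exposes shadow roots. The `cdp_command` helper is a hypothetical name, and actually sending the message requires a WebSocket connection to Chrome's debugging endpoint, which is not shown.

```python
import itertools
import json

_ids = itertools.count(1)  # each command needs its own integer id


def cdp_command(method: str, **params) -> str:
    """Serialize one CDP command as the JSON text sent over the WebSocket."""
    return json.dumps({"id": next(_ids), "method": method, "params": params})


# pierce=True asks Chrome to traverse shadow roots and iframes;
# depth=-1 requests the entire subtree rather than only the first level.
get_dom = cdp_command("DOM.getDocument", depth=-1, pierce=True)
```

Everything Playwright normally hides (connection handling, matching responses to ids, waiting, retries) would have to be built around calls like this one, which is where the extra engineering time in this option goes.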

Pros
Maximum token efficiency
Full shadow DOM access
No upstream dependencies
Cons
2–3× more engineering effort
Re-implements what Playwright gives for free
8–12 week build
C

Playwright now, CDP-native later (recommended)

Start with Playwright as the browser engine — ship fast, validate the compression layer, element caching, error recovery, and sanitizer work. Design the internal API so swapping to CDP later is a clean engine replacement, not a rewrite.

The token savings come primarily from the representation layer (how pages are described to the LLM), not the browser layer. So we can get 80% of the token efficiency from day one while keeping build risk low.
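The "clean engine replacement" idea can be sketched as a narrow interface that the rest of the system depends on. This is only an illustration under assumed names: `BrowserEngine`, `FakeEngine`, `Agent`, and the three methods are hypothetical, and the real seam would need more operations (typing, waiting, screenshots). The in-memory `FakeEngine` stands in for a Playwright-backed or CDP-backed class so the sketch runs without a browser.

```python
from typing import Any, Dict, List, Protocol


class BrowserEngine(Protocol):
    """Hypothetical seam: code above this interface never imports Playwright or CDP."""

    def goto(self, url: str) -> None: ...
    def snapshot(self) -> List[Dict[str, Any]]: ...
    def click(self, ref: int) -> None: ...


class FakeEngine:
    """In-memory stand-in for a real engine implementation."""

    def __init__(self, pages: Dict[str, List[Dict[str, Any]]]):
        self.pages = pages
        self.url = None
        self.clicks: List[int] = []

    def goto(self, url: str) -> None:
        self.url = url

    def snapshot(self) -> List[Dict[str, Any]]:
        return self.pages[self.url]

    def click(self, ref: int) -> None:
        self.clicks.append(ref)


class Agent:
    """Representation layer: depends only on the BrowserEngine interface."""

    def __init__(self, engine: BrowserEngine):
        self.engine = engine

    def describe(self, url: str) -> str:
        self.engine.goto(url)
        return "\n".join(
            f'[{i}] {el["role"]} "{el["name"]}"'
            for i, el in enumerate(self.engine.snapshot(), start=1)
        )
```

If the compression layer, cache, and sanitizer only ever see `BrowserEngine`, moving to CDP in v2 means writing one new class that satisfies the same interface. The discipline listed under the cons below amounts to never letting engine-specific objects leak past this boundary.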

Pros
Ship in 4–6 weeks
80% token efficiency from day 1
Upgrade path to CDP in v2
Cons
Shadow DOM support limited in v1
Requires discipline to keep layers clean
My recommendation: C. The biggest token waste is in how the page is described to the LLM — not in the browser layer. We can fix that immediately with Playwright. Shadow DOM support can be upgraded in v2 once we have users and momentum.