01 / Foundation

browser-recon helps you figure out how to scrape a website.

Tell it what you want to extract, browse the site by hand for a couple of minutes, and it produces a complete plan — the right Python library to use, which headers to send, which cookies you'll need, how fast you can hit the site, and a working starter script.

In plain English

Scraping a website looks easy in tutorials but breaks in production because real sites have anti-bot defences (Cloudflare, Akamai, PerimeterX) and dozens of hidden requirements (specific headers, cookies, throttling) the developer can't see. browser-recon watches you use the site in a real browser, then tells you exactly what your scraper needs to do — without you having to guess.

What it does, step by step

  1. You run a command: recon scan https://example.com
  2. Chrome launches on your machine and the tool starts watching every request the browser makes.
  3. You browse the site normally — click around, do a search, view a product, whatever you'd want to scrape. Takes a couple of minutes.
  4. You press Ctrl+C and the tool uploads everything it saw to its server.
  5. The server works out the rest: which anti-bot system is in front of the site, which requests are actually load-bearing, which library works (it tries them all), and what it'll cost you to run a scraper at scale.
  6. You get a report URL. Open it in your browser — you have a complete scraping plan.

What it's not

It's not a scraper itself. It produces the plan for a scraper — library choice, headers, cookies, delay between requests, plus a starter Python script. You (or your AI assistant) then write the actual scraper using that plan.

Two halves of the system

The CLI (client)

CHROME · CDP · UPLOAD · POLL

Installed via pipx install browser-recon. Launches Chrome, monitors network traffic, prompts for user intent, uploads the capture to the server, polls for progress, opens the final report. No proprietary logic.

The server (browser_recon_server)

FASTAPI · POSTGRES · S3 · LLMS

Receives uploaded captures, runs the full processing pipeline (detection → analysis → intent filter → validation → scrub → synthesis → render), persists everything to Postgres + S3, serves the rendered HTML report and the admin dashboard.

Why it works better than guessing

Most scrapers fail in production because the developer guessed wrong about three things: which anti-bot system is in front of the site, which headers the request actually needs, and whether the scraper's IP address looks "real enough" to the target. browser-recon answers all three by measuring, not guessing. It actually fires test requests through real proxies and watches which combinations succeed — then writes the recommendation based on what worked.

Glossary

Terms used throughout this guide. Read this once and you'll understand every page.

Term | In plain English
Scan | One full run of the tool, from typing recon scan <url> to opening the final report.
Capture | Everything the browser saw during your session: every request it made, every response it got, every cookie, every header. Like a recording.
Endpoint | A specific URL the site uses, e.g. /api/products/123. A typical site has 30–80 unique endpoints during a normal browse session.
Bucket A / B / C | The tool sorts endpoints into three groups. A = the data you actually want (product details, search results). B = prerequisite calls (session bootstrap, anti-bot challenge). C = noise (analytics, ad tracking, telemetry).
Anti-bot | Software running in front of a website (Cloudflare, Akamai, PerimeterX) that tries to detect and block scrapers. Different vendors use different tricks; the tool identifies which one is in play.
Proxy | A relay server that forwards your requests through a different IP address. Two flavours: datacenter (cheap, $2/GB, IPs are obviously cloud-hosted) and residential (expensive, $3/GB, IPs look like a real home internet user).
Validation | The tool actively tests the proposed scraping approach by firing real requests through real proxies. Confirms what actually works before recommending it.
Synthesis | The final AI call that writes up the recommendation, the verdict, and the starter Python script. Uses Claude Sonnet.
LLM | "Large Language Model": the AI that does the judgement-heavy parts (sorting endpoints, writing the recommendation). The tool uses Claude Sonnet for the important calls and Grok-3-mini for the cheap ones.
CLI | "Command-Line Interface": the recon command the customer types in their terminal.
Server | The browser-recon backend running on Render. Does all the processing. Customers never see it directly except through the report URL.

Who this guide is for

This guide is intentionally accessible. If you're a developer, every component page has full technical detail (file paths, code, SQL). If you're a non-technical reader (founder, designer, support engineer, investor), every page starts with an "In plain English" callout that explains the component without jargon. Skim those callouts and you'll know what every part of the system does.

Current state vs planned

Heads up

This guide describes the system as it will be after T53 ships (the "thin-CLI" architecture, v0.3.0). A few pieces are still in flight:

  • Detection / Analysis / Validation / Scrubber are described as server-side. Pre-T53 they actually run on the CLI. The file paths shown (e.g. browser_recon_server/validation_server/) are post-T53; pre-T53 paths are browser_recon/validation/.
  • Proxy credentials are described as living in AWS Secrets Manager. Pre-T53 they live in Render env vars.
  • scan_events polling + live CLI spinner are described as working. Pre-T53.1 the CLI just sits silent for 60–120 s.
  • PyPI distribution as v0.3.0 hasn't shipped yet — pre-T53.8 the CLI is installed from source.

Each affected page calls out the transition where relevant. T53 spec is at documentation/T53-thin-cli-server-side-pipeline.html.

02 / Foundation

Components at a glance.

Eleven moving parts make up the system. Most run on the server; only the browser-capture piece runs on the user's machine.

In plain English

Think of a scan as an assembly line. The CLI captures the browser session and hands it off; then on the server, the work moves through 8 stations — detect what's protecting the site, organise the requests, sort them by usefulness, actually test what works, remove secrets, write the recommendation, write the auxiliary notes, render the final report. Each station has its own page below.

Pipeline components

Listed in the order they fire during a scan.

Browser capture

CLIENT · CHROME + CDP · ~30–120 s

Launches Chrome, monitors network traffic via CDP. Captures every request/response/cookie/header during the user's browse session. The only piece that must run locally.

Detection

SERVER · DETERMINISTIC · ~50 ms

Fingerprints the anti-bot stack from the capture. Matches cookies (_abck → Akamai, __cf_bm → Cloudflare) and headers (server: cloudflare) against a rules table. No LLM.

Analysis

SERVER · DETERMINISTIC · ~200 ms

Templates URLs (/api/users/123 → /api/users/<id>), extracts response shapes, infers dependency chains and pagination patterns. No LLM.

Intent filter

SERVER · CLAUDE SONNET · ~21 s

Classifies endpoints into Bucket A (the data), Bucket B (prerequisites), Bucket C (noise) based on user intent text. First LLM call in the pipeline.

Validation

SERVER · HTTP + PROXIES · ~25 s

Actively fires test requests against the target site through operator proxies. Library × proxy cascade, header reduction, cookie scenarios, rate-limit probe. The T51 work.

Scrubber

SERVER · DETERMINISTIC · ~50 ms

Strips secrets (cookie values, Authorization tokens, JWT-shaped strings) from the in-memory capture before persistence + downstream LLM calls.

Synthesis

SERVER · CLAUDE SONNET · ~30 s

One combined call producing the recommendation, the verdict, and the starter Python script. The most expensive LLM call (~$0.10–$0.15 per scan).

Auxiliary prompts

SERVER · GROK-3-MINI · ~12 s

Two cheap parallel calls — notes (5 short pointers) + difficulty_drivers (ranked obstacles). Runs simultaneously with synthesis.

Report renderer

SERVER · JINJA · ~30 ms

Reconstructs typed objects from the persisted JSONB and renders the final HTML report through Jinja templates. No LLM, no DB writes besides finalization.

Scan events

BOTH · NEW IN T53

Append-only event log of pipeline step transitions, polled by the CLI every 1.5s for the live progress spinner. Replaces silent waiting.

Eval system

SERVER · ADMIN ONLY · T48

Re-runs scan synthesis through alternate LLM providers (OpenAI, xAI) for quality comparison. Persisted to llm_evals with a matrix UI.

Cost breakdown per scan

Step | Provider | Avg cost | Notes
intent_filter | Claude Sonnet 4.6 | $0.030 | Judgement-heavy; cheap models over-prune Bucket B.
scan_synthesis | Claude Sonnet 4.6 | $0.100 | Combined recommendation + verdict + starter_code in one call.
notes | Grok-3-mini | $0.004 | Parallel.
difficulty_drivers | Grok-3-mini | $0.004 | Parallel.
intent_clarifier | Grok-3-mini | $0.0003 | Only runs if intent is ambiguous.
flow_confirm | Grok-3-mini | $0.003 | Summarises the captured action sequence.
Total | | ~$0.14 | Down from $0.38 pre-T51.8.

03 / Foundation

Architectural map.

Three actors: the customer's computer, the browser-recon server (hosted on Render), and AWS for storage and secrets. The server does most of the work.

In plain English

The customer's CLI is a lightweight thing that knows how to drive Chrome and talk to a server. The server is where all the smart stuff lives — the AI models, the validation logic, the report generator. AWS sits next to the server holding two things: raw captures (in S3) and proxy credentials (in Secrets Manager).

[Figure: architecture diagram. The client machine (Chrome via CDP for capture; recon CLI for capture, upload and polling; rich spinner for live progress; ~2 MB install, no secrets locally, no proprietary code) sends POST /capture and GET /events over HTTPS (TLS 1.3) to the server on Render. The server runs a FastAPI background task through detection, analysis, intent_filter (LLM), validation (HTTP + proxies), scrub, synthesis + notes + drivers (LLM), and render (Jinja), on shared infra of FastAPI + Uvicorn, SQLAlchemy + Alembic, Jinja templates and boto3. It talks to AWS (S3 + KMS for raw captures and LLM dumps, Secrets Manager for proxy creds), external APIs (api.anthropic.com, api.openai.com, api.x.ai for Grok, the proxy vendor), and the database (Postgres, Render managed).]
FIG. 1 — Three actors: customer machine, Render server, AWS. The CLI knows only the wire protocol.

Boundaries that matter

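  • Client ↔ server: the pipeline itself uses two endpoints, POST /scans/<id>/capture and GET /scans/<id>/events. Before upload, the CLI also calls POST /scans for intent processing and the flow_confirm step. All traffic is HTTPS bearer-authed.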
  • Server ↔ Postgres: single managed instance on Render. All persistent state lives here: users, api_keys, scans, reports, llm_calls, llm_evals, validation_runs, scan_events, audit_log.
  • Server ↔ S3: raw captures (KMS-encrypted) and prompt/response dumps for every LLM call (for replay + audit).
  • Server ↔ Secrets Manager: proxy URLs only. Never logged, never persisted to Postgres.
  • Server ↔ external LLMs: Anthropic, OpenAI, xAI. Each has its own API key in env (server-side only).
  • Server ↔ proxy vendor: all validation HTTP traffic exits through the configured proxy URLs.

Deployment model

Single Render Web Service running uvicorn browser_recon_server.main:app. Auto-scales based on request rate. Postgres is Render-managed (encrypted disk by default). S3 + Secrets Manager are AWS-side. No worker queue today — long-running pipeline tasks use FastAPI's BackgroundTasks within the same process. If we need to fan out under load, the natural next step is a Celery/RQ worker reading from a Redis queue.

04 / Foundation

Lifecycle of a scan.

Fourteen ordered steps, from recon scan in the terminal to a report URL in the browser. Steps 1–5 happen on the user's machine; 6–14 happen on the server.

In plain English

Read this page top to bottom and you'll know exactly what happens during a scan — when the browser opens, when the AI runs, when the report appears, and what each stage produces. Each step is one or two sentences.

  1. recon scan <url>: user invokes the CLI. CLI prompts for a template (products / stocks / reviews / etc.) and asks the user to describe what data they want to scrape.
  2. Intent processing: CLI POSTs the typed intent text to POST /scans. Server runs intent_clarifier (Grok-3-mini) to decide if clarifying questions are needed; returns the refined intent.
  3. Browser launch: CLI launches Chrome with CDP enabled on port 9222. Connects to the WebSocket, enables Network, Page, Runtime domains.
  4. Browsing + capture: user browses the target site. CLI records every request, response, cookie, header, response body. Detected anti-bot signals (Akamai, PerimeterX) emit live ! ANTI-BOT: lines.
  5. Ctrl+C → flow confirm: capture ends. CLI POSTs the action sequence to flow_confirm (Grok-3-mini) which summarises it. User confirms or refines.
  6. Upload: CLI POSTs the full capture to POST /scans/<id>/capture. Server stores the raw blob in S3, returns 202 + status_url. Cookies and Authorization headers are not yet scrubbed, because validation needs them.
  7. Detection (server): anti-bot fingerprinting. Cookie patterns, response headers, URL patterns matched against the rules table. Emits structured findings.
  8. Analysis (server): URL templating, response shape extraction, dependency-chain inference, pagination detection, auth signal extraction.
  9. Intent filter (server, LLM): Claude Sonnet classifies each captured endpoint into Bucket A / B / C based on the user's intent.
  10. Validation (server): for each Bucket A endpoint, the T51 cascade fires real HTTP requests through proxies: library compare → header reduction → cookie scenarios → rate-limit probe. Bandwidth tracked.
  11. Scrub (server): secrets removed from the in-memory capture. The S3 blob is rewritten to its scrubbed form; downstream LLM calls only see scrubbed data.
  12. Synthesis (server, LLM): Claude Sonnet produces the recommendation, the verdict, and the starter Python script in one combined call.
  13. Notes + difficulty drivers (server, LLM, parallel with synthesis): Grok-3-mini produces 5 short pointers + a ranked obstacle list.
  14. Render + finalize (server): Jinja templates render the full HTML report. Scan row marked complete. The CLI's polling loop sees render: complete, prints the report URL, exits.
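
The polling loop that closes out the final step can be sketched as below. This is a minimal illustration, not the CLI's actual code: `poll_events` and `fetch_events` are hypothetical names, the event dicts reuse the step / status / message columns of scan_events, and treating an event status of "errored" as terminal is an assumption. The network call is injected as a function so the sketch runs without a server.

```python
import time


def poll_events(fetch_events, interval_s=1.5, sleep=time.sleep, max_polls=200):
    """Collect pipeline events until the scan reaches a terminal event.

    fetch_events(after) is a hypothetical callable that returns the events
    the server has recorded beyond the first `after` ones (in the real CLI
    this would wrap GET /scans/<id>/events).
    """
    seen = 0
    timeline = []
    for _ in range(max_polls):
        for event in fetch_events(seen):
            timeline.append(event)
            seen += 1
            finished = (event["step"], event["status"]) == ("render", "complete")
            failed = event["status"] == "errored"  # assumed failure marker
            if finished or failed:
                return timeline
        # Nothing terminal yet: wait before asking again.
        sleep(interval_s)
    raise TimeoutError("scan did not reach a terminal event in time")
```

Injecting `sleep` keeps the loop testable: a test can pass a no-op and a scripted `fetch_events` instead of waiting 1.5 s per poll.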

Status transitions

The scans.status column tracks the scan's lifecycle:

Status | When | Next
draft | Right after POST /scans (intent captured, browser not yet launched) | capturing
capturing | Chrome running, CDP listening | processing
processing | Capture uploaded; pipeline running server-side | complete or errored
complete | Report rendered, scan finalized | (terminal)
errored | Pipeline failed; error_message populated | (terminal)
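
The transitions in this table amount to a small state machine. A minimal sketch, assuming the server enforces them in application code (the source does not say where or whether they are enforced; `ALLOWED_TRANSITIONS` and `advance` are hypothetical names):

```python
# Allowed next states for scans.status, copied from the table above.
ALLOWED_TRANSITIONS = {
    "draft": {"capturing"},
    "capturing": {"processing"},
    "processing": {"complete", "errored"},
    "complete": set(),  # terminal
    "errored": set(),   # terminal
}


def advance(current, new):
    """Return the new status, or raise if the table does not allow the move."""
    if new not in ALLOWED_TRANSITIONS[current]:
        raise ValueError(f"illegal status transition: {current} -> {new}")
    return new


status = advance("draft", "capturing")
status = advance(status, "processing")
status = advance(status, "complete")
```
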
05 / Foundation

Database schema.

Nine tables in Postgres hold every piece of persistent state: users, API keys, scans, reports, AI calls, validation results, pipeline events, audit logs.

In plain English

Every scan, every AI call, every test request leaves a row in the database so we can debug later. Most data is stored as JSON inside a column rather than a normalised set of columns — that lets us evolve the pipeline without constantly migrating the schema. This page is the most technical in the guide; non-tech readers can safely skim the table summary at the top and skip the SQL.

Core tables

Table | Purpose | Append-only? | Key columns
users | Customer accounts + role | No (UPDATE on login) | id, email, role, created_at
api_keys | rec_live_… bearer tokens | Yes | id, user_id, key_hash, last_used_at, revoked_at
scans | One row per scan. Stores capture-derived JSONB + final pipeline outputs. | No | id, user_id, target_url, status, intent_text, bucket_assignment, synthesis, completed_at
reports | Renderable report row, joined to scans. Carries pre-rendered fields the dashboard query needs. | No | id, scan_id, slug, verdict, recommendation, created_at
llm_calls | One row per LLM API call (Anthropic / OpenAI / xAI). For audit + cost rollup + debug. | Yes | id, scan_id, prompt_name, model, provider, cost_usd, input_tokens, output_tokens, s3_prompt_path, s3_response_path
llm_evals | T48: eval runs (re-running synthesis with alternate providers). Append-only history of every re-run. | Yes | id, scan_id, prompt_name, model, provider, status, response_body, cost_usd
validation_runs | T51.7: one row per (scan_id, endpoint_id) per validation run. Carries validation_data + slim llm_view + report_view. | Yes | id, scan_id, endpoint_id, status, validation_data, llm_view, report_view
scan_events | T53.1: pipeline step events for the CLI polling loop. | Yes | id, scan_id, step, status, message, created_at
audit_log | Admin actions, role changes, soft-deletes | Yes | id, actor_id, action, target_type, target_id, created_at

JSONB columns on scans

Most of the scan's processed state lives in JSONB columns on the scans row, rather than in normalised tables. This keeps the schema flexible as the pipeline evolves.

Column | Set by | Shape
capture_metadata | Step 6 (upload) | {captured_at, duration_ms, request_count, ...}
findings | Step 7 (detection) | [{kind, vendor, severity_tier, evidence: [...]}]
analysis_output | Step 8 (analysis) | {endpoints: [...], framework_hints, flow_timeline, cors_summary, ...}
bucket_assignment | Step 9 (intent filter) | {bucket_a: [ep_ids], bucket_b: [...], bucket_c: [...], rationale}
validation | Step 10 (validation) | {per_endpoint: [...], per_endpoint_llm_view, per_endpoint_report_view, scan_validation_summary}
synthesis | Step 12 (synthesis) | {recommendation: {...}, verdict: {...}, starter_code: {...}}
notes | Step 13a | {notes: [str, ...]}
difficulty_drivers | Step 13b | {drivers: [{label, severity, explanation, evidence_refs}]}

Querying conventions

-- Find recent failed scans:
SELECT id, target_url, status, error_message, created_at
FROM scans
WHERE status = 'errored'
ORDER BY created_at DESC
LIMIT 20;

-- Find the latest validation_runs row per endpoint (T51.7 append-only):
SELECT DISTINCT ON (scan_id, endpoint_id) *
FROM validation_runs
WHERE scan_id = $1 AND status = 'complete'
ORDER BY scan_id, endpoint_id, created_at DESC;

-- Find a scan's event timeline (T53.1):
SELECT step, status, message, created_at
FROM scan_events
WHERE scan_id = $1
ORDER BY created_at ASC;

-- LLM cost rollup per scan:
SELECT scan_id, SUM(cost_usd) AS total_usd
FROM llm_calls
WHERE created_at > now() - interval '7 days'
GROUP BY scan_id
ORDER BY total_usd DESC;

Migrations

Run via Alembic. Every schema change ships a migration in alembic/versions/. To apply on a new environment: rye run alembic upgrade head. To downgrade one: rye run alembic downgrade -1.

Recent migrations:

  • f7a3c2d4e5b6_t51_7_add_validation_runs.py — append-only validation_runs table
  • c1d2e3f4a5b6_add_llm_evals.py — T48 eval persistence
  • (T53.1) add_scan_events.py — pipeline event log

Browser capture.

Launches Chrome on the customer's machine and records every network call via the Chrome DevTools Protocol. The only piece that must run client-side.

CLIENT · DETERMINISTIC · ~30–120 SECONDS · USER-DRIVEN

In plain English

Opens a real Chrome window on the user's computer and records every single network request the browser makes while the user clicks around. Like a wire-tap on Chrome, but only for the duration of the scan.

What it does

Launches Chrome with --remote-debugging-port=9222 on a fresh, isolated user-data-dir (no extensions, no profile pollution). Connects to the CDP WebSocket, enables Network/Page/Runtime domains, and starts logging every Network.requestWillBeSent + Network.responseReceived + Network.dataReceived event into an in-memory blob. Captures cookies via Network.getAllCookies on capture end.
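
To make the event logging concrete, here is a minimal sketch of folding those three CDP events into an in-memory capture. The event names and the params fields used (requestId, request, response, dataLength) are part of the Chrome DevTools Protocol; the capture layout and the names `new_capture` and `handle_cdp_event` are assumptions for illustration, not the structure monitor.py actually uses. The WebSocket plumbing is left out so the handler can be exercised on plain dicts.

```python
def new_capture():
    """Empty capture: entries keyed by CDP requestId, plus first-seen order."""
    return {"requests": {}, "order": []}


def handle_cdp_event(capture, method, params):
    """Fold one CDP Network event into the capture (hypothetical layout)."""
    requests = capture["requests"]
    if method == "Network.requestWillBeSent":
        request_id = params["requestId"]
        request = params["request"]
        # A redirect re-emits this event with the same requestId; this
        # sketch simply keeps the latest hop.
        if request_id not in requests:
            capture["order"].append(request_id)
        requests[request_id] = {
            "url": request["url"],
            "method": request["method"],
            "request_headers": dict(request.get("headers", {})),
            "status": None,
            "response_headers": {},
            "bytes_received": 0,
        }
    elif method == "Network.responseReceived":
        entry = requests.get(params["requestId"])
        if entry is not None:
            response = params["response"]
            entry["status"] = response["status"]
            entry["response_headers"] = dict(response.get("headers", {}))
    elif method == "Network.dataReceived":
        entry = requests.get(params["requestId"])
        if entry is not None:
            entry["bytes_received"] += params.get("dataLength", 0)
```

Events for an unknown requestId are ignored rather than raising, since a monitor attached mid-page can see a response whose request it never observed.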

Real-time anti-bot detection

While capturing, the CLI runs a light detection pass on incoming responses and prints live alerts:

! ANTI-BOT: Imperva / Incapsula detected -- Header 'x-cdn' present
! ANTI-BOT: PerimeterX detected -- URL contains 'px-cloud.net'
! ANTI-BOT: Akamai Bot Manager detected -- URL contains '/akam/'

This is informational only — the authoritative detection step runs server-side after upload. The CLI version gives the user instant feedback so they understand what they're scraping.

Where it lives

Package
browser_recon/capture/
Entry
browser_recon/capture/chrome_launcher.py — finds Chrome, launches it
CDP loop
browser_recon/capture/monitor.py — WebSocket event handler
Anti-bot live
browser_recon/capture/anti_bot_live.py
Output
In-memory dict serialised to JSON on capture end

Inputs / outputs

  • Input: target URL, intent text, optional starter template.
  • Output: a capture blob — list of requests/responses with full headers, response bodies, cookies, timestamps, plus user-action breadcrumbs (clicks, inputs, form submits).

Database impact

The CLI does not write to the database directly. The capture blob is uploaded to the server in step 6, which stores it in S3 (KMS-encrypted) and records the S3 key on the scan's row in scans (the row already exists as a draft from POST /scans).

How to test it

# Run the CLI against a public test site:
recon scan https://books.toscrape.com

# Capture lasts as long as you browse. Ctrl+C to stop.
# Output dir gets a recon_capture.json with the full blob.

# Unit tests for the capture layer:
rye run pytest tests/unit/test_chrome_launcher.py
rye run pytest tests/unit/test_capture_monitor.py

Find output in the DB after upload

SELECT id, status, target_url, capture_metadata
FROM scans
WHERE id = '<scan_id>';

-- The raw capture is at the S3 key returned in the response:
SELECT s3_capture_path FROM scans WHERE id = '<scan_id>';

Known gotchas

  • Chrome must be installed on the customer's machine. The launcher searches the standard install paths per OS; if missing, the CLI fails fast with an install hint.
  • Cookies captured during the browse session are bound to the customer's residential IP. When validation later fires through a different proxy IP, those cookies may be invalid (PerimeterX/Akamai bind _pxhd/_abck to IP).
  • Capture cookies aren't scrubbed on the CLI side post-T53 — they travel to the server unscrubbed because validation needs them. They're scrubbed server-side after step 10.
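
For orientation, the server-side scrub referred to in the last gotcha could look roughly like the sketch below. It covers the three secret kinds this guide names (cookie values, Authorization tokens, JWT-shaped strings); the regex, the redaction marker, and the function names are assumptions, not the scrubber's real rules.

```python
import re

# JWT-shaped: three base64url segments, the first starting with "eyJ".
JWT_RE = re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie"}
REDACTED = "[REDACTED]"  # hypothetical marker


def scrub_headers(headers):
    """Blank sensitive headers outright; mask JWT-shaped strings elsewhere."""
    scrubbed = {}
    for name, value in headers.items():
        if name.lower() in SENSITIVE_HEADERS:
            scrubbed[name] = REDACTED
        else:
            scrubbed[name] = JWT_RE.sub(REDACTED, value)
    return scrubbed


def scrub_cookies(cookies):
    """Keep cookie names (detection evidence) but drop their values."""
    return [{**cookie, "value": REDACTED} for cookie in cookies]
```

Keeping cookie names while dropping values preserves signals such as _abck for the report without retaining anything replayable.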

Anti-bot detection.

Fingerprints which anti-bot vendors are protecting the target site. Pure pattern-matching, no LLM. Runs in tens of milliseconds.

SERVER · DETERMINISTIC · ~50 MS

In plain English

Figures out which anti-scraping system (Cloudflare, Akamai, PerimeterX, DataDome, etc.) is in front of the target site by looking at signature cookies and headers — like a fingerprint scanner for bot-protection vendors. No AI involved; just pattern matching.

What it does

Walks the captured requests and responses, scoring them against a rules table:

  • Cookies: _abck → Akamai Bot Manager · _pxhd/_pxvid → PerimeterX · __cf_bm/cf_clearance → Cloudflare · datadome → DataDome · incap_ses → Imperva · visid_incap → Imperva.
  • Response headers: server: cloudflare, x-cdn: incapsula, x-akamai-transformed, cf-ray, x-datadome.
  • URL patterns: paths under /akam/, hosts ending in .px-cloud.net, .datadome.co, recaptcha/api2/.
  • JavaScript fingerprints: script tags loaded from known anti-bot CDNs.

Each match contributes evidence with a confidence weight. Findings are aggregated per vendor and emitted with a severity tier (high/medium/low) reflecting how restrictive that vendor typically is in production scraping.
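
A rules table of this kind can be sketched in a few lines. The signals below come from the list above, and the 0.3 weight for _abck matches the output example further down; every other weight, the capped-sum aggregation, and the `Rule` field names are assumptions made for illustration. Header rules here match on header name only.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    vendor: str
    signal: str    # "cookie", "header" or "url_pattern"
    pattern: str
    weight: float  # hypothetical confidence weight


RULES = [
    Rule("Akamai Bot Manager", "cookie", "_abck", 0.3),
    Rule("Akamai Bot Manager", "url_pattern", "/akam/", 0.4),
    Rule("Cloudflare", "cookie", "__cf_bm", 0.3),
    Rule("Cloudflare", "header", "cf-ray", 0.3),
    Rule("PerimeterX", "cookie", "_pxhd", 0.3),
    Rule("PerimeterX", "url_pattern", "px-cloud.net", 0.4),
    Rule("DataDome", "cookie", "datadome", 0.3),
]


def detect(cookie_names, header_names, urls):
    """Match the capture against RULES and aggregate evidence per vendor."""
    headers = {name.lower() for name in header_names}
    matched = defaultdict(list)
    for rule in RULES:
        if rule.signal == "cookie":
            hit = rule.pattern in cookie_names
        elif rule.signal == "header":
            hit = rule.pattern in headers
        else:
            hit = any(rule.pattern in url for url in urls)
        if hit:
            matched[rule.vendor].append(rule)
    return {
        vendor: {
            # Assumed aggregation: sum of weights, capped at 1.0.
            "confidence": round(min(1.0, sum(r.weight for r in rules)), 2),
            "evidence": [(r.signal, r.pattern) for r in rules],
        }
        for vendor, rules in matched.items()
    }
```
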

Where it lives

Package
browser_recon_server/detection_server/ (post-T53) or browser_recon/detection/ (pre-T53)
Entry
detection.runner.run_detection(capture) -> DetectionResult
Rules
detection/rules/anti_bot.py, detection/rules/auth.py, detection/rules/cors.py
Aggregation
detection/aggregator.py — combines per-rule matches into per-vendor findings

Output shape

{
  "findings": [
    {
      "kind": "anti_bot",
      "vendor": "Akamai Bot Manager",
      "severity_tier": "high",
      "severity_label": "high",
      "confidence": 0.85,
      "evidence": [
        {"signal": "cookie", "value": "_abck", "confidence_weight": 0.3, ...},
        {"signal": "url_pattern", "value": "path: /akam/", ...}
      ],
      "per_endpoint": {"https://...": "presence detected", ...}
    }
  ]
}

Database impact

  • Writes to scans.findings (JSONB column)
  • Emits a scan_events row at start + complete (T53.1)

How to test it

rye run pytest tests/unit/test_detection_anti_bot.py
rye run pytest tests/unit/test_detection_runner.py

Find output in the DB

-- All findings for a scan, one row per finding:
SELECT jsonb_path_query(findings, '$[*]') AS finding
FROM scans
WHERE id = '<scan_id>';

-- Count scans by primary anti-bot vendor:
SELECT
  jsonb_path_query_first(findings, '$[*] ? (@.kind == "anti_bot") .vendor') AS vendor,
  count(*)
FROM scans
WHERE status = 'complete'
GROUP BY vendor;

Adding a new detection rule

  1. Open detection/rules/anti_bot.py (or the appropriate kind file).
  2. Add a Rule(...) entry with vendor name, signal type, pattern, confidence weight.
  3. Add a unit test in tests/unit/test_detection_anti_bot.py using a captured sample of the signal.
  4. Run the test suite to confirm no regression in scoring on existing scans.

Endpoint analysis.

Structures the raw capture into a typed endpoint inventory with templated URLs, response shapes, dependency edges, pagination patterns, and auth signals.

SERVER · DETERMINISTIC · ~200 MS

In plain English

Takes the raw captured requests and organises them into a tidy list. Spots that /api/users/123 and /api/users/456 are the same endpoint with different IDs. Notes which calls hand off data to other calls (like a search returning IDs that a detail call then fetches). No AI involved.

What it does

Five sub-passes, all deterministic:

  1. URL templating — collapses /api/users/12345 and /api/users/67890 into /api/users/<id> with two example values. Recognises numeric ids, UUIDs, long hex slugs.
  2. Response shape extraction — for each endpoint, summarises the JSON shape (key list, array depth, response_size in bytes).
  3. Dependency chain inference — scans request bodies for values that appeared in earlier response bodies. If endpoint B's request contains a token A's response surfaced, record edge A → B.
  4. Pagination detection — recognises ?page=N, ?cursor=…, ?offset=N&limit=M patterns across URL families.
  5. Auth signal extraction — flags endpoints that carry Authorization, require cookies, or send/receive CSRF tokens.
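
The first sub-pass, URL templating, can be illustrated with a short sketch. It recognises the three id kinds named above; the 16-character threshold for "long hex" and the function names are assumptions, and the real url_template.py may use different rules.

```python
import re

NUMERIC_RE = re.compile(r"^\d+$")
UUID_RE = re.compile(
    r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
)
LONG_HEX_RE = re.compile(r"^[0-9a-fA-F]{16,}$")  # threshold is an assumption


def template_path(path):
    """Replace id-like path segments with the <id> placeholder."""
    segments = []
    for segment in path.split("/"):
        if (NUMERIC_RE.match(segment) or UUID_RE.match(segment)
                or LONG_HEX_RE.match(segment)):
            segments.append("<id>")
        else:
            segments.append(segment)
    return "/".join(segments)


def group_by_template(paths):
    """Collapse concrete paths into templates, keeping the examples seen."""
    groups = {}
    for path in paths:
        groups.setdefault(template_path(path), []).append(path)
    return groups


groups = group_by_template(["/api/users/12345", "/api/users/67890", "/api/v2/items"])
# groups == {"/api/users/<id>": ["/api/users/12345", "/api/users/67890"],
#            "/api/v2/items": ["/api/v2/items"]}
```

Matching whole segments rather than substrings is what keeps version markers such as v2 from being mistaken for ids.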

Where it lives

Package
browser_recon_server/analysis_server/ (post-T53) or browser_recon/analysis/
Entry
analysis.runner.run_analysis(capture, detection) -> AnalysisResult
Templating
analysis/url_template.py
Bucket signals
analysis/bucket_signals.py — auth + dependency signals
Pagination
analysis/pagination.py

Output shape

{
  "endpoints": [
    {
      "id": "ep_017",
      "url_template": "/api/products/<id>",
      "method": "GET",
      "hostname": "example.com",
      "observed_count": 5,
      "response_shape": {"shape": "object{name,price,sku}", "size_bytes": 54000},
      "required_cookies": ["_abck", "session_id"],
      "required_header_values": {"Referer": "https://example.com/products"},
      ...
    }
  ],
  "framework_hints": [{"name": "Next.js", "evidence": "_next/api path"}],
  "flow_timeline": [{"step_type": "click", "target": "button#search", ...}],
  "cors_summary": [...]
}

Database impact

  • Writes to scans.analysis_output (JSONB)
  • The endpoints list is the universe Bucket A/B/C classification operates on in step 9
  • Emits start + complete events in scan_events

How to test it

rye run pytest tests/unit/test_analysis_runner.py
rye run pytest tests/unit/test_url_template.py
rye run pytest tests/unit/test_bucket_signals.py

Find output in the DB

-- Endpoint inventory for a scan:
SELECT jsonb_array_length(analysis_output->'endpoints') AS n_endpoints
FROM scans WHERE id = '<scan_id>';

-- Endpoints with required cookies:
SELECT ep
FROM scans, jsonb_array_elements(analysis_output->'endpoints') AS ep
WHERE id = '<scan_id>'
  AND jsonb_array_length(ep->'required_cookies') > 0;

Intent filter.

First LLM call. Classifies captured endpoints into Bucket A (the data), Bucket B (prerequisites), Bucket C (noise) based on the user's typed intent.

SERVER · CLAUDE SONNET 4.6 · ~21 SECONDS · ~$0.03 / SCAN

In plain English

A real website makes dozens of network calls during a browse session — most are tracking pixels, analytics, ad-tech junk. This step asks an AI to read the user's goal ("I want product data and reviews") and sort the captured calls into three piles: useful data, setup calls we need first, and noise we can ignore.

What it does

Most captures contain 30–80 endpoints. Most are noise — analytics pixels, telemetry beacons, third-party tag managers. The intent filter sends the user's intent text + the endpoint inventory to Claude Sonnet with a structured-output prompt, and gets back three bucketed lists:

  • Bucket A — the data the user actually wants (product detail, search results, reviews).
  • Bucket B — prerequisites the scraper must hit first (session bootstrap, config endpoints, anti-bot init).
  • Bucket C — pure noise (Adobe Analytics pixels, Bloomreach, Criteo retargeting).

Why Sonnet

Judgement-heavy task. A getReviews endpoint might look like noise to a cheap model but it's load-bearing when the user said "scrape reviews." Cheaper models also over-prune Bucket B (the T48 eval showed Grok-3-mini and gpt-5.4-nano dropping checkout/payment session endpoints scrapers actually need). The cost overhead for Sonnet over Grok-3-mini is ~$0.025/scan — minor compared to the cost of shipping a broken scraper.

Where it lives

Prompt
browser_recon_server/prompts/intent_filter.py
Output schema
FilterOutput Pydantic model
Caller
browser_recon_server/scan_pipeline.py:_run_intent_filter

Output shape

{
  "bucket_a": ["ep_012", "ep_019", "ep_020", ...],
  "bucket_b": ["ep_003", "ep_004", "ep_005", ...],
  "bucket_c": ["ep_001", "ep_002", ...],
  "rationale": "Bucket A contains the core data endpoints...",
  "confidence_entries": [{"endpoint_id": "ep_012", "confidence": 0.92}, ...]
}
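
Because this output comes from an LLM, a cheap sanity check before persisting it is worth considering. The sketch below verifies that the three buckets partition the endpoint inventory. Whether the real pipeline demands a strict partition is an assumption, and `check_bucket_partition` is a hypothetical helper, not part of the FilterOutput model.

```python
def check_bucket_partition(endpoint_ids, assignment):
    """Return a list of problems; an empty list means a clean partition."""
    known = set(endpoint_ids)
    seen = {}  # endpoint id -> first bucket it appeared in
    problems = []
    for bucket in ("bucket_a", "bucket_b", "bucket_c"):
        for endpoint_id in assignment.get(bucket, []):
            if endpoint_id in seen:
                problems.append(
                    f"{endpoint_id} appears in both {seen[endpoint_id]} and {bucket}"
                )
            elif endpoint_id not in known:
                problems.append(f"{endpoint_id} is not in the endpoint inventory")
            seen.setdefault(endpoint_id, bucket)
    for endpoint_id in sorted(known - set(seen)):
        problems.append(f"{endpoint_id} was not assigned to any bucket")
    return problems
```
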

Database impact

  • Writes to scans.bucket_assignment (JSONB)
  • One row in llm_calls with cost, tokens, S3 prompt+response keys
  • Events emitted in scan_events

How to test it

rye run pytest tests/unit/test_intent_filter_prompt.py

# Re-run on an existing scan (admin only):
recon llm-eval --scan-id <scan_id> --prompts intent_filter --provider-b claude-sonnet-4-6 --force

Find output in the DB

-- Bucket assignment for a scan:
SELECT bucket_assignment FROM scans WHERE id = '<scan_id>';

-- Find the LLM call row:
SELECT model, cost_usd, input_tokens, output_tokens, s3_response_path
FROM llm_calls
WHERE scan_id = '<scan_id>' AND prompt_name = 'intent_filter';

Active validation.

The T51 work. Fires real HTTP requests through operator-owned proxies to confirm which library + proxy combo actually works for each Bucket A endpoint.

SERVER · HTTP + PROXIES · ~25 SECONDS · SEE DOCUMENTATION/T51-VALIDATION-REDESIGN.HTML

In plain English

The most important step. Instead of guessing whether the scraper will work, we actually try it. For each useful endpoint, we fire test requests using different Python libraries (requests, httpx, curl_cffi, cloudscraper) through different proxy types (datacenter and residential), and see which combination gets a real response. We also figure out the minimum set of headers needed and how fast we can hit the site before it gets angry. The result is grounded in reality, not in what the AI guessed.

What it does

For each Bucket A endpoint, walks a four-step cascade:

  1. Library comparison — fans out 3 libraries (requests, httpx, curl_cffi) × 2 proxy tiers (datacenter, residential) in one parallel ThreadPoolExecutor wave. If all 6 attempts block, falls back to cloudscraper as Phase B. Picks the winner by preference order (simplest library that works).
  2. Header reduction — tier list skips Accept-Encoding, sec-fetch-*, sec-ch-ua-* without testing; remaining headers tested in parallel waves with a 2-axis (library × proxy) cascade on failures, capped at 6 attempts per header.
  3. Cookie dependency — three scenarios (cold / warmup / full) fired in parallel with the same cascade for failures.
  4. Rate-limit probe — 3 rounds × 5 requests at delays [2.0s, 0.5s, 0.2s] with the winning library/proxy. Tags result with proxy_rotation_mode so synthesis treats rotating-residential measurements as a lower bound.

For each Bucket B endpoint: just one verification request with the scan-level winner. Mini fan-out only on failure (no header/cookie/rate-limit probing on B).
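The first step of the cascade can be sketched roughly as follows. This is a minimal illustration, not the code in validation/library_compare.py: the `attempt` callable is hypothetical (it stands in for a real HTTP request through a proxy), and trying datacenter before residential within each library is an assumption.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product

# Preference order: simplest library first (from the cascade description).
LIBRARIES = ["requests", "httpx", "curl_cffi"]
# Assumption: the cheaper datacenter tier is preferred over residential.
PROXY_TIERS = ["datacenter", "residential"]


def compare_libraries(attempt, url):
    """Run every library x proxy combination in one parallel wave.

    `attempt(library, proxy, url)` is a hypothetical callable that returns
    True when the target served a real response and False when it blocked.
    """
    combos = list(product(LIBRARIES, PROXY_TIERS))
    with ThreadPoolExecutor(max_workers=len(combos)) as pool:
        outcomes = list(pool.map(lambda c: attempt(c[0], c[1], url), combos))
    working = [combo for combo, ok in zip(combos, outcomes) if ok]
    if working:
        # product() yields combos in preference order, so the first hit wins.
        return working[0]
    # Phase B: all six attempts blocked, so fall back to cloudscraper.
    for proxy in PROXY_TIERS:
        if attempt("cloudscraper", proxy, url):
            return ("cloudscraper", proxy)
    return None
```

Passing the request function in as a parameter keeps the selection logic testable without any network access.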

Where it lives

Package
browser_recon_server/validation_server/ (post-T53) or browser_recon/validation/
Orchestrator
validation/__init__.py::validate(...)
Library cascade
validation/library_compare.py
Header reduction
validation/header_reduce.py
Cookie probe
validation/cookie_dependency.py
Rate-limit
validation/rate_limit_probe.py
Cascade helpers
validation/_cascade.py (T51.5)

Proxy credentials

Read from AWS Secrets Manager (post-T53.5) or server env vars (pre-T53). Never from the customer's environment, so the customer's shell stays clean of operator secrets.

Bandwidth tracking

Every HTTP attempt records request_bytes + response_bytes. Rolled up to a bandwidth_summary per endpoint and used to compute per_1k_requests_usd from the configured per-tier $/GB rate:

cost_per_1k = (avg_req_bytes + avg_resp_bytes) * 1000 / 1e9 * rate_usd_per_gb

# Datacenter: $2.00/GB · Residential: $3.00/GB
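A worked instance of the formula above, as a small sketch. The endpoint sizes (2 KB request, 148 KB response) are made-up numbers for illustration; only the formula and the two per-GB rates come from this page.

```python
# Per-tier rates quoted above, in USD per decimal gigabyte (1 GB = 1e9 bytes).
RATE_USD_PER_GB = {"datacenter": 2.00, "residential": 3.00}


def cost_per_1k(avg_req_bytes, avg_resp_bytes, tier):
    """USD cost of 1,000 requests at the given average payload sizes."""
    total_bytes = (avg_req_bytes + avg_resp_bytes) * 1000
    return total_bytes / 1e9 * RATE_USD_PER_GB[tier]


# Hypothetical endpoint: 2 KB request, 148 KB response.
print(round(cost_per_1k(2_000, 148_000, "residential"), 4))  # 0.45
print(round(cost_per_1k(2_000, 148_000, "datacenter"), 4))   # 0.3
```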

Output shape

Three projections per endpoint, all persisted:

  • validation_data — full attempt log, source of truth. Every library × proxy attempt, every header probe, every rate-limit round.
  • llm_view — ~250 token compact projection for the synthesis prompt (was ~4,600 tokens pre-T51.8).
  • report_view — medium-detail projection for the Jinja Validation section.

Database impact

  • Writes one row per endpoint to validation_runs (append-only — re-runs insert new rows)
  • Writes to scans.validation for the orchestrator's typed reconstruction
  • Emits scan_events rows during long validation passes (e.g. "Validating 2/4 Bucket A endpoints")

How to test it

rye run pytest tests/unit/test_library_compare.py
rye run pytest tests/unit/test_header_reduce.py
rye run pytest tests/unit/test_cookie_dependency.py
rye run pytest tests/unit/test_rate_limit_probe.py
rye run pytest tests/unit/test_validation_orchestrator.py
rye run pytest tests/integration/test_two_round_validation.py

Find output in the DB

-- All validation rows for a scan:
SELECT endpoint_id, status, duration_ms,
       validation_data->>'best_library' AS best_lib,
       validation_data->'bandwidth_summary'->>'per_1k_requests_usd' AS cost
FROM validation_runs
WHERE scan_id = '<scan_id>'
ORDER BY created_at;

-- LLM-view projection (the slim shape synthesis sees):
SELECT llm_view FROM validation_runs
WHERE scan_id = '<scan_id>' AND endpoint_id = 'ep_017';

Known limits

  • Cookies captured locally are bound to the customer's residential IP; validation through a different proxy may trigger 412 challenges (we see this on PerimeterX-protected sites). The recommendation surfaces "cookie warmup required" when this is the case.
  • Rate-limit probe through rotating-residential proxies fires from a different IP per request, so the target cannot rate-limit a source it never sees twice. The result carries proxy_rotation_mode: "rotating" and a caveat string; the synthesis prompt is instructed to recommend a delay of ≥1.5 s regardless of the measured value.
  • Spot-check semantics: ~100 requests over ~30 s ≠ 50,000 requests/hour from a sticky IP. The report copy says "spot-check, not production-scale."
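The rotating-proxy caveat amounts to treating the measured delay as a lower bound. In the real pipeline this rule lives in the synthesis prompt; the deterministic function below is only a sketch of the same idea, and the "sticky" mode name is an assumption.

```python
ROTATING_MIN_DELAY_S = 1.5  # floor quoted in the limits above


def recommended_delay(measured_delay_s, proxy_rotation_mode):
    """Treat a rotating-proxy measurement as a lower bound, not a safe value."""
    if proxy_rotation_mode == "rotating":
        return max(measured_delay_s, ROTATING_MIN_DELAY_S)
    # Assumption: any other mode (e.g. "sticky") measured a single source IP.
    return measured_delay_s
```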

Secret scrubber.

Removes confidential values from the in-memory capture after validation, before persistence and downstream LLM calls. Secrets only ever live in server RAM transiently.

SERVER · DETERMINISTIC · ~50 MS

In plain English

Removes the user's secrets (cookie values, login tokens) from the captured data before saving it long-term or feeding it to the AI. Keeps the names ("you'll need an _abck cookie") so the recommendation is still useful, but replaces the actual values with <scrubbed> placeholders. Secrets only live in server memory for the few seconds validation needs them.

What it does

Field | Treatment
Cookie: header value | Keep cookie names, replace values with <scrubbed>
Set-Cookie response header | Keep name + attributes (path, samesite), strip value
Authorization: Bearer … | Replace value with <scrubbed>
X-Api-Key, X-Auth-Token, custom auth-shaped headers | Same
Request bodies containing tokens (regex / known patterns) | Replace matched values with <scrubbed>
Response bodies containing JWT-shaped strings | Same

What stays

  • Cookie / header names (the synthesis prompt needs to tell the user "your scraper will need a _abck cookie")
  • Non-sensitive headers (Accept-Language, Referer, Content-Type)
  • Bucket A response shapes (so synthesis can write data = r.json()["products"])
  • URL paths + query parameters (rare for secrets to live there; if regex finds tokens we scrub)
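The header rules can be illustrated with a short sketch. It is not the code in scrubber.py: the function names and the fixed header set are assumptions, and the real scrubber also covers Set-Cookie, bodies, and regex-detected tokens.

```python
SCRUBBED = "<scrubbed>"
# Assumption: a fixed set stands in for the "auth-shaped header" detection.
SENSITIVE_HEADERS = {"authorization", "x-api-key", "x-auth-token"}


def scrub_cookie_header(value):
    """Keep cookie names, replace every cookie value with the placeholder."""
    pairs = [p.strip() for p in value.split(";") if p.strip()]
    return "; ".join(f"{p.split('=', 1)[0]}={SCRUBBED}" for p in pairs)


def scrub_headers(headers):
    """Return a copy of `headers` with secret values replaced."""
    cleaned = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered == "cookie":
            cleaned[name] = scrub_cookie_header(value)
        elif lowered in SENSITIVE_HEADERS:
            cleaned[name] = SCRUBBED
        else:
            # Non-sensitive headers (Accept-Language, Referer, ...) pass through.
            cleaned[name] = value
    return cleaned
```

Because names survive, downstream steps can still say "your scraper will need a _abck cookie" without ever seeing its value.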

Where it lives

Module
browser_recon_server/scrubber.py (post-T53) or browser_recon/scrubber.py
Entry
scrub_capture(blob) -> scrubbed_blob
Rules
scrubber/patterns.py — regex library for token detection

When it runs

Step 11 of the pipeline — after validation, before synthesis. This ordering is critical: validation needs the real cookies to test cookie-warmup scenarios; synthesis only needs names + shapes. The S3 capture blob is rewritten in-place after scrubbing so secrets never persist long-term.

Database impact

  • Rewrites the capture blob at the same S3 key (KMS-encrypted; old version is the unscrubbed pre-validation copy)
  • Sets scans.scrubbed_at timestamp
  • All subsequent JSONB writes (synthesis input, report_view, llm_view) contain only scrubbed data

How to test it

rye run pytest tests/unit/test_scrubber.py
rye run pytest tests/unit/test_scrubber_patterns.py

Confirm scrubbing worked

# Pull the persisted capture from S3 and grep for secret patterns:
aws s3 cp s3://browser-recon-prod/captures/<scan_id>.json.gz - \
  | gunzip \
  | grep -E '"value": "(eyJ|Bearer |[a-f0-9]{32})'

# Should return nothing for a scrubbed capture.

Scan synthesis.

The main LLM call. Produces the recommendation, the verdict, and the runnable starter script in one combined Claude Sonnet 4.6 invocation.

SERVER · CLAUDE SONNET 4.6 · ~30 SECONDS · ~$0.10 / SCAN

In plain English

The final, headline AI call. Takes everything we've learned (detection + organisation + validation results) and writes three things in one go: the recommendation ("use curl_cffi with chrome120 impersonation through residential proxies"), a 2-sentence verdict ("Walmart is protected by Akamai + PerimeterX, expect $0.40–$2.00 per 1k requests"), and a working starter script the user can run as-is.

What it does

One scan_synthesis call (combined prompt from T16) that gets fed:

  • The user's intent + Bucket A/B endpoint inventory (filtered, scrubbed)
  • Detection findings (anti-bot vendors with severity tier)
  • The compact validation slot from step 10 (library_matrix, best library/proxy, min_required_headers, cookie scenarios, rate-limit, bandwidth cost)
  • Target identity (URL, domain)

Returns three structured sub-outputs in a single response:

  1. recommendation: {primary_library, impersonation, proxy_type, confidence, cost_band, rationale}. The actionable verdict.
  2. verdict — 2–3 sentence plain-English headline at the top of the report.
  3. starter_code — runnable Python script that uses the recommended library + headers + cookies + rate-limit delay from validation. T52.1's "minimum comments" rule keeps it terse.
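The three sub-outputs could be modelled along these lines. The real schema is the CombinedSynthesisOutput Pydantic model; this sketch uses standard-library dataclasses instead, every field type is a guess, and the example values are illustrative only. The `code` key under starter_code follows the SQL query shown later on this page.

```python
from dataclasses import asdict, dataclass


@dataclass
class Recommendation:
    primary_library: str
    impersonation: str
    proxy_type: str
    confidence: str                 # assumed to be a label, not a number
    cost_band: tuple[float, float]  # assumed (low_usd, high_usd) per 1k requests
    rationale: str


@dataclass
class StarterCode:
    code: str


@dataclass
class CombinedSynthesisSketch:
    recommendation: Recommendation
    verdict: str
    starter_code: StarterCode


example = CombinedSynthesisSketch(
    recommendation=Recommendation(
        primary_library="curl_cffi",
        impersonation="chrome120",
        proxy_type="residential",
        confidence="medium",
        cost_band=(0.40, 2.00),
        rationale="Plain requests and httpx were blocked in validation.",
    ),
    verdict="Protected by two anti-bot vendors; expect $0.40 to $2.00 per 1k requests.",
    starter_code=StarterCode(code="import curl_cffi"),
)
print(sorted(asdict(example)))  # ['recommendation', 'starter_code', 'verdict']
```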

Why combined into one call

Before T16 this was 3 separate prompts (recommendation, verdict, starter_code) that each re-sent the same ~80k tokens of input. Combining them saved roughly two-thirds of the input tokens. T51.8 then cut another ~80% by replacing the bloated validation_results_* payload with the slim llm_view.

Where it lives

Prompt
browser_recon_server/prompts/scan_synthesis.py
System prompt
SYSTEM_PROMPT constant — superset of the three legacy prompts
Output schema
CombinedSynthesisOutput Pydantic model
Orchestrator
browser_recon_server/synthesis_orchestrator.py
Legacy rollback
BROWSER_RECON_USE_LEGACY_SYNTHESIS=1 env flag re-enables the 3-call fan-out

Database impact

  • Writes to scans.synthesis (JSONB, holds all three sub-outputs)
  • Writes scans.primary_recommendation, cost_band_low_usd, cost_band_high_usd, confidence at the top level for dashboard queries
  • One llm_calls row with cost + S3 prompt+response keys

How to test it

rye run pytest tests/unit/test_scan_synthesis_prompt.py
rye run pytest tests/unit/test_synthesis_orchestrator.py

# Re-run on an existing scan via the eval system:
recon llm-eval --scan-id <scan_id> --prompts scan_synthesis --provider-b claude-sonnet-4-6 --force

Find output in the DB

-- Top-level recommendation fields:
SELECT primary_recommendation, cost_band_low_usd, cost_band_high_usd, confidence
FROM scans WHERE id = '<scan_id>';

-- Full synthesis output:
SELECT synthesis FROM scans WHERE id = '<scan_id>';

-- Starter code only:
SELECT synthesis->'starter_code'->>'code'
FROM scans WHERE id = '<scan_id>';

Notes & difficulty drivers.

Two cheap parallel Grok-3-mini calls that produce report sections orthogonal to the recommendation.

SERVER · GROK-3-MINI · ~12 SECONDS (PARALLEL) · ~$0.008 / SCAN COMBINED

In plain English

Two small AI calls that add colour to the report. Notes = "5 things the developer should know that aren't in the recommendation" (e.g. "the API uses GraphQL — you can introspect the schema"). Difficulty drivers = "ranked list of what makes this site hard to scrape." Both run on a cheap fast AI in parallel with the main synthesis call.

What they do

  • notes — 5 short pointers the scraper engineer should know but the recommendation doesn't surface (e.g. "GraphQL introspection appears enabled", "review pagination uses ?page=N", "this endpoint requires X-APOLLO-OPERATION-NAME header"). Renders as bullets in section 02 of the report.
  • difficulty_drivers — ranked list of what makes this site hard to scrape. Each driver has a severity tier (easy / medium / high) and a one-sentence explanation. Drives section 04 of the report.

Parallelism

Run via synthesis_orchestrator.py's thread pool, fanning out simultaneously with scan_synthesis. Combined wall-clock time is roughly that of the slowest call, not the sum of all three.
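A small sketch of why the fan-out costs about as much wall-clock time as its slowest call. The `fake_call` stand-in sleeps instead of calling a provider, and the durations are scaled down from the real ~30 s and ~12 s; none of this is the actual orchestrator code.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_call(name, seconds):
    """Stand-in for an LLM call; sleeps instead of hitting a provider."""
    time.sleep(seconds)
    return name


def run_parallel(calls):
    """Run all calls at once; return (results by name, elapsed seconds)."""
    started = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = {name: pool.submit(fake_call, name, secs)
                   for name, secs in calls.items()}
        results = {name: future.result() for name, future in futures.items()}
    return results, time.perf_counter() - started


# Scaled-down durations, in seconds.
calls = {"scan_synthesis": 0.3, "notes": 0.1, "difficulty_drivers": 0.1}
results, elapsed = run_parallel(calls)
print(sorted(results), elapsed < sum(calls.values()))
```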

Where they live

Notes
browser_recon_server/prompts/notes.py
Drivers
browser_recon_server/prompts/difficulty_drivers.py
Orchestrator
synthesis_orchestrator.py

Database impact

  • Writes to scans.notes and scans.difficulty_drivers (JSONB)
  • Two llm_calls rows (one per prompt)

Find output in the DB

SELECT
  notes->'notes' AS notes,
  difficulty_drivers->'drivers' AS drivers
FROM scans WHERE id = '<scan_id>';

Report renderer.

Final pipeline stage. Reconstructs typed objects from persisted JSONB and renders the HTML report through Jinja templates.

SERVER · JINJA TEMPLATES · ~30 MS

In plain English

Takes all the AI outputs + detection findings + validation results and combines them into the HTML page the user opens at https://browser-recon.com/r/<slug>. Like a print template that fills in blanks. No AI involved.

What it does

Reads every JSONB column from the scans row, reconstructs typed dataclasses via browser_recon_server/report_renderer.py, and renders templates/report.html through Jinja. The template orchestrates the section partials below in order (eight numbered sections plus the 02c validation section added in T51.8):

Section | Partial | Source
01 / Detection | partials/detection_card.html | scans.findings
02 / Scraping plan | partials/scraping_plan.html | Analysis + intent filter
02c / Validation (T51.8) | partials/validation_section.html | validation_runs.report_view
03 / Prerequisites | partials/prerequisites.html | Bucket B endpoints
04 / Difficulty | partials/difficulty_drivers.html | scans.difficulty_drivers
05 / Recommendation + cost | partials/recommendation_cost.html | scans.synthesis.recommendation
06 / Starter code | partials/starter_code.html | scans.synthesis.starter_code
07 / Evidence | partials/evidence_trail.html | Bucket counts
08 / Drift | partials/drift_footer.html | scans.last_validated_at

Download mode (T52)

The same template renders into a self-contained HTML file when download_mode=True is passed in the context. Static CSS/JS get inlined; dynamic features (Rerun, Feedback, CSRF meta) are stripped. Served via GET /r/<slug>/download with Content-Disposition: attachment.

Where it lives

Renderer
browser_recon_server/report_renderer.py
Top template
templates/report.html
Partials
templates/partials/*.html
Filters
templates/__init__.py: curl_snippet, httpx_snippet Jinja filters

Database impact

  • Updates scans.status = 'complete' and scans.completed_at
  • Updates reports row with the rendered fields the dashboard query needs
  • Final scan_events row: (step='render', status='complete') — this is what unblocks the CLI's polling loop

How to test it

rye run pytest tests/unit/test_report_renderer.py
rye run pytest tests/unit/test_report_templates.py
rye run pytest tests/unit/test_report_partials_t12.py
rye run pytest tests/unit/test_reports_endpoint.py

View a report

# Live (authenticated):
https://browser-recon.com/r/<slug>

# Self-contained download:
https://browser-recon.com/r/<slug>/download

Scan events & polling.

Append-only event log of pipeline step transitions. The CLI polls every 1.5 s and renders a live spinner. Replaces the silent 90-second wait.

CLIENT + SERVER · NEW IN T53.1

In plain English

Solves the "is anything happening?" problem. While the server processes a scan (60–120 seconds), the CLI checks in every 1.5 seconds asking "any progress?" The server replies with a timeline of which steps have started and finished. The CLI displays a live spinner so the user sees real progress instead of staring at a frozen terminal.

How it works in one paragraph

Every time the server starts or finishes a step (detection, validation, synthesis, etc.), it writes a row to a scan_events table. The CLI asks the server every 1.5 seconds "what's new since I last asked?" and prints any new rows as live progress. When the final "render complete" event arrives, the CLI stops polling and prints the report URL.

Three pieces

1. The table

CREATE TABLE scan_events (
  id           uuid         PRIMARY KEY DEFAULT gen_random_uuid(),
  scan_id      uuid         NOT NULL REFERENCES scans(id) ON DELETE CASCADE,
  step         text         NOT NULL,    -- 'detection' | 'analysis' | 'intent_filter' | ...
  status       text         NOT NULL,    -- 'started' | 'complete' | 'errored'
  message      text,                       -- human-readable progress text
  metadata     jsonb,                      -- optional: endpoint counts
  created_at   timestamptz  NOT NULL DEFAULT now()
);
CREATE INDEX idx_scan_events_scan_id_time ON scan_events (scan_id, created_at);

2. The emit helper

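def emit_event(session, scan_id, step, status="started",
               message="", metadata=None):
    """Append one scan_events row; committing is left to the caller."""
    # Note: SQLAlchemy's declarative API reserves the attribute name
    # `metadata`, so a declarative ScanEvent model would need to map this
    # column under a different attribute name.
    session.add(ScanEvent(
        scan_id=scan_id, step=step, status=status,
        message=message, metadata=metadata or {},
    ))
    session.flush()  # write the row within the current transaction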

3. The polling endpoint

@router.get("/scans/{scan_id}/events")
def get_events(scan_id: UUID, since: datetime | None = None):
    """Return events strictly newer than `since`, in ascending order.

    The response includes a `cursor` (latest created_at) so the CLI can advance.
    """
    ...  # query + serialization elided in this guide
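The client side of these three pieces might look like the sketch below. It is not the code in the CLI's poll module: `fetch` is a hypothetical callable wrapping the events endpoint, the response shape {"events": [...], "cursor": ...} is assumed from the comment above, and stopping on an errored event is an assumption too.

```python
import time


def poll_events(fetch, interval_s=1.5, sleep=time.sleep):
    """Yield new events until the pipeline finishes or a step errors.

    `fetch(since)` is a hypothetical callable wrapping
    GET /scans/<id>/events; it is assumed to return a dict shaped like
    {"events": [...], "cursor": <latest created_at or None>}.
    """
    cursor = None
    while True:
        page = fetch(cursor)
        for event in page["events"]:
            yield event
            if event["step"] == "render" and event["status"] == "complete":
                return  # final event: the report is ready
            if event["status"] == "errored":
                return  # assumption: a failed step ends the scan
        # Only advance when the server handed back a cursor.
        cursor = page.get("cursor") or cursor
        sleep(interval_s)
```

Injecting `fetch` and `sleep` lets the loop be exercised with canned pages and no real waiting.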

CLI spinner

$ recon scan https://walmart.com
✓ detection      complete   2 anti-bot vendors detected
✓ analysis       complete   38 endpoints categorized
✓ intent_filter  complete   Bucket A: 4 · B: 8 · C: 26
⠹ validation     running    Validating 4 Bucket A endpoints (2/4) · 18s
  scrub          pending
  synthesis      pending
  render         pending

Where it lives

Migration
alembic/versions/<hash>_add_scan_events.py
Model
browser_recon_server/models.py::ScanEvent
Helper
browser_recon_server/events.py::emit_event
Endpoint
browser_recon_server/routes/scans.py::get_events
CLI polling
browser_recon/cli/poll.py

Find output in the DB

-- Full event timeline for a scan:
SELECT step, status, message, created_at
FROM scan_events
WHERE scan_id = '<scan_id>'
ORDER BY created_at ASC;

-- Per-step duration:
SELECT step, status, created_at,
       lead(created_at) OVER (PARTITION BY scan_id ORDER BY created_at)
         - created_at AS duration
FROM scan_events
WHERE scan_id = '<scan_id>';

LLM eval matrix.

Re-runs scan synthesis (or any individual prompt) through alternate LLM providers for quality comparison. Persisted to llm_evals with an admin UI matrix.

SERVER · ADMIN T48 + T49 + T50

In plain English

Operator-facing quality tool. Lets the team re-run any AI call (typically the main synthesis) using a different model (GPT-5, Grok, etc.) and compare results side-by-side. Used to answer "could we save money by using a cheaper AI?" — answered by data, not hunches.

What it does

Given an existing scan, re-runs any prompt (typically scan_synthesis or intent_filter) through a different model — Claude, OpenAI GPT, xAI Grok — using the same input the production call saw. Results land in llm_evals as an append-only history. The admin UI at /admin/evals/<scan_id> renders a matrix where rows are prompts and columns are models; clicking any cell opens a side-by-side diff modal against the production response.

Why it exists

Quality validation. Lets the operator answer questions like:

  • Could we switch scan_synthesis from Claude Sonnet to Grok-3-mini and save 80% of the cost? (Answer per the Staples eval: no — cheap models give overconfident wrong recommendations on protected sites.)
  • Does intent_filter hold up on GPT-5.4? (Answer: gpt-5.4-mini is competitive on small sites; over-prunes Bucket B on complex ones.)
  • Are LLM outputs stable across re-runs? (Same model, two runs — append-only history makes this trivially queryable.)

Where it lives

Routes
browser_recon_server/routes/evals.py
Runner
browser_recon_server/eval_runner.py
Lock
browser_recon_server/eval_lock.py — scan-level lock so only one eval runs per scan at a time
Templates
templates/admin_evals_index.html, admin_evals_scan.html, partials/evals_matrix.html
CLI
recon llm-eval --scan-id <id> --provider-b <model>

Database impact

  • Appends to llm_evals on every run (no UNIQUE; re-runs preserve history)
  • S3 keys per response saved at s3_response_path

How to test it

# From the admin UI: navigate to /admin/evals, pick a scan,
# click "Start new eval", pick prompts + model, confirm spend.

# From the CLI (operator only):
recon llm-eval --scan-id 82f42438-... \
               --prompts scan_synthesis,intent_filter \
               --provider-b grok-3-mini \
               --confirm-spend 1.00

Find output in the DB

-- Latest complete eval per (scan, prompt, model):
SELECT DISTINCT ON (scan_id, prompt_name, model)
       prompt_name, model, provider, cost_usd, response_body
FROM llm_evals
WHERE scan_id = '<scan_id>' AND status = 'complete'
ORDER BY scan_id, prompt_name, model, created_at DESC;

The recon CLI.

The customer-facing command-line tool. Installed once via pipx, used by typing recon in a terminal.

CLIENT · v0.3.0 = THIN CLIENT (T53)

In plain English

The CLI is the customer's gateway to the tool. It runs on their computer, knows how to launch Chrome, knows how to talk to our server, and shows them the report URL when the scan is done. From v0.3.0 onward it doesn't contain any of our secret sauce — all the smart stuff is server-side.

Install

pipx install browser-recon

# If pipx isn't installed:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
pipx install browser-recon

Commands

Command | What it does
recon login | Exchange a one-time code for an API key; saves to ~/.recon/config.toml
recon scan <url> | Run an interactive scan. Launches Chrome, polls server progress, prints report URL on completion
recon scan <url> --no-wait | Fire-and-forget; returns the scan_id without polling. For CI use
recon --version | Prints CLI version
recon llm-eval | Admin only: re-run a prompt on alternate providers (requires BROWSER_RECON_ADMIN=1)

What ships in the wheel (v0.3.0)

Only non-proprietary glue. The wheel contains:

  • browser_recon/capture/ — Chrome launcher + CDP listener
  • browser_recon/cli/ — entry point, prompts, login, config
  • browser_recon/client.py — thin httpx wrapper around the server API
  • browser_recon/poll.py — scan-events polling loop + rich spinner

Roughly 2 MB installed. No detection rules, no validation logic, no LLM prompts, no scoring heuristics. Everything proprietary lives server-side.

What ran where (pre-T53 vs post-T53)

Component | Pre-T53 (v0.2.x) | Post-T53 (v0.3.0)
Chrome capture | Client | Client
Detection | Client | Server
Analysis | Client | Server
Validation | Client (needed proxy env in user shell!) | Server
Scrubbing | Client (pre-upload) | Server (post-validation)
LLM prompts | Server | Server
Report rendering | Server | Server

CLI ↔ server protocol

Exactly two endpoints:

POST /scans/<id>/capture       // upload raw blob, returns 202 + status_url
GET  /scans/<id>/events?since   // poll for pipeline progress events

Both authenticated via Authorization: Bearer rec_live_…. TLS 1.2/1.3 in transit (Render terminates).
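As an illustration of the bearer-token scheme, the sketch below builds (without sending) the events poll request using only the standard library. The real client is described as a thin httpx wrapper; the function name is hypothetical and the exact format of the `since` value is an assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request


def events_request(base_url, scan_id, api_key, since=None):
    """Build (but do not send) the authenticated events poll request."""
    url = f"{base_url}/scans/{scan_id}/events"
    if since is not None:
        # Assumption: `since` is passed as a plain query-string value.
        url += "?" + urlencode({"since": since})
    return Request(url, headers={"Authorization": f"Bearer {api_key}"})


req = events_request("https://browser-recon.com", "abc", "rec_live_example")
print(req.get_method(), req.full_url)
```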

Config file

~/.recon/config.toml:

[auth]
api_key = "rec_live_..."

[server]
base_url = "https://browser-recon.com"

Source

Repo
github.com/<org>/browser-recon
Entry
browser_recon/cli/main.py::main
Login
browser_recon/cli/login.py
Config
browser_recon/cli/config.py — reads ~/.recon/config.toml
Build
rye build → wheel in dist/
Publish
twine upload dist/* or tag-driven GitHub Actions release

From signup to first scan.

What a brand-new customer does, in order. Five touchpoints: sign up, get an API key, install the CLI, log in, run a scan.

In plain English

This is the new-customer onboarding path. The web side handles signup and billing; the CLI side handles the actual scanning. Customers can also scan entirely from the web dashboard (no CLI install) — but the CLI is recommended because it captures from their real Chrome on their real network.

1 · Signup

Customer visits browser-recon.com. Sees the landing page (templates/landing.html). Clicks "Sign up", enters email + password (no email verification required in MVP — Render manages the form). Server creates a row in users with role='user' and a free-tier credit allocation.

2 · Dashboard login

After signup, redirected to /dashboard/login. Magic-link email flow (templates/dashboard_login.html, dashboard_login_sent.html). User clicks link, lands at /dashboard with a session cookie.

3 · API key creation

From the dashboard, clicks "Create API key" → server generates a rec_live_… token, hashes it into api_keys, shows the plaintext once (templates/dashboard_api_key_created.html). User copies it.
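The "store a hash, show the plaintext once" pattern can be sketched as below. The hashing scheme is an assumption: this page does not say which algorithm the server uses, so SHA-256 is shown purely as an example, and both function names are hypothetical.

```python
import hashlib
import hmac
import secrets

KEY_PREFIX = "rec_live_"


def create_api_key():
    """Return (plaintext, digest); only the digest would be stored."""
    plaintext = KEY_PREFIX + secrets.token_urlsafe(32)
    digest = hashlib.sha256(plaintext.encode()).hexdigest()
    return plaintext, digest


def verify_api_key(presented, stored_digest):
    """Hash the presented key and compare in constant time."""
    candidate = hashlib.sha256(presented.encode()).hexdigest()
    return hmac.compare_digest(candidate, stored_digest)
```

Since only the digest is kept, a lost key cannot be recovered and has to be replaced with a new one.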

4 · CLI install + login

pipx install browser-recon
recon login
# prompts for API key, validates against server, saves to ~/.recon/config.toml

5 · First scan

recon scan https://example.com

# 1. CLI picks a starter template (products / stocks / etc.)
# 2. CLI asks: "Describe what data you want to scrape:"
# 3. Chrome launches. User browses the site, clicks around, navigates.
# 4. Ctrl+C → flow confirm summary
# 5. CLI uploads capture, polls events, prints live progress.
# 6. When render completes, CLI prints report URL:
#    https://browser-recon.com/r/<slug>

Web UI navigation

Path | Purpose
/ | Landing page
/pricing | Tier comparison
/dashboard | User home: recent scans, credit balance, quick links
/dashboard/data | All scans (paginated)
/dashboard/scan/new | Browser-driven scan flow (no CLI required)
/dashboard/api-keys | Manage API keys
/r/<slug> | Public-share report URL
/r/<slug>/download | Self-contained HTML download (T52)

Permissions / roles

Role | Sees
guest | Landing, pricing, public reports
user | Above + their own dashboard + their own scans + their own API keys
admin | Above + /admin/* read-only views
super_admin | Above + all admin write operations, evals, audit logs

Admin operations.

A separate web UI at /admin/* for the team to monitor health, costs, AI quality, and individual scans.

SERVER · ADMIN + SUPER_ADMIN ONLY

In plain English

A separate part of the site that customers never see. Used by the operator (you) to: watch how the system is doing overall, dig into individual scans to debug problems, manually re-run AI calls with different models to compare quality, and manage users + roles. Locked behind a role check so only admin / super_admin can see it.

Layout

Full-width Tailwind layout (templates/admin_layout.html) separate from the user-facing dashboard. Sidebar nav, sticky header with role chip, content area on the right.

Pages

Path | Purpose
/admin/dashboard | KPI strip: total scans, success rate, cost, top users, top domains, prompt reliability, 30-day cost trend
/admin/scans | Cross-user scan list with filters (user / status / date / domain / include-soft-deleted) and per-scan debug links
/debug/scans/<id> | Per-scan deep-dive: Overview / Capture / Buckets / LLM calls / Audit tabs + lazy-loaded S3 dumps + the Evals tab (T48)
/admin/evals | Index of scans with at least one eval row
/admin/evals/<scan_id> | Matrix view: rows = prompts, columns = models. Click any cell for the side-by-side diff modal vs production (T50)
/admin/users | User list, role management (super_admin only)
/admin/audit | Audit log of admin actions

Where it lives

Routes
browser_recon_server/routes/admin.py, routes/evals.py
Templates
templates/admin_*.html + partials
Auth guard
require_role(["admin", "super_admin"]) dependency
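The guard's behaviour can be shown framework-free. The real require_role is a route dependency; the sketch below only mirrors its allow-list check, and the Forbidden exception and dict-shaped user are assumptions made for illustration.

```python
class Forbidden(Exception):
    """Raised when the current user's role is not allowed."""


def require_role(allowed_roles):
    """Return a checker that passes the user through or raises Forbidden."""
    allowed = frozenset(allowed_roles)

    def check(user):
        # Assumption: `user` is a dict carrying the users.role value.
        if user.get("role") not in allowed:
            raise Forbidden(f"role {user.get('role')!r} may not access this page")
        return user

    return check


admin_only = require_role(["admin", "super_admin"])
```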

Promoting a user to super_admin

Set BROWSER_RECON_SUPER_ADMIN_EMAIL=… in Render env. The next time a user with that email logs in, their users.role auto-promotes. Used for initial setup; subsequent promotions happen via the admin UI's user management page.

Audit trail

Every admin action (role change, scan soft-delete, eval start) appends to audit_log. Find recent admin activity:

SELECT actor_id, action, target_type, target_id, created_at
FROM audit_log
ORDER BY created_at DESC
LIMIT 50;

Concerns, gaps & pending work.

Honest accounting of what this guide doesn't cover, what's incomplete in the system itself, and what's worth doing next. Read this last.

In plain English

This page is the "stuff I'd want a new engineer to know before they start digging" list. Nothing here is broken — it's all known limitations, known dead code, and known next-step ideas.

Documentation gaps

Things this guide could cover but doesn't. Not blockers — just acknowledged gaps if the guide ever becomes the primary onboarding doc.

Topic | Why it's missing | Severity
Local development setup | No page covering rye sync, .env configuration, running migrations locally, starting the FastAPI server on a fresh checkout. | Medium
Deployment to Render | No page on how the app is wired to Render, what env vars need to be set there, how database migrations run on deploy. | Medium
Billing / Stripe integration | Stripe code exists in the repo (tests for stripe_webhook) but no page documents the billing flow, credit allocation, or webhook handling. | Medium
Soft-delete behaviour | Admin UI mentions include-soft-deleted filter; no page explains the soft-delete mechanism. | Low
Audit log details | Admin UI page references it; no dedicated section explaining what actions are audited or how to query. | Low
API-level rate limiting | No documentation on per-customer caps or how the server limits abuse. | Low
Error handling | No page on what happens when a pipeline step fails mid-scan: how the user sees it, how the operator debugs it. | Low
Screenshots | The guide uses SVG diagrams but no actual screenshots of the admin UI, the dashboard, or the report. Capturing them needs a live scan to draw from. | Low
In-page search | The guide is 19 pages, ~130 KB. A search box would help. Not in v1. | Low

Code-level dead weight (post-T53 cutover)

The T53 thin-CLI migration left a few orphan modules. None are broken; they're just unreachable now and worth deleting in a follow-up sweep.

Location | Status | Action
browser_recon/transport/uploader.py | Still carries client funcs (upload_to_server, complete_scan, submit_validation_runs, submit_replay_runs) targeting endpoints the server no longer serves. | Delete the dead client funcs; keep _post_json, create_draft, etc. that are still in use.
browser_recon_server/flow_confirm_orchestrator.py | Sole caller was the deleted run_confirm_pipeline. No grep hits in the live code path. | Verify no test imports remain; delete.
browser_recon_server/scan_pipeline.py | Kept alive because rerun_orchestrator.py still calls run_filter_pipeline + run_synthesis_pipeline for the POST /scans/{id}/rerun-stage endpoint. The new pipeline_orchestrator.py doesn't replace this flow. | Future T55: rewire rerun-stage to use the new orchestrator, then delete scan_pipeline.py.
POST /scans (legacy single-shot endpoint) | Still in routes/scans.py. Calls scan_pipeline.run_pipeline. Not called by any current CLI. | Decide: deprecate + remove, or keep as server-only test scaffolding.
Legacy validation: None field in CLI blob dicts | Still present in some test fixtures. Harmless: the server ignores it. | Clean up in the next test-suite sweep.

Architectural caveats (won't change soon)

  • Cookies captured locally are bound to the customer's IP. When validation later fires through a different proxy IP, those cookies are often invalid against the target's anti-bot challenge. Synthesis surfaces this as "cookie warmup required" in the recommendation. There's no clean fix that doesn't involve running Chrome through the proxy too, which would defeat the residential-IP advantage of the customer's network.
  • Rate-limit probing through rotating-residential proxies is structurally noisy. Each probe request hits the target from a different IP, so the target physically cannot rate-limit a single source it never sees twice. The result carries a proxy_rotation_mode: "rotating" flag + caveat string; synthesis is instructed to recommend ≥1.5 s delay regardless of measured value.
  • The capture upload is unscrubbed. Validation needs the real cookies + auth headers. Scrubbing happens server-side, after validation, before persistence. Secrets live in server RAM transiently. Trade-off accepted in T53 planning.
  • Pre-T54.5: AWS Secrets Manager not yet in use. Proxy credentials still live in Render env vars. Migration to AWS Secrets is a planned follow-up.
  • Pre-T54.5: PyPI publishing setup not yet done. The wheel builds cleanly (dist/browser_recon-0.3.0-py3-none-any.whl, 128 KB) but hasn't been uploaded to PyPI yet. Customers can't pipx install browser-recon until that step is done.

Pending tasks (next-up)

Task | Effort | Why
AWS Secrets Manager for proxy creds | ~half day | Operator-side hygiene: rotation, audit, IAM-gated access. Code reader stays in server.
PyPI publish v0.3.0 | ~half day (mostly account setup) | Customers can pipx install browser-recon. Currently they'd install from source.
Rewire rerun-stage to use the new orchestrator | ~1 day | Lets us delete scan_pipeline.py. Cleaner mental model: one pipeline, one orchestrator.
Delete dead client funcs in transport/uploader.py | ~hour | Wheel shrinks further; reduces confusion for new contributors.
Tighten error-handling UX | ~half day | The CLI's poll loop handles a few error cases; broader coverage (proxy auth failure, AWS Secrets unavailable, partial validation) needs explicit handling.
A/B the eval recommendation on more sites | ~quarter day per site | The T48 eval matrix has data from Staples + Walmart. More variety strengthens the "Sonnet for synthesis, Grok-3-mini for everything else" claim.
Stripe / billing documentation page | ~half day | Filling the doc gap above.

Notable wins shipped recently

  • T51 (validation redesign): validation wall-clock per Bucket A endpoint went from ~150 s to ~25 s. The new pipeline is a two-axis (library × proxy) cascade with parallelism, bandwidth tracking, and a persisted JSON shape rich enough to drive real cost projections.
  • T51.8 (LLM payload compaction): scan_synthesis input tokens dropped 65–81% (Walmart: 81k → 29k, Staples: 84k → 16k). Synthesis cost dropped 52–69%.
  • T52 (report polish): theme-consistent starter code + Download Report button (self-contained HTML).
  • T52.1 (snippet comment trim): curl + httpx + starter code now ship minimal comments.
  • T53 (thin-CLI cutover): CLI wheel went from ~20 MB to 128 KB. No proprietary code on customer machines. Operator secrets never travel to the customer's shell. Customers see a live progress spinner instead of silent 90-second waits.

Doc revision history

This guide is a living document. Each major feature ships an update.

Date | Author | Changes
2026-05-13 | Lazy Coder | Initial T54 ship: 19 pages, plain-English callouts, glossary.
2026-05-13 | Lazy Coder | T53 implementation complete; page 20 (this one) added.