browser-recon helps you figure out how to scrape a website.
Tell it what you want to extract, browse the site by hand for a couple of minutes, and it produces a complete plan — the right Python library to use, which headers to send, which cookies you'll need, how fast you can hit the site, and a working starter script.
In plain English
Scraping a website looks easy in tutorials but breaks in production because real sites have anti-bot defences (Cloudflare, Akamai, PerimeterX) and dozens of hidden requirements (specific headers, cookies, throttling) the developer can't see. browser-recon watches you use the site in a real browser, then tells you exactly what your scraper needs to do — without you having to guess.
What it does, step by step
- You run a command: recon scan https://example.com
- Chrome launches on your machine and the tool starts watching every request the browser makes.
- You browse the site normally — click around, do a search, view a product, whatever you'd want to scrape. Takes a couple of minutes.
- You press Ctrl+C and the tool uploads everything it saw to its server.
- The server works out the rest: which anti-bot system is in front of the site, which requests are actually load-bearing, which library works (it tries them all), and what it'll cost you to run a scraper at scale.
- You get a report URL. Open it in your browser — you have a complete scraping plan.
What it's not
It's not a scraper itself. It produces the plan for a scraper — library choice, headers, cookies, delay between requests, plus a starter Python script. You (or your AI assistant) then write the actual scraper using that plan.
Two halves of the system
The CLI (client)
Installed via pipx install browser-recon. Launches Chrome, monitors network traffic, prompts for user intent, uploads the capture to the server, polls for progress, opens the final report. No proprietary logic.
The server (browser_recon_server)
Receives uploaded captures, runs the full processing pipeline (detection → analysis → intent filter → validation → scrub → synthesis → render), persists everything to Postgres + S3, serves the rendered HTML report and the admin dashboard.
Why it works better than guessing
Most scrapers fail in production because the developer guessed wrong about three things: which anti-bot system is in front of the site, which headers the request actually needs, and whether the scraper's IP address looks "real enough" to the target. browser-recon answers all three by measuring, not guessing. It actually fires test requests through real proxies and watches which combinations succeed — then writes the recommendation based on what worked.
Glossary
Terms used throughout this guide. Read this once and you'll understand every page.
| Term | In plain English |
|---|---|
| Scan | One full run of the tool — from typing recon scan <url> to opening the final report. |
| Capture | Everything the browser saw during your session: every request it made, every response it got, every cookie, every header. Like a recording. |
| Endpoint | A specific URL the site uses — e.g. /api/products/123. A typical site has 30–80 unique endpoints during a normal browse session. |
| Bucket A / B / C | The tool sorts endpoints into three groups. A = the data you actually want (product details, search results). B = prerequisite calls (session bootstrap, anti-bot challenge). C = noise (analytics, ad tracking, telemetry). |
| Anti-bot | Software running in front of a website (Cloudflare, Akamai, PerimeterX) that tries to detect and block scrapers. Different vendors use different tricks; the tool identifies which one is in play. |
| Proxy | A relay server that forwards your requests through a different IP address. Two flavours: datacenter (cheap, $2/GB, IPs are obviously cloud-hosted) and residential (expensive, $3/GB, IPs look like a real home internet user). |
| Validation | The tool actively tests the proposed scraping approach by firing real requests through real proxies. Confirms what actually works before recommending it. |
| Synthesis | The final AI call that writes up the recommendation, the verdict, and the starter Python script. Uses Claude Sonnet. |
| LLM | "Large Language Model" — the AI that does the judgement-heavy parts (sorting endpoints, writing the recommendation). The tool uses Claude Sonnet for the important calls and Grok-3-mini for the cheap ones. |
| CLI | "Command-Line Interface" — the recon command the customer types in their terminal. |
| Server | The browser-recon backend running on Render. Does all the processing. Customers never see it directly except through the report URL. |
Who this guide is for
This guide is intentionally accessible. If you're a developer, every component page has full technical detail (file paths, code, SQL). If you're a non-technical reader (founder, designer, support engineer, investor), every page starts with an "In plain English" callout that explains the component without jargon. Skim those callouts and you'll know what every part of the system does.
Current state vs planned
Heads up
This guide describes the system as it will be after T53 ships (the "thin-CLI" architecture, v0.3.0). A few pieces are still in flight:
- Detection / Analysis / Validation / Scrubber are described as server-side. Pre-T53 they actually run on the CLI. The file paths shown (e.g. browser_recon_server/validation_server/) are post-T53; pre-T53 paths are browser_recon/validation/.
- Proxy credentials are described as living in AWS Secrets Manager. Pre-T53 they live in Render env vars.
- scan_events polling + live CLI spinner are described as working. Pre-T53.1 the CLI just sits silent for 60–120 s.
- PyPI distribution as v0.3.0 hasn't shipped yet — pre-T53.8 the CLI is installed from source.
Each affected page calls out the transition where relevant. T53 spec is at documentation/T53-thin-cli-server-side-pipeline.html.
Components at a glance.
Eleven moving parts make up the system. Most run on the server; only the browser-capture piece runs on the user's machine.
In plain English
Think of a scan as an assembly line. The CLI captures the browser session and hands it off; then on the server, the work moves through 8 stations — detect what's protecting the site, organise the requests, sort them by usefulness, actually test what works, remove secrets, write the recommendation, write the auxiliary notes, render the final report. Each station has its own page below.
Pipeline components
Listed in the order they fire during a scan. Click any card to jump to its detail page.
Browser capture
Launches Chrome, monitors network traffic via CDP. Captures every request/response/cookie/header during the user's browse session. The only piece that must run locally.
Detection
Fingerprints the anti-bot stack from the capture. Matches cookies (_abck → Akamai, __cf_bm → Cloudflare) and headers (server: cloudflare) against a rules table. No LLM.
Analysis
Templates URLs (/api/users/123 → /api/users/<id>), extracts response shapes, infers dependency chains and pagination patterns. No LLM.
Intent filter
Classifies endpoints into Bucket A (the data), Bucket B (prerequisites), Bucket C (noise) based on user intent text. First LLM call in the pipeline.
Validation
Actively fires test requests against the target site through operator proxies. Library × proxy cascade, header reduction, cookie scenarios, rate-limit probe. The T51 work.
Scrubber
Strips secrets (cookie values, Authorization tokens, JWT-shaped strings) from the in-memory capture before persistence + downstream LLM calls.
Synthesis
One combined call producing the recommendation, the verdict, and the starter Python script. The most expensive LLM call (~$0.10–$0.15 per scan).
Auxiliary prompts
Two cheap parallel calls — notes (5 short pointers) + difficulty_drivers (ranked obstacles). Runs simultaneously with synthesis.
Report renderer
Reconstructs typed objects from the persisted JSONB and renders the final HTML report through Jinja templates. No LLM, no DB writes besides finalization.
Scan events
Append-only event log of pipeline step transitions, polled by the CLI every 1.5s for the live progress spinner. Replaces silent waiting.
Eval system
Re-runs scan synthesis through alternate LLM providers (OpenAI, xAI) for quality comparison. Persisted to llm_evals with a matrix UI.
Cost breakdown per scan
| Step | Provider | Avg cost | Notes |
|---|---|---|---|
| intent_filter | Claude Sonnet 4.6 | $0.030 | Judgement-heavy; cheap models over-prune Bucket B. |
| scan_synthesis | Claude Sonnet 4.6 | $0.100 | Combined recommendation + verdict + starter_code in one call. |
| notes | Grok-3-mini | $0.004 | Parallel. |
| difficulty_drivers | Grok-3-mini | $0.004 | Parallel. |
| intent_clarifier | Grok-3-mini | $0.0003 | Only runs if intent is ambiguous. |
| flow_confirm | Grok-3-mini | $0.003 | Summarises the captured action sequence. |
| Total | — | ~$0.14 | Down from $0.38 pre-T51.8. |
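As a quick check on the total, the per-step averages in the table can be summed directly. This is only a sketch of the arithmetic; the dictionary name and the helper function are illustrative and not part of the codebase.

```python
# Average per-scan LLM cost by step, in USD, copied from the table above.
STEP_COSTS_USD = {
    "intent_filter": 0.030,
    "scan_synthesis": 0.100,
    "notes": 0.004,
    "difficulty_drivers": 0.004,
    "intent_clarifier": 0.0003,  # only runs if the intent is ambiguous
    "flow_confirm": 0.003,
}


def total_cost(costs, skip=()):
    """Sum per-step costs, optionally leaving out steps that did not run."""
    return sum(value for step, value in costs.items() if step not in skip)


print(f"all steps: ${total_cost(STEP_COSTS_USD):.4f}")
print(f"without intent_clarifier: ${total_cost(STEP_COSTS_USD, skip={'intent_clarifier'}):.4f}")
```

Either way the result rounds to the ~$0.14 shown in the Total row, since intent_clarifier contributes well under a tenth of a cent.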
Architectural map.
Three actors: the customer's computer, the browser-recon server (hosted on Render), and AWS for storage and secrets. The server does most of the work.
In plain English
The customer's CLI is a lightweight thing that knows how to drive Chrome and talk to a server. The server is where all the smart stuff lives — the AI models, the validation logic, the report generator. AWS sits next to the server holding two things: raw captures (in S3) and proxy credentials (in Secrets Manager).
Boundaries that matter
- Client ↔ server: exactly two endpoints, POST /scans/<id>/capture and GET /scans/<id>/events. All traffic is HTTPS bearer-authed.
- Server ↔ Postgres: single managed instance on Render. All persistent state lives here: scans, reports, llm_calls, llm_evals, validation_runs, scan_events, users, audit_log.
- Server ↔ S3: raw captures (KMS-encrypted) and prompt/response dumps for every LLM call (for replay + audit).
- Server ↔ Secrets Manager: proxy URLs only. Never logged, never persisted to Postgres.
- Server ↔ external LLMs: Anthropic, OpenAI, xAI. Each has its own API key in env (server-side only).
- Server ↔ proxy vendor: all validation HTTP traffic exits through the configured proxy URLs.
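To make the client side of the events boundary concrete, here is a minimal sketch of a polling loop. It assumes the events endpoint returns a list of objects with step, status and message fields (matching the scan_events columns) and that an event with status "errored" ends the scan; both are assumptions. The HTTP call is replaced by a caller-supplied fetch_events function, and poll_until_rendered is a hypothetical name.

```python
import time


def poll_until_rendered(fetch_events, interval_s=1.5, timeout_s=300.0, sleep=time.sleep):
    """Poll fetch_events() until a render/complete or errored event appears.

    fetch_events is a caller-supplied function returning the scan's events as
    a list of dicts with "step", "status" and "message" keys (a stand-in for
    GET /scans/<id>/events). Returns the final event, or None on timeout.
    """
    seen = 0       # number of events already printed
    waited = 0.0   # total time spent sleeping so far
    while waited <= timeout_s:
        events = fetch_events()
        for event in events[seen:]:
            print(f"[{event['step']}] {event['status']}: {event.get('message', '')}")
            if event["status"] == "errored":
                return event
            if event["step"] == "render" and event["status"] == "complete":
                return event
        seen = len(events)
        sleep(interval_s)
        waited += interval_s
    return None
```

Passing sleep as a parameter keeps the loop testable without real waiting; a real client would wrap an authenticated HTTPS GET in fetch_events.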
Deployment model
Single Render Web Service running uvicorn browser_recon_server.main:app. Auto-scales based on request rate. Postgres is Render-managed (encrypted disk by default). S3 + Secrets Manager are AWS-side. No worker queue today — long-running pipeline tasks use FastAPI's BackgroundTasks within the same process. If we need to fan out under load, the natural next step is a Celery/RQ worker reading from a Redis queue.
Lifecycle of a scan.
Fourteen ordered steps, from recon scan in the terminal to a report URL in the browser. Steps 1–5 happen on the user's machine; 6–14 happen on the server.
In plain English
Read this page top to bottom and you'll know exactly what happens during a scan — when the browser opens, when the AI runs, when the report appears, and what each stage produces. Each step is one or two sentences.
- … POST /scans. Server runs intent_clarifier (Grok-3-mini) to decide if clarifying questions are needed; returns the refined intent.
- … ! ANTI-BOT: lines.
- … flow_confirm (Grok-3-mini) which summarises it. User confirms or refines.
- … POST /scans/<id>/capture. Server stores the raw blob in S3, returns 202 + status_url. Cookies and Authorization headers are not yet scrubbed, because validation needs them.
- … complete. The CLI's polling loop sees render: complete, prints the report URL, exits.
Status transitions
The scans.status column tracks the scan's lifecycle:
| Status | When | Next |
|---|---|---|
| draft | Right after POST /scans (intent captured, browser not yet launched) | capturing |
| capturing | Chrome running, CDP listening | processing |
| processing | Capture uploaded; pipeline running server-side | complete or errored |
| complete | Report rendered, scan finalized | (terminal) |
| errored | Pipeline failed; error_message populated | (terminal) |
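The table above amounts to a small state machine. A minimal sketch of how the allowed transitions could be checked follows; ALLOWED_TRANSITIONS and can_transition are illustrative names, not identifiers from the server code.

```python
# Allowed scans.status transitions, as listed in the table above.
ALLOWED_TRANSITIONS = {
    "draft": {"capturing"},
    "capturing": {"processing"},
    "processing": {"complete", "errored"},
    "complete": set(),  # terminal
    "errored": set(),   # terminal
}


def can_transition(current: str, new: str) -> bool:
    """Return True if the table permits moving from `current` to `new`."""
    return new in ALLOWED_TRANSITIONS.get(current, set())
```

For example, can_transition("processing", "errored") is allowed, while any move out of complete or errored is rejected because both are terminal.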
Database schema.
Nine tables in Postgres hold every piece of persistent state, including users, scans, AI calls, validation results, and audit logs.
In plain English
Every scan, every AI call, every test request leaves a row in the database so we can debug later. Most data is stored as JSON inside a column rather than a normalised set of columns — that lets us evolve the pipeline without constantly migrating the schema. This page is the most technical in the guide; non-tech readers can safely skim the table summary at the top and skip the SQL.
Core tables
| Table | Purpose | Append-only? | Key columns |
|---|---|---|---|
| users | Customer accounts + role | No (UPDATE on login) | id, email, role, created_at |
| api_keys | rec_live_… bearer tokens | Yes | id, user_id, key_hash, last_used_at, revoked_at |
| scans | One row per scan. Stores capture-derived JSONB + final pipeline outputs. | No | id, user_id, target_url, status, intent_text, bucket_assignment, synthesis, completed_at |
| reports | Renderable report row, joined to scans. Carries pre-rendered fields the dashboard query needs. | No | id, scan_id, slug, verdict, recommendation, created_at |
| llm_calls | One row per LLM API call (Anthropic / OpenAI / xAI). For audit + cost rollup + debug. | Yes | id, scan_id, prompt_name, model, provider, cost_usd, input_tokens, output_tokens, s3_prompt_path, s3_response_path |
| llm_evals | T48: eval runs (re-running synthesis with alternate providers). Append-only history of every re-run. | Yes | id, scan_id, prompt_name, model, provider, status, response_body, cost_usd |
| validation_runs | T51.7: one row per (scan_id, endpoint_id) per validation run. Carries validation_data + slim llm_view + report_view. | Yes | id, scan_id, endpoint_id, status, validation_data, llm_view, report_view |
| scan_events | T53.1: pipeline step events for the CLI polling loop. | Yes | id, scan_id, step, status, message, created_at |
| audit_log | Admin actions, role changes, soft-deletes | Yes | id, actor_id, action, target_type, target_id, created_at |
JSONB columns on scans
Most of the scan's processed state lives in JSONB columns on the scans row, rather than in normalised tables. This keeps the schema flexible as the pipeline evolves.
| Column | Set by | Shape |
|---|---|---|
| capture_metadata | Step 6 (upload) | {captured_at, duration_ms, request_count, ...} |
| findings | Step 7 (detection) | [{kind, vendor, severity_tier, evidence: [...]}] |
| analysis_output | Step 8 (analysis) | {endpoints: [...], framework_hints, flow_timeline, cors_summary, ...} |
| bucket_assignment | Step 9 (intent filter) | {bucket_a: [ep_ids], bucket_b: [...], bucket_c: [...], rationale} |
| validation | Step 10 (validation) | {per_endpoint: [...], per_endpoint_llm_view, per_endpoint_report_view, scan_validation_summary} |
| synthesis | Step 12 (synthesis) | {recommendation: {...}, verdict: {...}, starter_code: {...}} |
| notes | Step 13a | {notes: [str, ...]} |
| difficulty_drivers | Step 13b | {drivers: [{label, severity, explanation, evidence_refs}]} |
Querying conventions
-- Find recent failed scans:
SELECT id, target_url, status, error_message, created_at
FROM scans
WHERE status = 'errored'
ORDER BY created_at DESC
LIMIT 20;
-- Find the latest validation_runs row per endpoint (T51.7 append-only):
SELECT DISTINCT ON (scan_id, endpoint_id) *
FROM validation_runs
WHERE scan_id = $1 AND status = 'complete'
ORDER BY scan_id, endpoint_id, created_at DESC;
-- Find a scan's event timeline (T53.1):
SELECT step, status, message, created_at
FROM scan_events
WHERE scan_id = $1
ORDER BY created_at ASC;
-- LLM cost rollup per scan:
SELECT scan_id, SUM(cost_usd) AS total_usd
FROM llm_calls
WHERE created_at > now() - interval '7 days'
GROUP BY scan_id
ORDER BY total_usd DESC;
Migrations
Run via Alembic. Every schema change ships a migration in alembic/versions/. To apply on a new environment: rye run alembic upgrade head. To downgrade one: rye run alembic downgrade -1.
Recent migrations:
- f7a3c2d4e5b6_t51_7_add_validation_runs.py: append-only validation_runs table
- c1d2e3f4a5b6_add_llm_evals.py: T48 eval persistence
- (T53.1) add_scan_events.py: pipeline event log
Browser capture.
Launches Chrome on the customer's machine and records every network call via the Chrome DevTools Protocol. The only piece that must run client-side.
In plain English
Opens a real Chrome window on the user's computer and records every single network request the browser makes while the user clicks around. Like a wire-tap on Chrome, but only for the duration of the scan.
What it does
Launches Chrome with --remote-debugging-port=9222 on a fresh, isolated user-data-dir (no extensions, no profile pollution). Connects to the CDP WebSocket, enables Network/Page/Runtime domains, and starts logging every Network.requestWillBeSent + Network.responseReceived + Network.dataReceived event into an in-memory blob. Captures cookies via Network.getAllCookies on capture end.
Real-time anti-bot detection
While capturing, the CLI runs a light detection pass on incoming responses and prints live alerts:
! ANTI-BOT: Imperva / Incapsula detected -- Header 'x-cdn' present
! ANTI-BOT: PerimeterX detected -- URL contains 'px-cloud.net'
! ANTI-BOT: Akamai Bot Manager detected -- URL contains '/akam/'
This is informational only — the authoritative detection step runs server-side after upload. The CLI version gives the user instant feedback so they understand what they're scraping.
Where it lives
Inputs / outputs
- Input: target URL, intent text, optional starter template.
- Output: a capture blob — list of requests/responses with full headers, response bodies, cookies, timestamps, plus user-action breadcrumbs (clicks, inputs, form submits).
Database impact
The CLI does not write to the database directly. The capture blob is uploaded to the server in step 6, which stores it in S3 (KMS-encrypted) and creates a row in scans with the S3 key.
How to test it
# Run the CLI against a public test site:
recon scan https://books.toscrape.com
# Capture lasts as long as you browse. Ctrl+C to stop.
# Output dir gets a recon_capture.json with the full blob.
# Unit tests for the capture layer:
rye run pytest tests/unit/test_chrome_launcher.py
rye run pytest tests/unit/test_capture_monitor.py
Find output in the DB after upload
SELECT id, status, target_url, capture_metadata
FROM scans
WHERE id = '<scan_id>';
-- The raw capture is at the S3 key returned in the response:
SELECT s3_capture_path
FROM scans
WHERE id = '<scan_id>';
Known gotchas
- Chrome must be installed on the customer's machine. The launcher searches the standard install paths per OS; if missing, the CLI fails fast with an install hint.
- Cookies captured during the browse session are bound to the customer's residential IP. When validation later fires through a different proxy IP, those cookies may be invalid (PerimeterX/Akamai bind _pxhd / _abck to IP).
- Capture cookies aren't scrubbed on the CLI side post-T53. They travel to the server unscrubbed because validation needs them, and are scrubbed server-side after step 10.
Anti-bot detection.
Fingerprints which anti-bot vendors are protecting the target site. Pure pattern-matching, no LLM. Runs in tens of milliseconds.
In plain English
Figures out which anti-scraping system (Cloudflare, Akamai, PerimeterX, DataDome, etc.) is in front of the target site by looking at signature cookies and headers — like a fingerprint scanner for bot-protection vendors. No AI involved; just pattern matching.
What it does
Walks the captured requests and responses, scoring them against a rules table:
- Cookies: _abck → Akamai Bot Manager · _pxhd / _pxvid → PerimeterX · __cf_bm / cf_clearance → Cloudflare · datadome → DataDome · incap_ses → Imperva · visid_incap → Imperva.
- Response headers: server: cloudflare, x-cdn: incapsula, x-akamai-transformed, cf-ray, x-datadome.
- URL patterns: paths under /akam/, hosts ending in .px-cloud.net or .datadome.co, and URLs containing recaptcha/api2/.
- JavaScript fingerprints: script tags loaded from known anti-bot CDNs.
Each match contributes evidence with a confidence weight. Findings are aggregated per vendor and emitted with a severity tier (high/medium/low) reflecting how restrictive that vendor typically is in production scraping.
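The rule-matching and per-vendor aggregation described above might look roughly like the sketch below. The vendor mappings come from the list in this section and the 0.3 cookie weight matches the example output; the other weights, the additive scoring capped at 1.0, and the names RULES and detect are assumptions made for illustration, not the project's actual rules table.

```python
from collections import defaultdict

# (signal type, pattern, vendor, confidence weight).
# Weights other than the 0.3 cookie weight are invented for this sketch.
RULES = [
    ("cookie", "_abck", "Akamai Bot Manager", 0.3),
    ("cookie", "__cf_bm", "Cloudflare", 0.3),
    ("cookie", "datadome", "DataDome", 0.3),
    ("header", "cf-ray", "Cloudflare", 0.2),
    ("url", "/akam/", "Akamai Bot Manager", 0.2),
]


def detect(cookie_names, header_names, urls):
    """Return {vendor: {"confidence": float, "evidence": [...]}} for matched rules."""
    cookies = set(cookie_names)
    headers = {name.lower() for name in header_names}
    findings = defaultdict(lambda: {"confidence": 0.0, "evidence": []})
    for signal, pattern, vendor, weight in RULES:
        hit = (
            (signal == "cookie" and pattern in cookies)
            or (signal == "header" and pattern in headers)
            or (signal == "url" and any(pattern in url for url in urls))
        )
        if hit:
            finding = findings[vendor]
            # Assumed aggregation: add the weights, capped at 1.0.
            finding["confidence"] = min(1.0, round(finding["confidence"] + weight, 2))
            finding["evidence"].append({"signal": signal, "value": pattern})
    return dict(findings)
```

A capture with an _abck cookie and a URL under /akam/ would yield a single Akamai finding with two evidence entries; mapping that confidence to a severity tier is left out of the sketch.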
Where it lives
Output shape
{
"findings": [
{
"kind": "anti_bot",
"vendor": "Akamai Bot Manager",
"severity_tier": "high",
"severity_label": "high",
"confidence": 0.85,
"evidence": [
{"signal": "cookie", "value": "_abck", "confidence_weight": 0.3, ...},
{"signal": "url_pattern", "value": "path: /akam/", ...}
],
"per_endpoint": {"https://...": "presence detected", ...}
}
]
}
Database impact
- Writes to scans.findings (JSONB column)
- Emits a scan_events row at start + complete (T53.1)
How to test it
rye run pytest tests/unit/test_detection_anti_bot.py
rye run pytest tests/unit/test_detection_runner.py
Find output in the DB
-- All findings for a scan, one per row:
SELECT jsonb_path_query(findings, '$[*]') AS finding
FROM scans
WHERE id = '<scan_id>';
-- Count scans by primary anti-bot vendor:
SELECT jsonb_path_query_first(findings, '$[*] ? (@.kind == "anti_bot") .vendor') AS vendor, count(*)
FROM scans
WHERE status = 'complete'
GROUP BY vendor;
Adding a new detection rule
- Open detection/rules/anti_bot.py (or the appropriate kind file).
- Add a Rule(...) entry with vendor name, signal type, pattern, confidence weight.
- Add a unit test in tests/unit/test_detection_anti_bot.py using a captured sample of the signal.
- Run the test suite to confirm no regression in scoring on existing scans.
Endpoint analysis.
Structures the raw capture into a typed endpoint inventory with templated URLs, response shapes, dependency edges, pagination patterns, and auth signals.
In plain English
Takes the raw captured requests and organises them into a tidy list. Spots that /api/users/123 and /api/users/456 are the same endpoint with different IDs. Notes which calls hand off data to other calls (like a search returning IDs that a detail call then fetches). No AI involved.
What it does
Five sub-passes, all deterministic:
- URL templating: collapses /api/users/12345 and /api/users/67890 into /api/users/<id> with two example values. Recognises numeric ids, UUIDs, long hex slugs.
- Response shape extraction: for each endpoint, summarises the JSON shape (key list, array depth, response_size in bytes).
- Dependency chain inference: scans request bodies for values that appeared in earlier response bodies. If endpoint B's request contains a token that endpoint A's response surfaced, record edge A → B.
- Pagination detection: recognises ?page=N, ?cursor=…, ?offset=N&limit=M patterns across URL families.
- Auth signal extraction: flags endpoints that carry Authorization, require cookies, or send/receive CSRF tokens.
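The URL templating pass can be illustrated with a short sketch. The regular expressions, the 16-character threshold for "long hex slug", and the function names are assumptions made for this example; the real pass may recognise ids differently.

```python
import re

# Segment patterns tried in order. The regexes and the 16-character hex
# threshold are assumptions for this sketch.
_ID_PATTERNS = [
    re.compile(r"^\d+$"),  # numeric id
    re.compile(  # UUID
        r"^[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}$"
    ),
    re.compile(r"^[0-9a-fA-F]{16,}$"),  # long hex slug
]


def template_path(path: str) -> str:
    """Replace id-like path segments with the <id> placeholder."""
    return "/".join(
        "<id>" if any(p.match(segment) for p in _ID_PATTERNS) else segment
        for segment in path.split("/")
    )


def group_endpoints(paths):
    """Map each URL template to the example paths that collapsed into it."""
    groups = {}
    for path in paths:
        groups.setdefault(template_path(path), []).append(path)
    return groups
```

Calling group_endpoints(["/api/users/12345", "/api/users/67890"]) groups both paths under /api/users/<id>, which is where the "two example values" mentioned above would come from.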
Where it lives
Output shape
{
"endpoints": [
{
"id": "ep_017",
"url_template": "/api/products/<id>",
"method": "GET",
"hostname": "example.com",
"observed_count": 5,
"response_shape": {"shape": "object{name,price,sku}", "size_bytes": 54000},
"required_cookies": ["_abck", "session_id"],
"required_header_values": {"Referer": "https://example.com/products"},
...
}
],
"framework_hints": [{"name": "Next.js", "evidence": "_next/api path"}],
"flow_timeline": [{"step_type": "click", "target": "button#search", ...}],
"cors_summary": [...]
}
Database impact
- Writes to scans.analysis_output (JSONB)
- The endpoints list is the universe Bucket A/B/C classification operates on in step 9
- Emits start + complete events in scan_events
How to test it
rye run pytest tests/unit/test_analysis_runner.py
rye run pytest tests/unit/test_url_template.py
rye run pytest tests/unit/test_bucket_signals.py
Find output in the DB
-- Endpoint inventory for a scan:
SELECT jsonb_array_length(analysis_output->'endpoints') AS n_endpoints
FROM scans
WHERE id = '<scan_id>';
-- Endpoints with required cookies:
SELECT ep
FROM scans, jsonb_array_elements(analysis_output->'endpoints') AS ep
WHERE id = '<scan_id>'
  AND jsonb_array_length(ep->'required_cookies') > 0;
Intent filter.
First LLM call. Classifies captured endpoints into Bucket A (the data), Bucket B (prerequisites), Bucket C (noise) based on the user's typed intent.
In plain English
A real website makes dozens of network calls during a browse session — most are tracking pixels, analytics, ad-tech junk. This step asks an AI to read the user's goal ("I want product data and reviews") and sort the captured calls into three piles: useful data, setup calls we need first, and noise we can ignore.
What it does
Most captures contain 30–80 endpoints. Most are noise — analytics pixels, telemetry beacons, third-party tag managers. The intent filter sends the user's intent text + the endpoint inventory to Claude Sonnet with a structured-output prompt, and gets back three bucketed lists:
- Bucket A — the data the user actually wants (product detail, search results, reviews).
- Bucket B — prerequisites the scraper must hit first (session bootstrap, config endpoints, anti-bot init).
- Bucket C — pure noise (Adobe Analytics pixels, Bloomreach, Criteo retargeting).
Why Sonnet
Judgement-heavy task. A getReviews endpoint might look like noise to a cheap model but it's load-bearing when the user said "scrape reviews." Cheaper models also over-prune Bucket B (the T48 eval showed Grok-3-mini and gpt-5.4-nano dropping checkout/payment session endpoints scrapers actually need). The cost overhead for Sonnet over Grok-3-mini is ~$0.025/scan — minor compared to the cost of shipping a broken scraper.
Where it lives
Output shape
{
"bucket_a": ["ep_012", "ep_019", "ep_020", ...],
"bucket_b": ["ep_003", "ep_004", "ep_005", ...],
"bucket_c": ["ep_001", "ep_002", ...],
"rationale": "Bucket A contains the core data endpoints...",
"confidence_entries": [{"endpoint_id": "ep_012", "confidence": 0.92}, ...]
}
Database impact
- Writes to scans.bucket_assignment (JSONB)
- One row in llm_calls with cost, tokens, S3 prompt+response keys
- Events emitted in scan_events
How to test it
rye run pytest tests/unit/test_intent_filter_prompt.py
# Re-run on an existing scan (admin only):
recon llm-eval --scan-id <scan_id> --prompts intent_filter --provider-b claude-sonnet-4-6 --force
Find output in the DB
-- Bucket assignment for a scan:
SELECT bucket_assignment
FROM scans
WHERE id = '<scan_id>';
-- Find the LLM call row:
SELECT model, cost_usd, input_tokens, output_tokens, s3_response_path
FROM llm_calls
WHERE scan_id = '<scan_id>' AND prompt_name = 'intent_filter';
Active validation.
The T51 work. Fires real HTTP requests through operator-owned proxies to confirm which library + proxy combo actually works for each Bucket A endpoint.
In plain English
The most important step. Instead of guessing whether the scraper will work, we actually try it. For each useful endpoint, we fire test requests using different Python libraries (requests, httpx, curl_cffi, cloudscraper) through different proxy types (datacenter and residential), and see which combination gets a real response. We also figure out the minimum set of headers needed and how fast we can hit the site before it gets angry. The result is grounded in reality, not in what the AI guessed.
What it does
For each Bucket A endpoint, walks a four-step cascade:
- Library comparison: fans out 3 libraries (requests, httpx, curl_cffi) × 2 proxy tiers (datacenter, residential) in one parallel ThreadPoolExecutor wave. If all 6 attempts block, falls back to cloudscraper as Phase B. Picks the winner by preference order (simplest library that works).
- Header reduction: tier list skips Accept-Encoding, sec-fetch-*, sec-ch-ua-* without testing; remaining headers tested in parallel waves with a 2-axis (library × proxy) cascade on failures, capped at 6 attempts per header.
- Cookie dependency: three scenarios (cold / warmup / full) fired in parallel with the same cascade for failures.
- Rate-limit probe: 3 rounds × 5 requests at delays [2.0s, 0.5s, 0.2s] with the winning library/proxy. Tags result with proxy_rotation_mode so synthesis treats rotating-residential measurements as a lower bound.
For each Bucket B endpoint: just one verification request with the scan-level winner. Mini fan-out only on failure (no header/cookie/rate-limit probing on B).
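Picking the winner "by preference order" from the 3 × 2 library comparison can be sketched as below. The library order follows the list above; trying the datacenter tier before residential for each library is an assumption, as are the names LIBRARY_ORDER, PROXY_ORDER and pick_winner.

```python
# Preference order: simplest library first. Trying the cheaper datacenter
# tier before residential is an assumption of this sketch.
LIBRARY_ORDER = ["requests", "httpx", "curl_cffi"]
PROXY_ORDER = ["datacenter", "residential"]


def pick_winner(results):
    """results maps (library, proxy_tier) -> True if the attempt succeeded.

    Returns the first working combination in preference order, or None when
    every attempt was blocked (the point where the cloudscraper fallback
    would apply).
    """
    for library in LIBRARY_ORDER:
        for proxy in PROXY_ORDER:
            if results.get((library, proxy)):
                return library, proxy
    return None
```

With this ordering, a working httpx + residential attempt beats a working curl_cffi + datacenter attempt; a cost-first ordering would iterate proxy tiers in the outer loop instead.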
Where it lives
Proxy credentials
Read from AWS Secrets Manager (post-T53.5) or server env vars (pre-T53), never from the customer's environment, so the customer's shell stays clean of operator secrets.
Bandwidth tracking
Every HTTP attempt records request_bytes + response_bytes. Rolled up to a bandwidth_summary per endpoint and used to compute per_1k_requests_usd from the configured per-tier $/GB rate:
cost_per_1k = (avg_req_bytes + avg_resp_bytes) * 1000 / 1e9 * rate_usd_per_gb
# Datacenter: $2.00/GB · Residential: $3.00/GB
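A worked example of the formula, using illustrative sizes: a 1 KB request and a 54 KB response (the size_bytes value from the analysis example). The function name is hypothetical, and 1 GB is taken as 1e9 bytes as in the formula.

```python
def cost_per_1k_requests(avg_req_bytes, avg_resp_bytes, rate_usd_per_gb):
    """Proxy bandwidth cost in USD for 1,000 requests (1 GB taken as 1e9 bytes)."""
    return (avg_req_bytes + avg_resp_bytes) * 1000 / 1e9 * rate_usd_per_gb


# Example sizes only; the rates are the per-tier figures quoted above.
datacenter = cost_per_1k_requests(1_000, 54_000, 2.00)
residential = cost_per_1k_requests(1_000, 54_000, 3.00)
print(f"datacenter: ${datacenter:.3f}, residential: ${residential:.3f}")
```

For these sizes, 1,000 requests move 55 MB, which works out to about 11 cents on the datacenter tier and about 16.5 cents on the residential tier.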
Output shape
Three projections per endpoint, all persisted:
- validation_data: full attempt log, source of truth. Every library × proxy attempt, every header probe, every rate-limit round.
- llm_view: ~250 token compact projection for the synthesis prompt (was ~4,600 tokens pre-T51.8).
- report_view: medium-detail projection for the Jinja Validation section.
Database impact
- Writes one row per endpoint to validation_runs (append-only: re-runs insert new rows)
- Writes to scans.validation for the orchestrator's typed reconstruction
- Emits scan_events rows during long validation passes (e.g. "Validating 2/4 Bucket A endpoints")
How to test it
rye run pytest tests/unit/test_library_compare.py
rye run pytest tests/unit/test_header_reduce.py
rye run pytest tests/unit/test_cookie_dependency.py
rye run pytest tests/unit/test_rate_limit_probe.py
rye run pytest tests/unit/test_validation_orchestrator.py
rye run pytest tests/integration/test_two_round_validation.py
Find output in the DB
-- All validation rows for a scan:
SELECT endpoint_id, status, duration_ms,
       validation_data->>'best_library' AS best_lib,
       validation_data->'bandwidth_summary'->>'per_1k_requests_usd' AS cost
FROM validation_runs
WHERE scan_id = '<scan_id>'
ORDER BY created_at;
-- LLM-view projection (the slim shape synthesis sees):
SELECT llm_view
FROM validation_runs
WHERE scan_id = '<scan_id>' AND endpoint_id = 'ep_017';
Known limits
- Cookies captured locally are bound to the customer's residential IP; validation through a different proxy may trigger 412 challenges (we see this on PerimeterX-protected sites). The recommendation surfaces "cookie warmup required" when this is the case.
- Rate-limit probe through rotating-residential proxies fires from a different IP per request, so the target physically cannot rate-limit. The result carries proxy_rotation_mode: "rotating" and a caveat string; the synthesis prompt is instructed to recommend ≥1.5 s delay regardless of measured value.
- Spot-check semantics: ~100 requests over ~30 s ≠ 50,000 requests/hour from a sticky IP. The report copy says "spot-check, not production-scale."
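The rotating-proxy caveat reduces to a simple floor on the recommended delay. In the pipeline this rule is given to the synthesis prompt rather than enforced in code, so the function below is only a sketch of the rule itself, with a hypothetical name and signature.

```python
def recommended_delay(measured_delay_s, proxy_rotation_mode, floor_s=1.5):
    """Apply the 1.5 s floor when the probe ran through rotating proxies.

    A rotating-proxy measurement is treated as a lower bound, so the
    recommendation never drops below floor_s in that mode.
    """
    if proxy_rotation_mode == "rotating":
        return max(measured_delay_s, floor_s)
    return measured_delay_s
```

So a probe that passed at 0.2 s through rotating residential proxies would still produce a 1.5 s recommendation, while a slower measured value is kept as-is.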
Secret scrubber.
Removes confidential values from the in-memory capture after validation, before persistence and downstream LLM calls. Secrets only ever live in server RAM transiently.
In plain English
Removes the user's secrets (cookie values, login tokens) from the captured data before saving it long-term or feeding it to the AI. Keeps the names ("you'll need an _abck cookie") so the recommendation is still useful, but replaces the actual values with <scrubbed> placeholders. Secrets only live in server memory for the few seconds validation needs them.
What it does
| Field | Treatment |
|---|---|
| Cookie: header value | Keep cookie names, replace values with <scrubbed> |
| Set-Cookie response header | Keep name + attributes (path, samesite), strip value |
| Authorization: Bearer … | Replace value with <scrubbed> |
| X-Api-Key, X-Auth-Token, custom auth-shaped headers | Same |
| Request bodies containing tokens (regex / known patterns) | Replace matched values with <scrubbed> |
| Response bodies containing JWT-shaped strings | Same |
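A minimal sketch of the treatments in the table, covering request headers and JWT-shaped strings in bodies. The header set, the JWT regular expression and the function names are assumptions for illustration; the real scrubber's pattern list is not shown in this guide, and Set-Cookie handling is omitted here.

```python
import re

PLACEHOLDER = "<scrubbed>"

# Header names treated as credentials. This set and the JWT regex below are
# assumptions for the sketch, not the project's actual pattern list.
SENSITIVE_HEADERS = {"authorization", "x-api-key", "x-auth-token"}
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def scrub_cookie_header(value: str) -> str:
    """Keep cookie names, replace every cookie value with the placeholder."""
    parts = []
    for pair in value.split(";"):
        name = pair.split("=", 1)[0].strip()
        if name:
            parts.append(f"{name}={PLACEHOLDER}")
    return "; ".join(parts)


def scrub_headers(headers: dict) -> dict:
    """Return a copy of request headers with secret values removed."""
    clean = {}
    for name, value in headers.items():
        lowered = name.lower()
        if lowered == "cookie":
            clean[name] = scrub_cookie_header(value)
        elif lowered in SENSITIVE_HEADERS:
            clean[name] = PLACEHOLDER
        else:
            clean[name] = value
    return clean


def scrub_body(text: str) -> str:
    """Replace JWT-shaped strings in a request or response body."""
    return JWT_RE.sub(PLACEHOLDER, text)
```

Header names survive untouched, which is what lets a later step say "your scraper will need an _abck cookie" without ever seeing the value.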
What stays
- Cookie / header names (the synthesis prompt needs to tell the user "your scraper will need a _abck cookie")
- Non-sensitive headers (Accept-Language, Referer, Content-Type)
- Bucket A response shapes (so synthesis can write data = r.json()["products"])
- URL paths + query parameters (rare for secrets to live there; if regex finds tokens we scrub)
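The treatment table and the "What stays" list above can be sketched in a few lines of Python. This is a minimal illustration, not the server's scrubber: the function names, the SENSITIVE_HEADERS set, and the JWT regex are assumptions made for the example.

```python
import re

PLACEHOLDER = "<scrubbed>"
# Header names treated as auth-shaped in this sketch (assumed, not the real list).
SENSITIVE_HEADERS = {"authorization", "x-api-key", "x-auth-token"}
# Three dot-separated base64url segments starting with "eyJ": the usual JWT shape.
JWT_RE = re.compile(r"eyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+")


def scrub_cookie_header(value: str) -> str:
    """Keep each cookie name, replace each cookie value."""
    names = []
    for pair in value.split(";"):
        name = pair.split("=", 1)[0].strip()
        if name:
            names.append(name)
    return "; ".join(f"{name}={PLACEHOLDER}" for name in names)


def scrub_set_cookie(value: str) -> str:
    """Keep the cookie name and its attributes, strip only the value."""
    first, _, attrs = value.partition(";")
    name = first.split("=", 1)[0].strip()
    scrubbed = f"{name}={PLACEHOLDER}"
    return f"{scrubbed};{attrs}" if attrs else scrubbed


def scrub_headers(headers: dict) -> dict:
    """Apply the per-header treatment; non-sensitive headers pass through."""
    out = {}
    for name, value in headers.items():
        lower = name.lower()
        if lower == "cookie":
            out[name] = scrub_cookie_header(value)
        elif lower == "set-cookie":
            out[name] = scrub_set_cookie(value)
        elif lower in SENSITIVE_HEADERS:
            out[name] = PLACEHOLDER
        else:
            out[name] = value
    return out


def scrub_body(text: str) -> str:
    """Replace JWT-shaped strings in a request or response body."""
    return JWT_RE.sub(PLACEHOLDER, text)
```

With this sketch, a Cookie header of "_abck=abc123; session=xyz" becomes "_abck=<scrubbed>; session=<scrubbed>", so the cookie names survive for synthesis while the values are gone.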
Where it lives
When it runs
Step 11 of the pipeline — after validation, before synthesis. This ordering is critical: validation needs the real cookies to test cookie-warmup scenarios; synthesis only needs names + shapes. The S3 capture blob is rewritten in-place after scrubbing so secrets never persist long-term.
Database impact
- Rewrites the capture blob at the same S3 key (KMS-encrypted; old version is the unscrubbed pre-validation copy)
- Sets scans.scrubbed_at timestamp
- All subsequent JSONB writes (synthesis input, report_view, llm_view) contain only scrubbed data
How to test it
rye run pytest tests/unit/test_scrubber.py
rye run pytest tests/unit/test_scrubber_patterns.py
Confirm scrubbing worked
# Pull the persisted capture from S3 and grep for secret patterns:
aws s3 cp s3://browser-recon-prod/captures/<scan_id>.json.gz - \
  | gunzip \
  | grep -E '"value": "(eyJ|Bearer |[a-f0-9]{32})'
# Should return nothing for a scrubbed capture.
Scan synthesis.
The main LLM call. Produces the recommendation, the verdict, and the runnable starter script in one combined Claude Sonnet 4.6 invocation.
In plain English
The final, headline AI call. Takes everything we've learned (detection + organisation + validation results) and writes three things in one go: the recommendation ("use curl_cffi with chrome120 impersonation through residential proxies"), a 2-sentence verdict ("Walmart is protected by Akamai + PerimeterX, expect $0.40–$2.00 per 1k requests"), and a working starter script the user can run as-is.
What it does
One scan_synthesis call (combined prompt from T16) that gets fed:
- The user's intent + Bucket A/B endpoint inventory (filtered, scrubbed)
- Detection findings (anti-bot vendors with severity tier)
- The compact validation slot from step 10 (library_matrix, best library/proxy, min_required_headers, cookie scenarios, rate-limit, bandwidth cost)
- Target identity (URL, domain)
Returns three structured sub-outputs in a single response:
- recommendation: {primary_library, impersonation, proxy_type, confidence, cost_band, rationale}. The actionable verdict.
- verdict: 2–3 sentence plain-English headline at the top of the report.
- starter_code: runnable Python script that uses the recommended library + headers + cookies + rate-limit delay from validation. T52.1's "minimum comments" rule keeps it terse.
Why combined into one call
Before T16 this was 3 separate prompts (recommendation, verdict, starter_code) that each re-sent the same ~80k of input. Combining saved roughly two-thirds of the input tokens. T51.8 then cut another ~80% by replacing the bloated validation_results_* payload with the slim llm_view.
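The "two-thirds" figure follows directly from the call count. A small worked calculation, assuming each of the three old prompts sent the same ~80k-token input and ignoring the few prompt-specific instruction tokens:

```python
def percent_saved(before: int, after: int) -> float:
    """Percentage reduction going from `before` tokens to `after` tokens."""
    return 100 * (before - after) / before


shared_input = 80_000            # approximate input re-sent by each prompt
separate = 3 * shared_input      # recommendation + verdict + starter_code
combined = 1 * shared_input      # one scan_synthesis call

print(round(percent_saved(separate, combined), 1))  # 66.7
```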
Where it lives
Database impact
- Writes to scans.synthesis (JSONB, holds all three sub-outputs)
- Writes scans.primary_recommendation, cost_band_low_usd, cost_band_high_usd, confidence at the top level for dashboard queries
- One llm_calls row with cost + S3 prompt+response keys
How to test it
rye run pytest tests/unit/test_scan_synthesis_prompt.py
rye run pytest tests/unit/test_synthesis_orchestrator.py
# Re-run on an existing scan via the eval system:
recon llm-eval --scan-id <scan_id> --prompts scan_synthesis --provider-b claude-sonnet-4-6 --force
Find output in the DB
-- Top-level recommendation fields:
SELECT primary_recommendation, cost_band_low_usd, cost_band_high_usd, confidence
FROM scans
WHERE id = '<scan_id>';

-- Full synthesis output:
SELECT synthesis FROM scans WHERE id = '<scan_id>';

-- Starter code only:
SELECT synthesis->'starter_code'->>'code' FROM scans WHERE id = '<scan_id>';
Notes & difficulty drivers.
Two cheap parallel Grok-3-mini calls that produce report sections orthogonal to the recommendation.
In plain English
Two small AI calls that add colour to the report. Notes = "5 things the developer should know that aren't in the recommendation" (e.g. "the API uses GraphQL — you can introspect the schema"). Difficulty drivers = "ranked list of what makes this site hard to scrape." Both run on a cheap fast AI in parallel with the main synthesis call.
What they do
- notes: 5 short pointers the scraper engineer should know but the recommendation doesn't surface (e.g. "GraphQL introspection appears enabled", "review pagination uses ?page=N", "this endpoint requires X-APOLLO-OPERATION-NAME header"). Renders as bullets in section 02 of the report.
- difficulty_drivers: ranked list of what makes this site hard to scrape. Each driver has a severity tier (easy/medium/high) and a one-sentence explanation. Drives section 04 of the report.
Parallelism
Run via synthesis_orchestrator.py's thread pool, fanning out simultaneously with scan_synthesis. Combined wall-clock is roughly that of the slowest call, not the sum of all three.
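The fan-out pattern can be illustrated with the standard library. This is a sketch only: fake_llm_call and run_parallel are hypothetical names, and time.sleep stands in for a blocking provider request. It is not the code in synthesis_orchestrator.py.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_llm_call(name: str, seconds: float) -> str:
    """Stand-in for a blocking LLM request; sleeps instead of calling a provider."""
    time.sleep(seconds)
    return f"{name}: done"


def run_parallel(calls: dict) -> dict:
    """Submit every call at once, then collect results by prompt name."""
    with ThreadPoolExecutor(max_workers=len(calls)) as pool:
        futures = {
            name: pool.submit(fake_llm_call, name, seconds)
            for name, seconds in calls.items()
        }
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    calls = {"scan_synthesis": 0.3, "notes": 0.1, "difficulty_drivers": 0.1}
    start = time.perf_counter()
    results = run_parallel(calls)
    elapsed = time.perf_counter() - start
    print(results)
    print(f"elapsed {elapsed:.2f}s vs sum of calls {sum(calls.values()):.2f}s")
```

Because the three calls spend their time waiting on network I/O, threads are enough to overlap them; elapsed time should land near the longest single call.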
Where they live
Database impact
- Writes to scans.notes and scans.difficulty_drivers (JSONB)
- Two llm_calls rows (one per prompt)
Find output in the DB
SELECT notes->'notes' AS notes,
       difficulty_drivers->'drivers' AS drivers
FROM scans
WHERE id = '<scan_id>';
Report renderer.
Final pipeline stage. Reconstructs typed objects from persisted JSONB and renders the HTML report through Jinja templates.
In plain English
Takes all the AI outputs + detection findings + validation results and combines them into the HTML page the user opens at https://browser-recon.com/r/<slug>. Like a print template that fills in blanks. No AI involved.
What it does
Reads every JSONB column from the scans row, reconstructs typed dataclasses via browser_recon_server/report_renderer.py, and renders templates/report.html through Jinja. The template orchestrates nine section partials in order (sections 01 through 08, plus 02c):
| Section | Partial | Source |
|---|---|---|
| 01 / Detection | partials/detection_card.html | scans.findings |
| 02 / Scraping plan | partials/scraping_plan.html | Analysis + intent filter |
| 02c / Validation (T51.8) | partials/validation_section.html | validation_runs.report_view |
| 03 / Prerequisites | partials/prerequisites.html | Bucket B endpoints |
| 04 / Difficulty | partials/difficulty_drivers.html | scans.difficulty_drivers |
| 05 / Recommendation + cost | partials/recommendation_cost.html | scans.synthesis.recommendation |
| 06 / Starter code | partials/starter_code.html | scans.synthesis.starter_code |
| 07 / Evidence | partials/evidence_trail.html | Bucket counts |
| 08 / Drift | partials/drift_footer.html | scans.last_validated_at |
Download mode (T52)
The same template renders into a self-contained HTML file when download_mode=True is passed in the context. Static CSS/JS get inlined; dynamic features (Rerun, Feedback, CSRF meta) are stripped. Served via GET /r/<slug>/download with Content-Disposition: attachment.
Where it lives
Database impact
- Updates scans.status = 'complete' and scans.completed_at
- Updates reports row with the rendered fields the dashboard query needs
- Final scan_events row: (step='render', status='complete'). This is what unblocks the CLI's polling loop
How to test it
rye run pytest tests/unit/test_report_renderer.py
rye run pytest tests/unit/test_report_templates.py
rye run pytest tests/unit/test_report_partials_t12.py
rye run pytest tests/unit/test_reports_endpoint.py
View a report
# Live (authenticated):
https://browser-recon.com/r/<slug>
# Self-contained download:
https://browser-recon.com/r/<slug>/download
Scan events & polling.
Append-only event log of pipeline step transitions. The CLI polls every 1.5 s and renders a live spinner. Replaces the silent 90-second wait.
In plain English
Solves the "is anything happening?" problem. While the server processes a scan (60–120 seconds), the CLI checks in every 1.5 seconds asking "any progress?" The server replies with a timeline of which steps have started and finished. The CLI displays a live spinner so the user sees real progress instead of staring at a frozen terminal.
How it works in one paragraph
Every time the server starts or finishes a step (detection, validation, synthesis, etc.), it writes a row to a scan_events table. The CLI asks the server every 1.5 seconds "what's new since I last asked?" and prints any new rows as live progress. When the final "render complete" event arrives, the CLI stops polling and prints the report URL.
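The "what's new since I last asked?" loop can be sketched as a cursor-based poller. Everything here is illustrative: fetch_events is a hypothetical stand-in for the HTTP call, make_fake_fetch is an in-memory fake used for demonstration, and stopping on any errored event is an assumption rather than documented CLI behaviour.

```python
import time
from datetime import datetime, timedelta, timezone
from typing import Callable, Optional


def poll_until_rendered(
    fetch_events: Callable[[Optional[datetime]], list],
    interval_s: float = 1.5,
    sleep: Callable[[float], None] = time.sleep,
) -> list:
    """Poll until the render/complete event (or an errored event) arrives."""
    cursor = None  # created_at of the newest event seen so far
    seen = []
    while True:
        for event in fetch_events(cursor):
            seen.append(event)
            cursor = event["created_at"]
            print(f'{event["step"]:<14}{event["status"]:<10}{event.get("message", "")}')
            if event["status"] == "errored":
                return seen
            if event["step"] == "render" and event["status"] == "complete":
                return seen
        sleep(interval_s)


def make_fake_fetch(events: list) -> Callable[[Optional[datetime]], list]:
    """Hypothetical in-memory server that releases one more event per call."""
    released = 0

    def fetch(since: Optional[datetime]) -> list:
        nonlocal released
        released = min(released + 1, len(events))
        visible = events[:released]
        return [e for e in visible if since is None or e["created_at"] > since]

    return fetch


if __name__ == "__main__":
    t0 = datetime(2026, 5, 13, 10, 0, tzinfo=timezone.utc)
    timeline = [
        {"step": "detection", "status": "complete", "message": "2 vendors", "created_at": t0},
        {"step": "synthesis", "status": "complete", "message": "", "created_at": t0 + timedelta(seconds=40)},
        {"step": "render", "status": "complete", "message": "", "created_at": t0 + timedelta(seconds=45)},
    ]
    poll_until_rendered(make_fake_fetch(timeline), sleep=lambda s: None)
```

Passing sleep as a parameter keeps the loop testable without real waiting. One thing to watch in any timestamp cursor of this kind: two events written with an identical created_at can be skipped by a strict "newer than" filter.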
Three pieces
1. The table
CREATE TABLE scan_events (
  id         uuid PRIMARY KEY DEFAULT gen_random_uuid(),
  scan_id    uuid NOT NULL REFERENCES scans(id) ON DELETE CASCADE,
  step       text NOT NULL,   -- 'detection' | 'analysis' | 'intent_filter' | ...
  status     text NOT NULL,   -- 'started' | 'complete' | 'errored'
  message    text,            -- human-readable progress text
  metadata   jsonb,           -- optional: endpoint counts
  created_at timestamptz NOT NULL DEFAULT now()
);
CREATE INDEX idx_scan_events_scan_id_time ON scan_events (scan_id, created_at);
2. The emit helper
def emit_event(session, scan_id, step, status="started", message="", metadata=None):
    session.add(ScanEvent(
        scan_id=scan_id,
        step=step,
        status=status,
        message=message,
        metadata=metadata or {},
    ))
    session.flush()
3. The polling endpoint
@router.get("/scans/{scan_id}/events")
def get_events(scan_id: UUID, since: datetime | None = None):
    # Returns events strictly newer than `since`, ascending order.
    # Includes a `cursor` (latest created_at) so the CLI can advance.
    ...
CLI spinner
$ recon scan https://walmart.com
✓ detection      complete   2 anti-bot vendors detected
✓ analysis       complete   38 endpoints categorized
✓ intent_filter  complete   Bucket A: 4 · B: 8 · C: 26
⠹ validation     running    Validating 4 Bucket A endpoints (2/4) · 18s
  scrub          pending
  synthesis      pending
  render         pending
Where it lives
Find output in the DB
-- Full event timeline for a scan:
SELECT step, status, message, created_at
FROM scan_events
WHERE scan_id = '<scan_id>'
ORDER BY created_at ASC;

-- Per-step duration:
SELECT step, status, created_at,
       lead(created_at) OVER (PARTITION BY scan_id ORDER BY created_at) - created_at AS duration
FROM scan_events
WHERE scan_id = '<scan_id>';
LLM eval matrix.
Re-runs scan synthesis (or any individual prompt) through alternate LLM providers for quality comparison. Persisted to llm_evals with an admin UI matrix.
In plain English
Operator-facing quality tool. Lets the team re-run any AI call (typically the main synthesis) using a different model (GPT-5, Grok, etc.) and compare results side-by-side. Used to answer "could we save money by using a cheaper AI?" — answered by data, not hunches.
What it does
Given an existing scan, re-runs any prompt (typically scan_synthesis or intent_filter) through a different model — Claude, OpenAI GPT, xAI Grok — using the same input the production call saw. Results land in llm_evals as an append-only history. The admin UI at /admin/evals/<scan_id> renders a matrix where rows are prompts and columns are models; clicking any cell opens a side-by-side diff modal against the production response.
Why it exists
Quality validation. Lets the operator answer questions like:
- Could we switch scan_synthesis from Claude Sonnet to Grok-3-mini and save 80% of the cost? (Answer per the Staples eval: no; cheap models give overconfident wrong recommendations on protected sites.)
- Does intent_filter hold up on GPT-5.4? (Answer: gpt-5.4-mini is competitive on small sites; over-prunes Bucket B on complex ones.)
- Are LLM outputs stable across re-runs? (Same model, two runs; append-only history makes this trivially queryable.)
Where it lives
Database impact
- Appends to llm_evals on every run (no UNIQUE; re-runs preserve history)
- S3 keys per response saved at s3_response_path
How to test it
# From the admin UI: navigate to /admin/evals, pick a scan,
# click "Start new eval", pick prompts + model, confirm spend.

# From the CLI (operator only):
recon llm-eval --scan-id 82f42438-... \
  --prompts scan_synthesis,intent_filter \
  --provider-b grok-3-mini \
  --confirm-spend 1.00
Find output in the DB
-- Latest complete eval per (scan, prompt, model):
SELECT DISTINCT ON (scan_id, prompt_name, model)
       prompt_name, model, provider, cost_usd, response_body
FROM llm_evals
WHERE scan_id = '<scan_id>' AND status = 'complete'
ORDER BY scan_id, prompt_name, model, created_at DESC;
The recon CLI.
The customer-facing command-line tool. Installed once via pipx, used by typing recon in a terminal.
In plain English
The CLI is the customer's gateway to the tool. It runs on their computer, knows how to launch Chrome, knows how to talk to our server, and shows them the report URL when the scan is done. From v0.3.0 onward it doesn't contain any of our secret sauce — all the smart stuff is server-side.
Install
pipx install browser-recon
# If pipx isn't installed:
python3 -m pip install --user pipx
python3 -m pipx ensurepath
pipx install browser-recon
Commands
| Command | What it does |
|---|---|
| recon login | Exchange a one-time code for an API key; saves to ~/.recon/config.toml |
| recon scan <url> | Run an interactive scan. Launches Chrome, polls server progress, prints report URL on completion |
| recon scan <url> --no-wait | Fire-and-forget; returns the scan_id without polling. For CI use |
| recon --version | Prints CLI version |
| recon llm-eval | Admin only: re-run a prompt on alternate providers (requires BROWSER_RECON_ADMIN=1) |
What ships in the wheel (v0.3.0)
Only non-proprietary glue. The wheel contains:
- browser_recon/capture/: Chrome launcher + CDP listener
- browser_recon/cli/: entry point, prompts, login, config
- browser_recon/client.py: thin httpx wrapper around the server API
- browser_recon/poll.py: scan-events polling loop + rich spinner
Roughly 2 MB installed. No detection rules, no validation logic, no LLM prompts, no scoring heuristics. Everything proprietary lives server-side.
What ran where (pre-T53 vs post-T53)
| Component | Pre-T53 (v0.2.x) | Post-T53 (v0.3.0) |
|---|---|---|
| Chrome capture | Client | Client |
| Detection | Client | Server |
| Analysis | Client | Server |
| Validation | Client (needed proxy env in user shell!) | Server |
| Scrubbing | Client (pre-upload) | Server (post-validation) |
| LLM prompts | Server | Server |
| Report rendering | Server | Server |
CLI ↔ server protocol
Exactly two endpoints:
POST /scans/<id>/capture        // upload raw blob, returns 202 + status_url
GET  /scans/<id>/events?since   // poll for pipeline progress events
Both authenticated via Authorization: Bearer rec_live_…. TLS 1.2/1.3 in transit (Render terminates).
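As a rough picture of what the two calls look like on the wire, the sketch below builds (but does not send) both requests. It uses urllib from the standard library so it has no dependencies, whereas the real client wraps httpx. The function names, the Content-Type header, and the ISO-8601 form of the since value are assumptions for illustration.

```python
import urllib.request
from typing import Optional
from urllib.parse import urlencode


def build_upload_request(base_url: str, scan_id: str, api_key: str, blob: bytes) -> urllib.request.Request:
    """POST the raw capture blob; the server is expected to answer 202."""
    return urllib.request.Request(
        url=f"{base_url}/scans/{scan_id}/capture",
        data=blob,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/octet-stream",  # assumed content type
        },
    )


def build_events_request(base_url: str, scan_id: str, api_key: str, since: Optional[str] = None) -> urllib.request.Request:
    """GET pipeline events, optionally only those newer than `since`."""
    query = f"?{urlencode({'since': since})}" if since else ""
    return urllib.request.Request(
        url=f"{base_url}/scans/{scan_id}/events{query}",
        method="GET",
        headers={"Authorization": f"Bearer {api_key}"},
    )


if __name__ == "__main__":
    req = build_events_request(
        "https://browser-recon.com", "example-scan-id", "rec_live_example",
        since="2026-05-13T10:00:00+00:00",
    )
    print(req.get_method(), req.full_url)
```

The since value is URL-encoded because a timestamp with a "+00:00" offset would otherwise have its plus sign read as a space by the server.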
Config file
~/.recon/config.toml:
[auth]
api_key = "rec_live_..."

[server]
base_url = "https://browser-recon.com"
Source
From signup to first scan.
What a brand-new customer does, in order. Five touchpoints: sign up, get an API key, install the CLI, log in, run a scan.
In plain English
This is the new-customer onboarding path. The web side handles signup and billing; the CLI side handles the actual scanning. Customers can also scan entirely from the web dashboard (no CLI install) — but the CLI is recommended because it captures from their real Chrome on their real network.
1 · Signup
Customer visits browser-recon.com. Sees the landing page (templates/landing.html). Clicks "Sign up", enters email + password (no email verification required in MVP — Render manages the form). Server creates a row in users with role='user' and a free-tier credit allocation.
2 · Dashboard login
After signup, redirected to /dashboard/login. Magic-link email flow (templates/dashboard_login.html → dashboard_login_sent.html). User clicks link, lands at /dashboard with a session cookie.
3 · API key creation
From the dashboard, clicks "Create API key" → server generates a rec_live_… token, hashes it into api_keys, shows the plaintext once (templates/dashboard_api_key_created.html). User copies it.
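The generate, hash, show-once pattern can be sketched as follows. The guide does not say which hash the server uses, so SHA-256 is an assumption here, as are the function names and the token length.

```python
import hashlib
import hmac
import secrets

PREFIX = "rec_live_"


def create_api_key() -> tuple:
    """Return (plaintext, digest). Only the digest would be stored."""
    plaintext = PREFIX + secrets.token_urlsafe(32)
    digest = hashlib.sha256(plaintext.encode("utf-8")).hexdigest()
    return plaintext, digest


def verify_api_key(presented: str, stored_digest: str) -> bool:
    """Hash the presented key and compare in constant time."""
    digest = hashlib.sha256(presented.encode("utf-8")).hexdigest()
    return hmac.compare_digest(digest, stored_digest)


if __name__ == "__main__":
    plaintext, digest = create_api_key()
    print("show once:", plaintext)
    print("store:", digest)
    print("valid:", verify_api_key(plaintext, digest))
```

A fast hash is a reasonable choice for this kind of key because the token is long and random, unlike a user-chosen password. Storing only the digest is why the plaintext can be shown exactly once.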
4 · CLI install + login
pipx install browser-recon
recon login
# prompts for API key, validates against server, saves to ~/.recon/config.toml
5 · First scan
recon scan https://example.com
# 1. CLI picks a starter template (products / stocks / etc.)
# 2. CLI asks: "Describe what data you want to scrape:"
# 3. Chrome launches. User browses the site, clicks around, navigates.
# 4. Ctrl+C → flow confirm summary
# 5. CLI uploads capture, polls events, prints live progress.
# 6. When render completes, CLI prints report URL:
#    https://browser-recon.com/r/<slug>
Web UI navigation
| Path | Purpose |
|---|---|
| / | Landing page |
| /pricing | Tier comparison |
| /dashboard | User home: recent scans, credit balance, quick links |
| /dashboard/data | All scans (paginated) |
| /dashboard/scan/new | Browser-driven scan flow (no CLI required) |
| /dashboard/api-keys | Manage API keys |
| /r/<slug> | Public-share report URL |
| /r/<slug>/download | Self-contained HTML download (T52) |
Permissions / roles
| Role | Sees |
|---|---|
| guest | Landing, pricing, public reports |
| user | Above + their own dashboard + their own scans + their own API keys |
| admin | Above + /admin/* read-only views |
| super_admin | Above + all admin write operations, evals, audit logs |
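Because each role inherits everything "above" it, the table behaves like an ordered ladder. One way to model that, as a sketch with hypothetical names (Role, can_view_scan, can_write_admin) rather than the server's actual permission code:

```python
from enum import IntEnum


class Role(IntEnum):
    """Ordered so that a higher value includes every lower role's access."""
    GUEST = 0
    USER = 1
    ADMIN = 2
    SUPER_ADMIN = 3


def has_role(actual: Role, required: Role) -> bool:
    """True when `actual` is at or above the `required` rung."""
    return actual >= required


def can_view_scan(role: Role, viewer_id: str, owner_id: str) -> bool:
    """Users see only their own scans; admin and above see every scan."""
    if role >= Role.ADMIN:
        return True
    return role == Role.USER and viewer_id == owner_id


def can_write_admin(role: Role) -> bool:
    """Admin write operations are limited to super_admin."""
    return role >= Role.SUPER_ADMIN
```

Using an IntEnum keeps every check a single comparison, which fits the cumulative "Above + ..." wording of the table.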
Admin operations.
A separate web UI at /admin/* for the team to monitor health, costs, AI quality, and individual scans.
In plain English
A separate part of the site that customers never see. Used by the operator (you) to: watch how the system is doing overall, dig into individual scans to debug problems, manually re-run AI calls with different models to compare quality, and manage users + roles. Locked behind a role check so only admin / super_admin can see it.
Layout
Full-width Tailwind layout (templates/admin_layout.html) separate from the user-facing dashboard. Sidebar nav, sticky header with role chip, content area on the right.
Pages
| Path | Purpose |
|---|---|
/admin/dashboard | KPI strip — total scans, success rate, cost, top users, top domains, prompt reliability, 30-day cost trend |
/admin/scans | Cross-user scan list with filters (user / status / date / domain / include-soft-deleted) and per-scan debug links |
/debug/scans/<id> | Per-scan deep-dive — Overview / Capture / Buckets / LLM calls / Audit tabs + lazy-loaded S3 dumps + the Evals tab (T48) |
/admin/evals | Index of scans with at least one eval row |
/admin/evals/<scan_id> | Matrix view: rows = prompts, columns = models. Click any cell for the side-by-side diff modal vs production (T50) |
/admin/users | User list, role management (super_admin only) |
/admin/audit | Audit log of admin actions |
Where it lives
Promoting a user to super_admin
Set BROWSER_RECON_SUPER_ADMIN_EMAIL=… in Render env. The next time a user with that email logs in, their users.role auto-promotes. Used for initial setup; subsequent promotions happen via the admin UI's user management page.
Audit trail
Every admin action (role change, scan soft-delete, eval start) appends to audit_log. Find recent admin activity:
SELECT actor_id, action, target_type, target_id, created_at
FROM audit_log
ORDER BY created_at DESC
LIMIT 50;
Concerns, gaps & pending work.
Honest accounting of what this guide doesn't cover, what's incomplete in the system itself, and what's worth doing next. Read this last.
In plain English
This page is the "stuff I'd want a new engineer to know before they start digging" list. Nothing here is broken — it's all known limitations, known dead code, and known next-step ideas.
Documentation gaps
Things this guide could cover but doesn't. Not blockers — just acknowledged gaps if the guide ever becomes the primary onboarding doc.
| Topic | Why it's missing | Severity |
|---|---|---|
| Local development setup | No page covering rye sync, .env configuration, running migrations locally, starting the FastAPI server on a fresh checkout. | Medium |
| Deployment to Render | No page on how the app is wired to Render, what env vars need to be set there, how database migrations run on deploy. | Medium |
| Billing / Stripe integration | Stripe code exists in the repo (tests for stripe_webhook) but no page documents the billing flow, credit allocation, or webhook handling. | Medium |
| Soft-delete behaviour | Admin UI mentions include-soft-deleted filter; no page explains the soft-delete mechanism. | Low |
| Audit log details | Admin UI page references it; no dedicated section explaining what actions are audited or how to query. | Low |
| API-level rate limiting | No documentation on per-customer caps or how the server limits abuse. | Low |
| Error handling | No page on what happens when a pipeline step fails mid-scan — how the user sees it, how the operator debugs it. | Low |
| Screenshots | The guide uses SVG diagrams but no actual screenshots of the admin UI, the dashboard, or the report. Capturing them needs a live scan to draw from. | Low |
| In-page search | The guide is 19 pages, ~130 KB. A search box would help. Not in v1. | Low |
Code-level dead weight (post-T53 cutover)
The T53 thin-CLI migration left a few orphan modules. None are broken; they're just unreachable now and worth deleting in a follow-up sweep.
| Location | Status | Action |
|---|---|---|
| browser_recon/transport/uploader.py | Still carries client funcs (upload_to_server, complete_scan, submit_validation_runs, submit_replay_runs) targeting endpoints the server no longer serves. | Delete the dead client funcs; keep _post_json, create_draft, etc. that are still in use. |
| browser_recon_server/flow_confirm_orchestrator.py | Sole caller was the deleted run_confirm_pipeline. No grep hits in the live code path. | Verify no test imports remain; delete. |
| browser_recon_server/scan_pipeline.py | Kept alive because rerun_orchestrator.py still calls run_filter_pipeline + run_synthesis_pipeline for the POST /scans/{id}/rerun-stage endpoint. The new pipeline_orchestrator.py doesn't replace this flow. | Future T55: rewire rerun-stage to use the new orchestrator, then delete scan_pipeline.py. |
| POST /scans (legacy single-shot endpoint) | Still in routes/scans.py. Calls scan_pipeline.run_pipeline. Not called by any current CLI. | Decide: deprecate + remove, or keep as server-only test scaffolding. |
| Legacy validation: None field in CLI blob dicts | Still present in some test fixtures. Harmless; server ignores it. | Clean up in the next test-suite sweep. |
Architectural caveats (won't change soon)
- Cookies captured locally are bound to the customer's IP. When validation later fires through a different proxy IP, those cookies are often invalid against the target's anti-bot challenge. Synthesis surfaces this as "cookie warmup required" in the recommendation. There's no clean fix that doesn't involve running Chrome through the proxy too, which would defeat the residential-IP advantage of the customer's network.
- Rate-limit probing through rotating-residential proxies is structurally noisy. Each probe request hits the target from a different IP, so the target physically cannot rate-limit a single source it never sees twice. The result carries a proxy_rotation_mode: "rotating" flag + caveat string; synthesis is instructed to recommend ≥1.5 s delay regardless of measured value.
- The capture upload is unscrubbed. Validation needs the real cookies + auth headers. Scrubbing happens server-side, after validation, before persistence. Secrets live in server RAM transiently. Trade-off accepted in T53 planning.
- Pre-T54.5: AWS Secrets Manager not yet in use. Proxy credentials still live in Render env vars. Migration to AWS Secrets is a planned follow-up.
- Pre-T54.5: PyPI publishing setup not yet done. The wheel builds cleanly (dist/browser_recon-0.3.0-py3-none-any.whl, 128 KB) but hasn't been uploaded to PyPI yet. Customers can't pipx install browser-recon until that step.
Pending tasks (next-up)
| Task | Effort | Why |
|---|---|---|
| AWS Secrets Manager for proxy creds | ~half day | Operator-side hygiene — rotation, audit, IAM-gated access. Code reader stays in server. |
| PyPI publish v0.3.0 | ~half day (mostly account setup) | Customers can pipx install browser-recon. Currently they'd install from source. |
| Rewire rerun-stage to use the new orchestrator | ~1 day | Lets us delete scan_pipeline.py. Cleaner mental model: one pipeline, one orchestrator. |
| Delete dead client funcs in transport/uploader.py | ~hour | Wheel shrinks further; reduces confusion for new contributors. |
| Tighten error-handling UX | ~half day | The CLI's poll loop handles a few error cases; broader coverage (proxy auth failure, AWS Secrets unavailable, partial validation) needs explicit handling. |
| A/B the eval recommendation on more sites | ~quarter day per site | The T48 eval matrix has data from Staples + Walmart. More variety strengthens the "Sonnet for synthesis, Grok-3-mini for everything else" claim. |
| Stripe / billing documentation page | ~half day | Filling the doc gap above. |
Notable wins shipped recently
- T51 (validation redesign): validation wall-clock per Bucket A endpoint went from ~150 s to ~25 s. The new pipeline is two-axis (library × proxy) cascade with parallelism, bandwidth tracking, and a persisted JSON shape rich enough to drive real cost projections.
- T51.8 (LLM payload compaction): scan_synthesis input tokens dropped 65–81% (Walmart: 81k → 29k, Staples: 84k → 16k). Synthesis cost dropped 52–69%.
- T52 (report polish): theme-consistent starter code + Download Report button (self-contained HTML).
- T52.1 (snippet comment trim): curl + httpx + starter code now ship minimal comments.
- T53 (thin-CLI cutover): CLI wheel went from ~20 MB to 128 KB. No proprietary code on customer machines. Operator secrets never travel to the customer's shell. Customers see a live progress spinner instead of silent 90-second waits.
Doc revision history
This guide is a living document. Each major feature ships an update.
| Date | Author | Changes |
|---|---|---|
| 2026-05-13 | Lazy Coder | Initial T54 ship — 19 pages, plain-English callouts, glossary. |
| 2026-05-13 | Lazy Coder | T53 implementation complete; page 20 (this one) added. |