browser-recon.
overview · v2 historical reference
Product overview / for a fellow scraper

A diagnostic tool for web scrapers.

Paste a URL, get a report on anti-bot stack, endpoints, recommended tooling, and cost — backed by evidence, not vibes.

Status: zero code shipped beyond an existing local Claude Code skill · Successor doc: phase-2-redesign · Mark this up like a junior dev's PR
For
A fellow scraper
Position
Neutral diagnostician
Not
An unblocker
First step
4-hour CLI
The thesis · in one paragraph

Collapse the 30–90 minutes a scraper burns scoping a new project into a 2-minute browse and a paid report.

browser-recon identifies the anti-bot vendor, enumerates the API endpoints, recommends tooling and proxy type with confidence scores, gives a cost band per 1k requests, and ships starter code that runs. Every claim has evidence attached. It is not an unblocker — it tells you which one (if any) you need.

01 / The pain it solves

Scrapers burn an hour every quote, and half the quotes are wrong.

When a scraper or agency scopes a new project, someone opens DevTools, reads the network tab, recognizes anti-bot signatures, looks up tooling, and guesses a price.

Half the time the quote is wrong because the anti-bot was misread. The pattern repeats per project, per client, per quarter — across the whole industry. browser-recon turns it into a paid report.

Position

Neutral diagnostician, not vendor. ScrapFly and ZenRows want to sell you their unblocker. We tell you whether you need one at all, and which.

02 / Who buys it

Three personas — and a non-obvious thing about how they actually evaluate it.

Skipping enterprise. Long sales cycles will kill a side project.

P · 01
Freelance scrapers
Upwork / Toptal / Contra. Scoping new projects under time pressure.
P · 02
Small scraping agencies
5–30 people. During client intake calls.
P · 03
AI agent startups
Browser-automation companies running pre-flight checks.
The thing most product docs get wrong

Experienced scrapers don't trust new tools by reading marketing copy. They stress-test by pointing the tool at the hardest known target they can think of — Ticketmaster, Supreme, Nike SNKRS, StubHub. If the tool calls Cloudflare on a site running Imperva, they close the tab forever. The free scan isn't a sales tool. It's the credibility test for the whole product.

03 / How it works

Browser stays on the user's machine. Analysis is hosted.

That split is the product. Captures are local, fingerprints are uploaded, auth is stripped before it leaves the machine.

USER MACHINE YOUR SERVER CLI launches Chrome (CDP) User browses the site CLI captures network + cookies Strips PII, drops cookie values POST /scans Deterministic analyzer rules for top 6 vendors Optional: LLM polish verdict prose, starter code Render HTML report single shareable URL recon.app/r/abc123 CLI prints shareable URL capture is local · analysis is hosted · auth never uploaded
Fig. 1 — Capture / analysis split. The browser stays local; the fingerprint blob is what travels.
04 / The classifier debate

Rules, not LLM, for classification. Tell me if I'm wrong.

This is the call I'm least sure about.

Rules-as-classifier

  • Deterministic. Same input, same answer, forever.
  • Defensible — maintenance is the moat.
  • Covers 95% of the protected web in ~600–1000 lines of pattern matching.
  • LLM still useful for prose polish, fallback investigator, starter code.

LLM-as-classifier

  • Ships fast. Tempting for v0.1.
  • Non-deterministic. Different answer same site.
  • Hallucinates vendors that don't exist.
  • Trust killer for a tool whose job is diagnosis.
05 / What's in a report

Twelve pieces. Five of them are the actual product.

Anti-bot detection alone is wallpaper. Endpoint inventory + tooling call + cost band + starter code is the meal.

01
Anti-bot stack vendor, mode, severity, with evidence
supporting
02
Site architecture SSR vs SPA, framework + version, hydration shape
supporting
03
Endpoint inventory deduped table — methods, paths, auth requirements
core
04
Auth model JWT vs session vs OAuth, CSRF, HMAC
supporting
05
Fingerprinting signals JA3, FingerprintJS, canvas probes
supporting
06
Tooling recommendation curl_cffi / Playwright / paid unblocker, with confidence %
core
07
Proxy type recommendation datacenter / residential / mobile, with confidence %
core
08
Cost band range per 1k requests at scale
core
09
Difficulty drivers replaces 1–10 score · impact + evidence + severity
supporting
10
Starter code Python snippet, ready to run
core
11
Evidence trail raw headers, cookies, JS files — every claim is checkable
supporting
12
"Did this match reality?" buttons quietly the most important thing in the whole report
supporting
Validated by target persona

The core bundle (items 3, 6, 7, 8, 10) is the minimum sellable unit. Confirmed by a fellow scraper who said: "I myself wouldn't pay a dime just to know if a site has Cloudflare."

06 / Freemium & pricing

Two free scans answer "is this tool's read correct?" Paid scans answer "give me everything."

Same engine, two value props. Free portion is the trust test; paid portion is the time-savings.

Free · CLI
$0 open source
Captures locally. Basic Markdown report. No upload. Distribution engine.
Free · web
2 scans no signup
Full vendor identification. Partial report. Endpoints, tooling, cost, code gated.
One-off
$5 = 10 scans
Stripe Payment Link. Full reports unlock. The credibility-converts-to-cash moment.
Pro
$19/mo
100 scans. Basic dashboard. Hosted shareable reports.
Team
$79/mo
1,000 scans. API access. Team seats. The number that probably matters.
Honest expectation

Freemium scraping tools historically convert in single-digit percentages. The audience is technical, cheap, and self-sufficient. Volume of free users matters more than product polish. Distribution > product.

07 / The credibility constraint

Detection must cover six vendors. Full pipeline only needs to cover two.

Launch will be evaluated by scrapers pasting famously-hard URLs. v0.1 cannot reply "we only support Cloudflare" on five out of six test scans.

Detection layer (all 6)

  • Cloudflare
  • DataDome
  • Akamai
  • PerimeterX / HUMAN
  • Imperva
  • Kasada

Full pipeline (2 at launch)

  • Cloudflare — tooling, cost, starter code, proxy guidance
  • DataDome — tooling, cost, starter code, proxy guidance
  • Other four: vendor identified + endpoints captured + waitlist signup
  • Waitlist tells me which vendor to ship next — by demand, not by guess
08 / What this is not

Saying no upfront.

Every "what this is not" up here is a feature request I'm declining before it arrives.

09 / Roadmap

Killable at every step.

Each version has a kill criterion. Evidence decides whether the next version exists.

v0.1.athis weekend
Strip Claude Code dep from existing browser-recon. CLI generates local Markdown report using existing detection logic.
Show to 5 scrapers. Does the output quality make them say they'd try a hosted version? If <2 yeses, stop or pivot.
v0.1.b2–3 weekends
Detection rules for top 6 vendors. Hosted analyzer + Jinja-templated HTML report. Full pipeline for Cloudflare + DataDome only.
Local prototype passes the Ticketmaster / Supreme / Nike SNKRS / StubHub test. If detection is wrong on 2+ famous sites, stop and fix before launch.
v0.1.c1 weekend
FastAPI server, SQLite, Stripe Payment Link, "did this match reality?" feedback. 2 free scans, $5 = 10 scans after. Landing page with 6 pre-scanned famous-hard sites.
Post on r/webscraping. Did >5 unique users sign up? Did at least 1 pay? If both no after 2 weeks, stop.
v0.2waitlist-driven
Full recommendation pipeline for the next 1–2 vendors based on actual waitlist demand.
Are paying users coming back? Are reported accuracy outcomes >70% on feedback buttons?
v0.3
LLM polish layer (Haiku). Starter code generation. Shareable client-facing variant of report.
Is one-off scan revenue >$500/mo? Time to add a subscription tier?
v0.4
Subscription dashboard, API access, team seats. Account system. Re-scan delta view.
Does a paying user request Tier 2 (replay)?
v1.0
Tier 2: local replay engine in CLI. User runs scan, then browser-recon verify re-issues captured requests with several toolchains. Auth never leaves the user's machine.
Hit when v0.4 has ≥30 paying customers.
10 / What I want to test, in order

Each question only matters if the previous one was yes.

Don't build features that depend on later answers before earlier ones are confirmed.

  1. Is the report output good enough to make experienced scrapers want a hosted version?v0.1.a · before any server work
  2. Does detection work correctly on famously-hard sites?v0.1.b · before any launch
  3. Will any scraper sign up + pay $5 after their free scans?v0.1.c
  4. Are paying users coming back? Is accuracy holding?v0.2 + feedback loop
  5. Will agencies pay for team access and API?v0.3 – v0.4
  6. Is replay verification valuable enough to justify the security overhead?v1.0
11 / Honest risks

The five things most likely to kill it.

Ordered by what I think the real probability is, not what I'd prefer to write about.

12 / The first concrete step

This Saturday. Four hours.

Strip the Claude Code dep from the existing browser-recon skill. Generate a Markdown report locally using existing detection logic. No server, no payment, no LLM.

The question to ask 5 scrapers

"If this output came from a hosted tool that worked on any URL in 2 minutes, would you have tried it?"

Total scope: about 4 hours of work. The answer changes everything.

13 / What I want from you

Mark this up like a junior dev's PR. Don't be polite.

The product fails if you are. Expand each question to see what I'm specifically uncertain about.

Is the minimum sellable unit (§05) right?
Anything missing that you'd want as a buyer? Anything in the "core" five that's actually decoration? Anything I marked "supporting" that you'd refuse to pay without?
Rules vs LLM for the classifier (§04) — am I wrong?
This is the call I'm least sure about. Maintenance is the moat, but maintenance is also the work. Is there a hybrid that I'm missing? Would you trust an LLM verdict if it cited evidence inline?
Pricing & freemium (§06) — would you sign up? Where would you walk away?
Free → $5 → $19 → $79. Where do you stop? At what point does the price feel wrong relative to what you're getting? Is $5 too low to take seriously, or exactly the right "no-think" buy?
The credibility constraint (§07) — is "6 detected, 2 with full pipeline" believable?
Or do I need full pipeline on more vendors before launching? My worry: a scraper who points it at an Akamai site sees "vendor identified + waitlist" and bounces forever. Is that the right tradeoff, or fatal?
Roadmap (§09) — am I shipping the wrong v0.1?
Should something move earlier or later? Are the kill criteria the right ones — i.e. would you actually stop on them, or are they soft enough that I'd let myself slide past?
What this is not (§08) — anything I should reconsider?
Any of those "nots" feel like they're leaving real money on the table? Or any that should be a louder "not" than they are now?
The thing I'm not seeing because I'm too close. What is it?
The one I most want answered. Free-form.
See successor: phase-2-redesign →