Paste a URL, get a report on anti-bot stack, endpoints, recommended tooling, and cost — backed by evidence, not vibes.
Status: zero code shipped beyond an existing local Claude Code skill · Successor doc: phase-2-redesign · Mark this up like a junior dev's PR
For: A fellow scraper
Position: Neutral diagnostician
Not: An unblocker
First step: 4-hour CLI
The thesis · in one paragraph
Collapse the 30–90 minutes a scraper burns scoping a new project into a 2-minute browse and a paid report.
browser-recon identifies the anti-bot vendor, enumerates the API endpoints, recommends tooling and proxy type with confidence scores, gives a cost band per 1k requests, and ships starter code that runs. Every claim has evidence attached. It is not an unblocker — it tells you which one (if any) you need.
01 /The pain it solves
Scrapers burn an hour every quote, and half the quotes are wrong.
When a scraper or agency scopes a new project, someone opens DevTools, reads the network tab, recognizes anti-bot signatures, looks up tooling, and guesses a price.
Half the time the quote is wrong because the anti-bot was misread. The pattern repeats per project, per client, per quarter — across the whole industry. browser-recon turns it into a paid report.
Position
Neutral diagnostician, not vendor. ScrapFly and ZenRows want to sell you their unblocker. We tell you whether you need one at all, and which.
02 /Who buys it
Three personas — and a non-obvious thing about how they actually evaluate it.
Skipping enterprise. Long sales cycles will kill a side project.
P · 01
Freelance scrapers
Upwork / Toptal / Contra. Scoping new projects under time pressure.
Experienced scrapers don't trust new tools by reading marketing copy. They stress-test by pointing the tool at the hardest known target they can think of — Ticketmaster, Supreme, Nike SNKRS, StubHub. If the tool calls Cloudflare on a site running Imperva, they close the tab forever. The free scan isn't a sales tool. It's the credibility test for the whole product.
03 /How it works
Browser stays on the user's machine. Analysis is hosted.
That split is the product. Captures are local, fingerprints are uploaded, and auth is stripped before anything leaves the machine.
Fig. 1 — Capture / analysis split. The browser stays local; the fingerprint blob is what travels.
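A minimal sketch of the "auth is stripped before it leaves the machine" step, assuming the local capture is a plain dict of request headers. The function names, the `<redacted>` marker, and the header denylist are all hypothetical and not part of the existing skill. Cookie names are kept because they are useful vendor evidence; cookie values are dropped.

```python
from typing import Dict, List

# Hypothetical denylist; a real tool would need a longer, reviewed list.
SENSITIVE_HEADERS = frozenset({
    "authorization", "proxy-authorization", "cookie", "x-api-key", "x-csrf-token",
})
REDACTED = "<redacted>"


def cookie_names(cookie_header: str) -> List[str]:
    """Keep cookie names (useful vendor evidence) and drop every value."""
    names = []
    for part in cookie_header.split(";"):
        name = part.split("=", 1)[0].strip()
        if name:
            names.append(name)
    return names


def build_fingerprint(capture: Dict) -> Dict:
    """Build the upload blob from a local capture without credential values."""
    headers = capture.get("request_headers", {})
    safe_headers = {}
    cookie_header = ""
    for name, value in headers.items():
        if name.lower() == "cookie":
            cookie_header = value
        # Redact the value but keep the header name so the report can cite it.
        safe_headers[name] = REDACTED if name.lower() in SENSITIVE_HEADERS else value
    return {
        "url": capture["url"],
        "request_headers": safe_headers,
        "cookie_names": cookie_names(cookie_header),
    }
```

This sketch does not scrub tokens embedded in URLs or request bodies; a real implementation would have to cover those too.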
04 /The classifier debate
Rules, not LLM, for classification. Tell me if I'm wrong.
This is the call I'm least sure about.
Rules-as-classifier
Deterministic. Same input, same answer, forever.
Defensible — maintenance is the moat.
Covers 95% of the protected web in ~600–1000 lines of pattern matching.
LLM still useful for prose polish, fallback investigator, starter code.
LLM-as-classifier
Ships fast. Tempting for v0.1.
Non-deterministic. Different answers for the same site.
Hallucinates vendors that don't exist.
Trust killer for a tool whose job is diagnosis.
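To make the rules-as-classifier option concrete, here is a small sketch of deterministic pattern matching that returns a confidence score and the evidence behind it. The rule table is illustrative only: the cookie and header names are commonly associated with these vendors, but the weights, the `Rule` type, and the way scores are combined are assumptions, not the existing detection logic. A `cf-ray` header alone only shows traffic passes through Cloudflare, which is why it carries a lower weight here.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Tuple


@dataclass(frozen=True)
class Rule:
    vendor: str
    kind: str      # "header" or "cookie"
    prefix: str    # lowercase prefix matched against the name
    weight: float  # hypothetical evidence weight between 0 and 1


# Illustrative rules only; a real rule set would be far larger.
RULES = [
    Rule("Cloudflare", "header", "cf-ray", 0.6),
    Rule("Cloudflare", "cookie", "__cf_bm", 0.7),
    Rule("DataDome", "cookie", "datadome", 0.8),
    Rule("Imperva", "cookie", "visid_incap_", 0.7),
    Rule("Imperva", "cookie", "incap_ses_", 0.7),
    Rule("Akamai", "cookie", "_abck", 0.7),
    Rule("Akamai", "cookie", "bm_sz", 0.5),
]


def classify(
    header_names: Iterable[str], cookie_names: Iterable[str]
) -> List[Tuple[str, float, List[str]]]:
    """Return (vendor, confidence, evidence) tuples, highest confidence first."""
    names = {
        "header": sorted(n.lower() for n in header_names),
        "cookie": sorted(n.lower() for n in cookie_names),
    }
    miss: Dict[str, float] = {}          # vendor -> product of (1 - weight)
    evidence: Dict[str, List[str]] = {}  # vendor -> matched names
    for rule in RULES:
        for name in names[rule.kind]:
            if name.startswith(rule.prefix):
                miss[rule.vendor] = miss.get(rule.vendor, 1.0) * (1.0 - rule.weight)
                evidence.setdefault(rule.vendor, []).append(f"{rule.kind}:{name}")
                break  # each rule contributes at most once
    results = [(v, round(1.0 - m, 3), evidence[v]) for v, m in miss.items()]
    # Sort by confidence, then vendor name, so output order is stable.
    return sorted(results, key=lambda r: (-r[1], r[0]))
```

The same input always yields the same verdict, and every verdict carries the names that triggered it, which is the property the comparison above is arguing for.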
05 /What's in a report
Twelve pieces. Five of them are the actual product.
Anti-bot detection alone is wallpaper. Endpoint inventory + tooling call + cost band + starter code is the meal.
01 · Anti-bot stack: vendor, mode, severity, with evidence (supporting)
02 · Site architecture: SSR vs SPA, framework + version, hydration shape
Evidence trail: raw headers, cookies, JS files; every claim is checkable (supporting)
12 · "Did this match reality?" buttons: quietly the most important thing in the whole report (supporting)
Validated by target persona
The core bundle (items 3, 6, 7, 8, 10) is the minimum sellable unit. Confirmed by a fellow scraper who said: "I myself wouldn't pay a dime just to know if a site has Cloudflare."
06 /Freemium & pricing
Two free scans answer "is this tool's read correct?" Paid scans answer "give me everything."
Same engine, two value props. Free portion is the trust test; paid portion is the time-savings.
Free · CLI: $0, open source. Captures locally. Basic Markdown report. No upload. Distribution engine.
Free · web: 2 scans, no signup. Full vendor identification. Partial report. Endpoints, tooling, cost, code gated.
One-off: $5 = 10 scans. Stripe Payment Link. Full reports unlock. The credibility-converts-to-cash moment.
1,000 scans. API access. Team seats. The number that probably matters.
Honest expectation
Freemium scraping tools historically convert in single-digit percentages. The audience is technical, cheap, and self-sufficient. Volume of free users matters more than product polish. Distribution > product.
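The free-then-paid gating above (2 free scans, then $5 for 10) can be sketched as a small credit ledger, which also gives the unit price: 500 cents / 10 scans = 50 cents per full report. This uses the standard-library `sqlite3` module; the table layout, function names, and the idea of an anonymous `user_id` for no-signup scans are assumptions for illustration.

```python
import sqlite3

FREE_SCANS = 2          # free web scans per user
PACK_SCANS = 10         # scans in one paid pack
PACK_PRICE_CENTS = 500  # $5 per pack, so 50 cents per scan


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS accounts ("
        "user_id TEXT PRIMARY KEY, "
        "free_left INTEGER NOT NULL, "
        "paid_left INTEGER NOT NULL)"
    )
    return conn


def _ensure(conn: sqlite3.Connection, user_id: str) -> None:
    # First sighting of a user grants the free allowance.
    conn.execute(
        "INSERT OR IGNORE INTO accounts VALUES (?, ?, 0)", (user_id, FREE_SCANS)
    )


def buy_pack(conn: sqlite3.Connection, user_id: str) -> None:
    """Credit one paid pack (call after payment is confirmed)."""
    with conn:  # commits on success, rolls back on error
        _ensure(conn, user_id)
        conn.execute(
            "UPDATE accounts SET paid_left = paid_left + ? WHERE user_id = ?",
            (PACK_SCANS, user_id),
        )


def consume_scan(conn: sqlite3.Connection, user_id: str) -> str:
    """Spend one scan; return 'full', 'partial', or 'blocked'."""
    with conn:
        _ensure(conn, user_id)
        free_left, paid_left = conn.execute(
            "SELECT free_left, paid_left FROM accounts WHERE user_id = ?",
            (user_id,),
        ).fetchone()
        if paid_left > 0:
            conn.execute(
                "UPDATE accounts SET paid_left = paid_left - 1 WHERE user_id = ?",
                (user_id,),
            )
            return "full"
        if free_left > 0:
            conn.execute(
                "UPDATE accounts SET free_left = free_left - 1 WHERE user_id = ?",
                (user_id,),
            )
            return "partial"
        return "blocked"
```

Spending paid credits before free ones is a design choice in this sketch: a user who has paid always gets the full report.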
07 /The credibility constraint
Detection must cover six vendors. Full pipeline only needs to cover two.
Launch will be evaluated by scrapers pasting famously-hard URLs. v0.1 cannot reply "we only support Cloudflare" on five out of six test scans.
Other four: vendor identified + endpoints captured + waitlist signup
Waitlist tells me which vendor to ship next — by demand, not by guess
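A sketch of how the two coverage tiers and the demand-driven waitlist might fit together. Cloudflare and DataDome are the two full-pipeline vendors named in the roadmap; the section names, the `Waitlist` class, and the tie-breaking rule are hypothetical.

```python
from collections import Counter
from typing import Dict, Optional

# Vendors with the full recommendation pipeline at launch.
FULL_PIPELINE = frozenset({"Cloudflare", "DataDome"})


def report_sections(vendor: str) -> Dict[str, bool]:
    """Which report sections a detected vendor gets at launch."""
    full = vendor in FULL_PIPELINE
    return {
        "vendor_identified": True,
        "endpoints": True,
        "tooling": full,
        "cost_band": full,
        "starter_code": full,
        "waitlist_offer": not full,
    }


class Waitlist:
    """Counts signups per vendor so the next pipeline is picked by demand."""

    def __init__(self) -> None:
        self._demand: Counter = Counter()

    def join(self, vendor: str) -> None:
        self._demand[vendor] += 1

    def next_vendor(self) -> Optional[str]:
        if not self._demand:
            return None
        # Highest demand first; ties broken alphabetically for determinism.
        return min(self._demand, key=lambda v: (-self._demand[v], v))
```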
08 /What this is not
Saying no upfront.
Every "what this is not" up here is a feature request I'm declining before it arrives.
Not a scraper. We tell you how to fetch data; we don't fetch it. (scope)
Not an unblocker. We don't proxy traffic. (scope)
Not a CAPTCHA solver. We tell you if you'll need one. (scope)
Not enterprise-targeted at launch. No SOC2, no SSO. (scope)
Not promising correctness. Confidence scores are explicit. The pitch is 85%-accurate expert pattern-matching in 2 minutes versus doing it yourself in 90. (honest)
Not auth-handling at launch. No replay tier in v1. The moment we accept auth on our server, we're a security business wearing a software business's clothes. (deferred)
09 /Roadmap
Killable at every step.
Each version has a kill criterion. Evidence decides whether the next version exists.
v0.1.a · this weekend
Strip Claude Code dep from existing browser-recon. CLI generates local Markdown report using existing detection logic.
Show to 5 scrapers. Does the output quality make them say they'd try a hosted version? If <2 yeses, stop or pivot.
v0.1.b · 2–3 weekends
Detection rules for top 6 vendors. Hosted analyzer + Jinja-templated HTML report. Full pipeline for Cloudflare + DataDome only.
Local prototype passes the Ticketmaster / Supreme / Nike SNKRS / StubHub test. If detection is wrong on 2+ famous sites, stop and fix before launch.
v0.1.c · 1 weekend
FastAPI server, SQLite, Stripe Payment Link, "did this match reality?" feedback. 2 free scans, $5 = 10 scans after. Landing page with 6 pre-scanned famous-hard sites.
Post on r/webscraping. Did >5 unique users sign up? Did at least 1 pay? If both no after 2 weeks, stop.
v0.2 · waitlist-driven
Full recommendation pipeline for the next 1–2 vendors based on actual waitlist demand.
Are paying users coming back? Are reported accuracy outcomes >70% on feedback buttons?
Is one-off scan revenue >$500/mo? Time to add a subscription tier?
v0.4
Subscription dashboard, API access, team seats. Account system. Re-scan delta view.
Does a paying user request Tier 2 (replay)?
v1.0
Tier 2: local replay engine in CLI. User runs scan, then browser-recon verify re-issues captured requests with several toolchains. Auth never leaves the user's machine.
Hit when v0.4 has ≥30 paying customers.
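The v1.0 replay idea, re-issuing captured requests through several toolchains and comparing outcomes, might look roughly like this. Everything here is an assumption: `CapturedRequest`, the `Toolchain` callable signature, and the `verify` function are hypothetical, and the toolchains are left as injected callables so the sketch stays offline. Real ones would wrap an HTTP client or a headless browser and run on the user's machine.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class CapturedRequest:
    method: str
    url: str


# A toolchain takes a captured request and returns the HTTP status it received.
Toolchain = Callable[[CapturedRequest], int]


def verify(
    requests: List[CapturedRequest], toolchains: Dict[str, Toolchain]
) -> Dict[str, float]:
    """Replay every request through every toolchain; return success rates."""
    rates: Dict[str, float] = {}
    for name, send in toolchains.items():
        ok = 0
        for req in requests:
            try:
                status = send(req)
            except Exception:
                continue  # a crashed or timed-out replay counts as a failure
            if 200 <= status < 300:
                ok += 1
        rates[name] = ok / len(requests) if requests else 0.0
    return rates
```

Status codes are a crude proxy: a challenge page can come back as a 200, so a real verifier would also need to inspect the response body.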
10 /What I want to test, in order
Each question only matters if the previous one was yes.
Don't build features that depend on later answers before earlier ones are confirmed.
Is the report output good enough to make experienced scrapers want a hosted version? (v0.1.a · before any server work)
Does detection work correctly on famously-hard sites? (v0.1.b · before any launch)
Will any scraper sign up + pay $5 after their free scans? (v0.1.c)
Are paying users coming back? Is accuracy holding? (v0.2 + feedback loop)
Will agencies pay for team access and API? (v0.3 – v0.4)
Is replay verification valuable enough to justify the security overhead? (v1.0)
11 /Honest risks
The five things most likely to kill it.
Ordered by what I think the real probability is, not what I'd prefer to write about.
My own pattern. Track record of multi-month plans that don't ship. Biggest risk is me adding scope before v0.1.a ships. Defense: ship v0.1.a before refining anything else in this doc further. (impact: existential · likelihood: high · internal)
Distribution. The product is 30% of this. The other 70% is being visible where freelance scrapers look. Plan: one good blog post a month. "Inside Cloudflare BM v3" type, scraper-Google-search bait. (impact: high · likelihood: high · market)
Maintenance burden. Anti-bot stacks evolve, rules drift, accuracy degrades. Mitigation: feedback flywheel, RSS-watching the scraping ecosystem, monthly rule review as a calendar block. (impact: medium · likelihood: medium · ongoing)
Conversion rate. Freemium scraping tools historically convert badly. Single-digit percentages expected. Mitigation: volume of free users. (impact: medium · likelihood: high · economic)
Competitors copy the free site-checker. ScrapFly or ZenRows could ship "free anti-bot check" in a quarter. Defense: neutral position. We can recommend ScrapFly or curl_cffi honestly. They can't. (impact: medium · likelihood: low · competitive)
12 /The first concrete step
This Saturday. Four hours.
Strip the Claude Code dep from the existing browser-recon skill. Generate a Markdown report locally using existing detection logic. No server, no payment, no LLM.
The question to ask 5 scrapers
"If this output came from a hosted tool that worked on any URL in 2 minutes, would you have tried it?"
Total scope: about 4 hours of work. The answer changes everything.
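For a sense of scale, the local Markdown report could start as something this small. It is a sketch under stated assumptions: the `render_report` name and the detection dict shape (`vendor`, `confidence`, `evidence`) are hypothetical stand-ins for whatever the existing detection logic emits.

```python
from typing import Dict, List


def render_report(url: str, detections: List[Dict]) -> str:
    """Render detection results as a Markdown report with evidence per claim."""
    lines = [f"# Recon report: {url}", ""]
    if not detections:
        lines.append("No anti-bot vendor detected by the current rules.")
    for d in detections:
        # Confidence is shown explicitly next to every vendor claim.
        lines.append(f"## {d['vendor']} (confidence {d['confidence']:.0%})")
        lines.append("")
        lines.append("Evidence:")
        lines.append("")
        for item in d["evidence"]:
            lines.append(f"- `{item}`")
        lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

Writing the result to a file or stdout is all the CLI wrapper needs to add; no server, payment, or LLM is involved.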
13 /What I want from you
Mark this up like a junior dev's PR. Don't be polite.
The product fails if you are. Under each question is what I'm specifically uncertain about.
Is the minimum sellable unit (§05) right?
Anything missing that you'd want as a buyer? Anything in the "core" five that's actually decoration? Anything I marked "supporting" that you'd refuse to pay without?
Rules vs LLM for the classifier (§04) — am I wrong?
This is the call I'm least sure about. Maintenance is the moat, but maintenance is also the work. Is there a hybrid that I'm missing? Would you trust an LLM verdict if it cited evidence inline?
Pricing & freemium (§06) — would you sign up? Where would you walk away?
Free → $5 → $19 → $79. Where do you stop? At what point does the price feel wrong relative to what you're getting? Is $5 too low to take seriously, or exactly the right "no-think" buy?
The credibility constraint (§07) — is "6 detected, 2 with full pipeline" believable?
Or do I need full pipeline on more vendors before launching? My worry: a scraper who points it at an Akamai site sees "vendor identified + waitlist" and bounces forever. Is that the right tradeoff, or fatal?
Roadmap (§09) — am I shipping the wrong v0.1?
Should something move earlier or later? Are the kill criteria the right ones — i.e. would you actually stop on them, or are they soft enough that I'd let myself slide past?
What this is not (§08) — anything I should reconsider?
Any of those "nots" feel like they're leaving real money on the table? Or any that should be a louder "not" than they are now?
The thing I'm not seeing because I'm too close. What is it?