Metadata-Version: 2.4
Name: browser-recon
Version: 0.3.9
Summary: Reconnaissance tool for production scrapers — captures real browser traffic, validates through real proxies, returns a verified scraping plan with runnable starter code.
Author-email: Lazy Coder <lazycoder.codes@gmail.com>
Requires-Python: >=3.11
Requires-Dist: alembic>=1.12
Requires-Dist: anthropic>=0.40
Requires-Dist: boto3>=1.34
Requires-Dist: cloudscraper>=1.2.71
Requires-Dist: curl-cffi>=0.5
Requires-Dist: fastapi>=0.136.1
Requires-Dist: httpx>=0.25
Requires-Dist: jinja2>=3.1
Requires-Dist: openai>=1.50.0
Requires-Dist: psycopg[binary]>=3.1
Requires-Dist: requests>=2.31
Requires-Dist: rich>=15.0.0
Requires-Dist: sqlalchemy>=2.0
Requires-Dist: tomli>=2.0
Requires-Dist: uvicorn>=0.46.0
Requires-Dist: websockets>=11
Description-Content-Type: text/markdown

# browser-recon

**Reconnaissance for production scrapers.** Launches Chrome on your machine,
captures what the browser actually sends, then returns a verified scraping
plan: which HTTP library to use, which headers are required, how to handle
cookies, which proxy tier to use, and what request delay is safe, plus a
runnable Python starter script.

All analysis runs on the browser-recon server. The CLI is a thin client that
captures the browser session and uploads it.

## Install

```bash
pipx install browser-recon
```

## Use

```bash
recon login                          # one-time — paste API key from your dashboard
recon scan https://walmart.com       # interactive: launches Chrome, captures, returns report URL
```

The CLI launches Chrome. You browse the target site for a couple of minutes —
click on what you care about, navigate to product pages, run a search, view
reviews, whatever data you want to scrape. Press Ctrl+C. The CLI uploads the
captured session to the server, shows live progress as the server processes
it, then prints the report URL.

## What you get back

A single HTML report containing:

- **Detection** — which anti-bot vendors protect the target (Cloudflare,
  Akamai Bot Manager, PerimeterX, DataDome, Imperva, …), with severity tier.
- **Scraping plan** — which captured endpoints carry the data you want vs
  which are session prerequisites vs which are noise.
- **Validation** — which HTTP library × proxy tier combination actually
  works against the live site, measured through real test requests
  (not inferred from priors).
- **Headers + cookies** — the minimum required set, plus cookie warmup
  instructions if the anti-bot needs a real-browser session first.
- **Rate-limit** — measured safe delay between requests.
- **Starter code** — a runnable Python script using the recommended library,
  headers, cookies, and delay.
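
To make the last three items concrete, here is a minimal sketch of the shape
such a starter script might take. It is not output from browser-recon: the
header values, the cookie name, the 2-second delay, and the `crawl` and
`fetch_with_requests` helpers are all hypothetical placeholders you would
replace with the values from your own report.

```python
import time
from typing import Callable, Iterable

# Hypothetical values standing in for what a report might recommend.
REQUIRED_HEADERS = {
    "User-Agent": "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36",
    "Accept": "application/json",
    "Accept-Language": "en-US,en;q=0.9",
}
WARMUP_COOKIES = {"session_id": "PASTE-FROM-WARMUP"}  # assumed cookie name
SAFE_DELAY_SECONDS = 2.0  # assumed delay, not a measured value


def crawl(
    urls: Iterable[str],
    fetch: Callable[[str, dict, dict], str],
    delay: float = SAFE_DELAY_SECONDS,
    sleep: Callable[[float], None] = time.sleep,
) -> dict[str, str]:
    """Fetch each URL in order, pausing `delay` seconds between requests."""
    results: dict[str, str] = {}
    for index, url in enumerate(urls):
        if index > 0:
            sleep(delay)  # pause before every request except the first
        results[url] = fetch(url, REQUIRED_HEADERS, WARMUP_COOKIES)
    return results


def fetch_with_requests(url: str, headers: dict, cookies: dict) -> str:
    """Example fetcher; swap in whichever HTTP library the report recommends."""
    import requests  # imported lazily so the sketch loads without it

    response = requests.get(url, headers=headers, cookies=cookies, timeout=30)
    response.raise_for_status()
    return response.text
```

Calling `crawl(urls, fetch_with_requests)` would run it against a live site.
Passing the fetcher in as a parameter keeps the pacing logic independent of
the HTTP library, so a different recommendation only changes one function.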

## What this is *not*

Not a scraper. browser-recon produces the *plan* for a scraper. You (or your
AI assistant) write the scraper using that plan.

## Why measurement beats guessing

Most scrapers fail in production because the developer guessed wrong about
three things:

1. Which anti-bot system is in front of the target
2. Which headers the request actually needs
3. Whether their IP needs to look residential

browser-recon measures all three by firing real HTTP test requests through
real proxies and reporting which combination succeeded. The final
recommendation is grounded in what worked, not in what the LLM expected to
work.
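
The idea of testing every combination and keeping what worked can be sketched
as follows. This is an illustration only, not the server's validation logic:
the candidate lists, their cheapest-first ordering, and the `validate`,
`cheapest_working`, and `probe` names are assumptions made for the example.

```python
from itertools import product
from typing import Callable, Optional

# Hypothetical candidates, ordered cheapest or simplest first on each axis.
LIBRARIES = ["requests", "httpx", "curl_cffi"]
PROXY_TIERS = ["none", "datacenter", "residential"]

Pair = tuple[str, str]


def validate(probe: Callable[[str, str], bool]) -> dict[Pair, bool]:
    """Run the probe once per (library, proxy tier) pair and record the outcome."""
    return {
        (lib, tier): probe(lib, tier)
        for lib, tier in product(LIBRARIES, PROXY_TIERS)
    }


def cheapest_working(results: dict[Pair, bool]) -> Optional[Pair]:
    """Pick the passing pair with the cheapest proxy tier, then the simplest library."""
    passing = [pair for pair, ok in results.items() if ok]
    if not passing:
        return None
    return min(
        passing,
        key=lambda pair: (PROXY_TIERS.index(pair[1]), LIBRARIES.index(pair[0])),
    )
```

Here `probe` stands in for one real test request made with a given library
through a given proxy tier. The recommendation comes from the recorded
outcomes alone, which is the point of measuring instead of guessing.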

## Architecture

The CLI ships only the non-proprietary glue: Chrome launching, network
capture, authentication, upload, and live-progress polling. Roughly 130 KB
installed. No detection rules, no validation logic, no LLM prompts, no
scoring heuristics live on your machine — everything proprietary runs on
the browser-recon server.

The server handles: anti-bot fingerprinting, endpoint inventory analysis,
intent-based endpoint classification, proxy-based active validation, secret
scrubbing, recommendation synthesis, auxiliary notes and difficulty drivers,
and report rendering.
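
As a rough illustration of one of these stages, secret scrubbing could look
something like the sketch below. The server's actual rules are not published
here, so the list of sensitive header names, the placeholder string, and the
`scrub_headers` function are assumptions chosen for the example.

```python
# Header names treated as sensitive here are an assumption for illustration.
SENSITIVE_HEADERS = {"authorization", "cookie", "set-cookie", "x-api-key"}
PLACEHOLDER = "[REDACTED]"


def scrub_headers(headers: dict[str, str]) -> dict[str, str]:
    """Return a copy with sensitive header values replaced by a placeholder."""
    return {
        name: PLACEHOLDER if name.lower() in SENSITIVE_HEADERS else value
        for name, value in headers.items()
    }
```

Header names are compared case-insensitively because HTTP treats them that
way, and the input dictionary is left unmodified.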

You never need proxy credentials in your shell. Validation requests go
through the service operator's proxy provider account, which exists only on
the server.

## Status

Active development. v0.3.x is the thin-client architecture (server-side
pipeline, animated CLI progress, OIDC-trusted PyPI publishing).

## Contributing

Development setup, conventions, and test-suite instructions live in
[CONTRIBUTING.md](CONTRIBUTING.md).

## License

See LICENSE.
