Scrapable with curl_cffi + residential proxies, ~3 hrs of setup, ~$0.80 per 1k requests at scale.
Tooling: curl_cffi (92% confidence)
Proxy type: Residential (71% confidence)
Cost band: $0.50 – $1.40 per 1k requests (range)
01 / DETECTION
Anti-bot stack
Cloudflare Bot Management is active in standard mode (not "I'm Under Attack"). No interactive challenge fired during the session; the JS challenge was solved silently on first navigation.
Cloudflare Bot Management
Medium · standard mode
JS challenge issued on initial navigation, cleared via cf_clearance cookie. No Turnstile widget detected. __cf_bm rotates every 30 minutes.
Active TLS fingerprint inspection observed. The stock Python libraries requests and aiohttp will fail with high confidence: their default TLS handshake produces a JA3 hash that Cloudflare flags as automated.
FingerprintJS Pro loaded but signal appears to be sent to analytics, not gating access. Safe to ignore for headless HTTP scraping; relevant only if using Playwright at scale.
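Below is a minimal sketch of a fingerprint-safe fetch with curl_cffi. It assumes a recent release where the unversioned "chrome" impersonation target exists; older releases need a versioned target such as "chrome110". The proxy URL is a placeholder for whichever residential gateway you use. One Session is reused so that any cookies Cloudflare sets (for example __cf_bm) persist across requests. The names build_request_kwargs, make_session and fetch are illustrative helpers, not part of curl_cffi.

```python
from typing import Optional


def build_request_kwargs(proxy_url: Optional[str] = None,
                         impersonate: str = "chrome",
                         timeout: float = 30.0) -> dict:
    """Per-request keyword arguments accepted by curl_cffi's Session.get."""
    kwargs = {"impersonate": impersonate, "timeout": timeout}
    if proxy_url:
        # Send both http and https traffic through the same proxy gateway.
        kwargs["proxies"] = {"http": proxy_url, "https": proxy_url}
    return kwargs


def make_session():
    # Imported lazily so the pure helper above works without curl_cffi installed.
    from curl_cffi import requests as cffi_requests
    return cffi_requests.Session()


def fetch(session, url: str, proxy_url: Optional[str] = None) -> str:
    """GET a page with a browser TLS fingerprint; raises on 4xx/5xx."""
    resp = session.get(url, **build_request_kwargs(proxy_url))
    resp.raise_for_status()
    return resp.text
```

Usage, with placeholder URLs: `session = make_session()`, then `html = fetch(session, "https://www.example.com/some-category", "http://user:pass@gateway.example:8000")`.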
02 / ARCHITECTURE
Site architecture
Next.js 14 with server-side rendering and React Server Components. The HTML from the first page load already contains hydrated product data inside __NEXT_DATA__, so you don't need a browser to extract listings: the JSON is already sitting in the markup.
Implication: for category and product pages, a single GET + HTML parse extracts everything. No JS execution required. JSON API fallback exists for paginated/dynamic content — see endpoint inventory below.
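The listing JSON ships in the initial HTML, so extraction comes down to a string search plus json.loads. The sketch below assumes the standard Next.js tag `<script id="__NEXT_DATA__" type="application/json">` and the usual props.pageProps nesting. The key that actually holds products inside pageProps is site-specific and not confirmed here. Pages rendered by the App Router deliver their React Server Components payload through self.__next_f.push(...) script chunks instead, and this sketch does not parse those.

```python
import json
import re
from typing import Any, Optional

# Matches the JSON island Next.js (Pages Router) embeds in server-rendered HTML.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str) -> Optional[Any]:
    """Return the parsed __NEXT_DATA__ object, or None if the page lacks it."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))


def page_props(next_data: Any) -> dict:
    """Props passed to the page component; listing data usually lives here."""
    return next_data.get("props", {}).get("pageProps", {})
```

Check for None before calling page_props. Then read the site's own product key, for example a hypothetical `page_props(data).get("products", [])`.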
03 / ENDPOINTS
Endpoint inventory
11 distinct API endpoints captured, deduplicated from 142 raw requests. Static assets (images, fonts, analytics beacons) filtered out.
| Method | Path | Purpose | Auth |
|---|---|---|---|
| GET | /api/products | Catalog listing, paginated | none |
| GET | /api/products/[id] | Product detail | none |
| GET | /api/categories | Category tree | none |
| POST | /api/search | Algolia-backed search | app key in request body |
| GET | /api/inventory/[sku] | Stock check | none |
| POST | /api/cart/add | Cart mutation | session cookie |
| GET | /api/user/me | Profile | JWT bearer |
| POST | /api/auth/login | Login → JWT | credentials |
| GET | /api/recommendations | Related items | none |
| POST | /api/reviews | Submit review | JWT bearer |
| GET | /api/sitemap | Full URL list (highest-value endpoint) | none |
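For the unauthenticated GET endpoints, two small helpers cover most needs. The first fills the [id] and [sku] placeholders from the inventory. The second walks the paginated catalog. Several details are assumptions for illustration: the query parameter (page), the response shape ({"items": [...]}), the stop condition (an empty page) and BASE_URL. Confirm them against a captured /api/products response. The fetch function is passed in as an argument so the loop can be tested without network access.

```python
from typing import Any, Callable, Dict, Iterator
from urllib.parse import quote, urlencode

BASE_URL = "https://www.example.com"  # placeholder: the target origin


def fill_path(template: str, **params: Any) -> str:
    """Replace [name] placeholders, e.g. /api/products/[id], with URL-safe values."""
    for name, value in params.items():
        template = template.replace(f"[{name}]", quote(str(value), safe=""))
    return template


def products_page_url(page: int) -> str:
    # Hypothetical query parameter name: "page".
    return f"{BASE_URL}/api/products?{urlencode({'page': page})}"


def iter_products(fetch_json: Callable[[str], Dict[str, Any]],
                  start: int = 1,
                  max_pages: int = 10_000) -> Iterator[Dict[str, Any]]:
    """Yield catalog items page by page until an empty page (assumed end marker)."""
    for page in range(start, start + max_pages):
        items = fetch_json(products_page_url(page)).get("items", [])
        if not items:
            return
        yield from items
```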
04 / DIFFICULTY
What makes this site hard (or easy)
Rather than a single 1–10 score, here's what we actually observed and how each factor affects effort.
TLS fingerprinting is checked. Stock HTTP libraries (requests, aiohttp) fail consistently. You must use curl_cffi, tls-client, or a real browser.
impact: high · evidence: JA3 challenge pattern
High impact
No interactive challenge. No Turnstile widget or CAPTCHA was triggered during 3 minutes of normal browsing, so solver costs are unlikely.
impact: positive · evidence: no challenge widget loaded
Easy
JSON in HTML. Product data is hydrated server-side into __NEXT_DATA__. No JS execution is required for listing pages.
impact: positive · evidence: 12KB JSON found in initial response
Easy
Public sitemap available. /api/sitemap returns ~48,000 URLs without auth, so you can skip pagination logic entirely.
impact: positive · evidence: 200 OK with full URL list
Easy
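With the sitemap in hand, the crawl frontier is just a deduplicated list. The response format was not recorded above, so this sketch assumes /api/sitemap returns a JSON array of absolute URL strings. The default batch size is an arbitrary value for handing work to concurrent workers.

```python
import json
from typing import Iterator, List


def parse_sitemap(body: str) -> List[str]:
    """Parse a JSON array of URLs, dropping duplicates but keeping order."""
    urls = json.loads(body)
    if not isinstance(urls, list):
        raise ValueError("expected a JSON array of URLs")
    # dict.fromkeys preserves first-seen order (Python 3.7+).
    return list(dict.fromkeys(u for u in urls if isinstance(u, str)))


def batched(urls: List[str], size: int = 500) -> Iterator[List[str]]:
    """Split the frontier into fixed-size work batches."""
    for i in range(0, len(urls), size):
        yield urls[i:i + size]
```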
Datacenter IPs likely flagged. Cloudflare Bot Management correlates IP reputation with fingerprint. DC proxies will probably work at low volume but degrade at scale.
impact: medium · confidence: 71% · evidence: empirical pattern across similar deployments
Medium
Rate limiting suspected but not confirmed. No 429s were observed in 3 minutes, but x-ratelimit-* headers are also absent, so any limit may be enforced as a silent ban rather than a 429.
impact: unknown · recommendation: start at 2 req/sec, ramp slowly
Unknown
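One way to follow "start at 2 req/sec, ramp slowly" is an additive-increase / multiplicative-decrease controller. A silent ban may never return a 429, so this sketch treats any block signal as a reason to back off: a 403, a 429, or a challenge page where data was expected. The step, ceiling, floor and window values are hypothetical starting points, not measured limits.

```python
from dataclasses import dataclass, field


@dataclass
class RampingRate:
    """AIMD pacing: slow linear ramp-up, halve on any sign of blocking."""
    rate: float = 2.0      # requests/second; starting point from the report
    step: float = 0.5      # hypothetical increment after a clean window
    ceiling: float = 20.0  # hypothetical upper bound
    floor: float = 0.5     # never pace slower than this
    window: int = 100      # successes required before each ramp step
    _successes: int = field(default=0, init=False, repr=False)

    def on_success(self) -> None:
        self._successes += 1
        if self._successes >= self.window:
            self.rate = min(self.ceiling, self.rate + self.step)
            self._successes = 0

    def on_block(self) -> None:
        """Call on 403/429 or a challenge page: halve the rate, restart the window."""
        self.rate = max(self.floor, self.rate / 2)
        self._successes = 0

    @property
    def delay(self) -> float:
        """Seconds to wait between requests at the current rate."""
        return 1.0 / self.rate
```

In a worker loop, sleep for `delay` before each request, then call `on_success` or `on_block` depending on the response.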
05 / COST
Cost estimate at 1M requests/month
Based on observed signals and current market rates for proxies and tooling. Verify with replay testing (Tier 2) before committing to client pricing.
| Item | Basis | Monthly cost | Unit rate |
|---|---|---|---|
| curl_cffi (free, OSS) | library license: MIT | $0 | included |
| Residential proxy (Bright Data / Decodo / Oxylabs) | ~300 GB at avg ~300KB per response | $450 – $900 | $1.50–3.00 / GB |
| Compute (single VPS, async) | can sustain ~50 req/sec on 2 vCPU | $20 – $40 | monthly |
| Total estimate | per 1M requests/month | $470 – $940 | $0.47 – $0.94 / 1k |
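The estimate reduces to bandwidth × proxy rate + compute. Average response size is the input that most needs checking during replay testing. At 1M requests, the $450 – $900 proxy range at $1.50–3.00/GB corresponds to ~300 GB, or roughly 300KB per response. At 3KB per response the same volume would total only ~3 GB, about $4.50 – $9.00. The function names and the decimal KB→GB conversion are illustrative choices.

```python
def monthly_cost(requests: int, avg_kb: float, usd_per_gb: float,
                 compute_usd: float) -> float:
    """Proxy bandwidth cost plus fixed compute, in USD per month."""
    gb = requests * avg_kb / 1_000_000  # decimal units: 1 GB = 1,000,000 KB
    return gb * usd_per_gb + compute_usd


def per_1k(total_usd: float, requests: int) -> float:
    """Cost per 1,000 requests."""
    return total_usd / (requests / 1_000)


low = monthly_cost(1_000_000, avg_kb=300, usd_per_gb=1.50, compute_usd=20)
high = monthly_cost(1_000_000, avg_kb=300, usd_per_gb=3.00, compute_usd=40)
print(f"${low:,.0f} to ${high:,.0f} per month; "
      f"${per_1k(low, 1_000_000):.2f} to ${per_1k(high, 1_000_000):.2f} per 1k")
# → $470 to $940 per month; $0.47 to $0.94 per 1k
```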
06 / EVIDENCE
Raw evidence trail
Every claim above is derived from data captured in your session.