Metadata-Version: 2.4
Name: scrapy-rotating-proxy-middleware
Version: 0.1.0
Summary: Scrapy downloader middleware that rotates proxies and retries on Cloudflare/DataDome/PerimeterX bans.
Author-email: JiBao Proxy <support@jibaoproxy.com>
License: MIT
Project-URL: Homepage, https://jibaoproxy.com
Project-URL: Source, https://github.com/jibaoproxyofficial-pixel/scrapy-rotating-proxy-middleware
Keywords: scrapy,proxy,rotating-proxy,web-scraping,cloudflare,datadome,anti-bot,residential-proxy
Classifier: Framework :: Scrapy
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Topic :: Internet :: WWW/HTTP
Requires-Python: >=3.8
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: Scrapy>=2.0
Requires-Dist: w3lib
Dynamic: license-file

# scrapy-rotating-proxy-middleware

[![PyPI version](https://img.shields.io/pypi/v/scrapy-rotating-proxy-middleware.svg)](https://pypi.org/project/scrapy-rotating-proxy-middleware/)
[![Python versions](https://img.shields.io/pypi/pyversions/scrapy-rotating-proxy-middleware.svg)](https://pypi.org/project/scrapy-rotating-proxy-middleware/)
[![License: MIT](https://img.shields.io/badge/License-MIT-yellow.svg)](https://opensource.org/licenses/MIT)

A drop-in [Scrapy](https://scrapy.org) downloader middleware that **rotates proxies and retries on bans**: `403`, `429`, Cloudflare "Just a moment", DataDome, and PerimeterX challenges. Point it at a static proxy list or a single rotating gateway, and a blocked exit IP no longer stalls the crawl.

```bash
pip install scrapy-rotating-proxy-middleware
```

## Why

Scrapy's built-in `HttpProxyMiddleware` assigns **one** proxy and never reacts when that exit IP gets blocked. In practice, many anti-bot blocks have little to do with your spider logic: the IP and its [TLS fingerprint](https://jibaoproxy.com/blog/ja3-tls-fingerprint-detection-explained.html) are scored before your request reaches the page. This middleware:

- assigns a proxy per request (random from a list, or a rotating gateway),
- **detects bans** by status code *and* response-body signature (Cloudflare / DataDome / PerimeterX),
- transparently **rotates to a fresh proxy and retries**, with a per-request retry budget,
- moves inline `user:pass` credentials into the `Proxy-Authorization` header automatically.
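
The last bullet can be illustrated with a minimal, standard-library-only sketch of how inline credentials might be split out of a proxy URL. The helper name `split_proxy_credentials` is hypothetical and is not taken from this package's source; it only shows the general technique of building a `Basic` value for `Proxy-Authorization`.

```python
import base64
from urllib.parse import unquote, urlsplit, urlunsplit


def split_proxy_credentials(proxy_url):
    """Return (clean_url, auth_header_value).

    Hypothetical helper for illustration; auth_header_value is None
    when the URL carries no inline credentials.
    """
    parts = urlsplit(proxy_url)
    if parts.username is None:
        return proxy_url, None

    # Credentials in URLs may be percent-encoded, so decode them first.
    user = unquote(parts.username)
    password = unquote(parts.password or "")
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")

    # Rebuild the netloc without the "user:pass@" prefix.
    host = parts.hostname or ""
    if ":" in host:  # IPv6 literal: restore the brackets urlsplit removed
        host = f"[{host}]"
    netloc = f"{host}:{parts.port}" if parts.port else host
    clean = urlunsplit((parts.scheme, netloc, parts.path, parts.query, parts.fragment))
    return clean, f"Basic {token}"
```

In a Scrapy downloader middleware, the two returned values would typically end up in `request.meta["proxy"]` and `request.headers["Proxy-Authorization"]` respectively.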

## Setup

Enable it in `settings.py` and disable Scrapy's default proxy middleware:

```python
DOWNLOADER_MIDDLEWARES = {
    "scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware": None,
    "scrapy_rotating_proxy.middleware.RotatingProxyMiddleware": 610,
}
```

### Option A — a rotating residential gateway (recommended)

A residential gateway gives you a **new exit IP on every connection** from a single URL, so you don't manage a list at all:

```python
# settings.py
ROTATING_PROXY_GATEWAY = "http://USERNAME:PASSWORD@us.jibaoproxy.com:913"
```

### Option B — a static proxy list

```python
ROTATING_PROXY_LIST = [
    "http://USERNAME:PASSWORD@proxy-a.example.com:8000",
    "http://USERNAME:PASSWORD@proxy-b.example.com:8000",
    "socks5://USERNAME:PASSWORD@proxy-c.example.com:1080",
]
```

That's it — run your spider as usual.

## Configuration

| Setting | Default | Description |
| --- | --- | --- |
| `ROTATING_PROXY_GATEWAY` | – | Single rotating-gateway URL. |
| `ROTATING_PROXY_LIST` | – | List of proxy URLs (used only when `ROTATING_PROXY_GATEWAY` is not set). |
| `ROTATING_PROXY_BAN_CODES` | `403, 407, 429, 503` | Status codes treated as bans. |
| `ROTATING_PROXY_MAX_RETRIES` | `5` | Proxy rotations per request before giving up. |
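
As an illustration of the table above, a `settings.py` fragment that overrides both defaults might look like the following. The value type for `ROTATING_PROXY_BAN_CODES` (a list of integers) is an assumption, since the table only shows the codes themselves, and the specific values are examples rather than recommendations.

```python
# settings.py (illustrative values only)
ROTATING_PROXY_GATEWAY = "http://USERNAME:PASSWORD@us.jibaoproxy.com:913"

# Assumption: the setting accepts a list of integer status codes.
# 407 and 503 are left out here, so those responses are not treated as bans.
ROTATING_PROXY_BAN_CODES = [403, 429]

# Give up after three proxy rotations instead of the default five.
ROTATING_PROXY_MAX_RETRIES = 3
```

Dropping `503` can make sense when the target site returns it for ordinary overload rather than for blocking, though that depends on the site.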

Set a proxy on a single request explicitly and the middleware leaves it alone:

```python
yield scrapy.Request(url, meta={"proxy": "http://USERNAME:PASSWORD@host:port"})
```
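
The selection order described in this README (an explicit `meta["proxy"]` wins, then the gateway, then a random entry from the list) can be sketched as a small pure function. `choose_proxy` and its parameters are hypothetical names for illustration, not the package's actual internals.

```python
import random


def choose_proxy(meta, gateway=None, proxy_list=None, rng=random):
    """Pick a proxy URL for one request, or None if nothing is configured.

    Hypothetical helper: mirrors the precedence described in the README.
    """
    # 1. A proxy set explicitly on the request is left alone.
    if meta.get("proxy"):
        return meta["proxy"]
    # 2. A rotating gateway takes priority over a static list.
    if gateway:
        return gateway
    # 3. Otherwise pick a random entry from the static list.
    if proxy_list:
        return rng.choice(proxy_list)
    return None
```

Passing `rng` as a parameter is a design choice for this sketch only: it lets a test supply a seeded `random.Random` instance to get repeatable picks.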

## Ban detection

A response counts as a ban when its status is in `ROTATING_PROXY_BAN_CODES`, **or** the first 4 KB of the body matches a known anti-bot signature (`cf-chl`, `Just a moment`, `Attention Required`, `captcha-delivery`/DataDome, `px-captcha`/PerimeterX). On a ban the request is re-scheduled with a fresh proxy and `dont_filter=True`, up to the retry budget.
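
The rule above can be sketched in a few lines of plain Python. This is a simplified illustration rather than the package's actual code: the function names are hypothetical, and case-sensitive substring matching is an assumption, since the README does not say how signatures are compared.

```python
# Defaults and signatures as listed in this README.
BAN_CODES = frozenset({403, 407, 429, 503})
BAN_SIGNATURES = (
    b"cf-chl",
    b"Just a moment",
    b"Attention Required",
    b"captcha-delivery",  # DataDome
    b"px-captcha",        # PerimeterX
)
SNIFF_BYTES = 4096  # only the first 4 KB of the body is inspected


def is_ban(status, body, ban_codes=BAN_CODES):
    """Return True if the status code or the start of the body looks like a ban."""
    if status in ban_codes:
        return True
    head = body[:SNIFF_BYTES]
    # Assumption: plain case-sensitive substring match on the raw bytes.
    return any(signature in head for signature in BAN_SIGNATURES)


def should_retry(retries_so_far, max_retries=5):
    """Return True while the per-request retry budget is not exhausted."""
    return retries_so_far < max_retries
```

With these helpers, a `200` response whose body starts with a Cloudflare "Just a moment" page counts as a ban, while a signature that first appears beyond the 4 KB window does not.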

If you keep hitting bans after rotation, the exit IPs themselves are likely the problem: datacenter ranges are often scored as bot traffic at the ASN level, and residential exits with a clean ASN reputation tend to fare better. We build [JiBao Proxy](https://jibaoproxy.com) for exactly this: 72M+ residential IPs across 200+ countries, sticky sessions, and SOCKS5/HTTP gateways. The middleware works with any provider, though.

## Related

- [Scrapy proxy middleware: the complete guide](https://jibaoproxy.com/blog/scrapy-proxy-middleware-guide.html)
- [Why your JA3/TLS fingerprint gets you blocked](https://jibaoproxy.com/blog/ja3-tls-fingerprint-detection-explained.html)
- [Bypassing DataDome & PerimeterX in 2026](https://jibaoproxy.com/blog/datadome-perimeterx-bypass-2026.html)

## License

MIT
