Metadata-Version: 2.4
Name: scrapy-calyprium
Version: 1.12.0
Summary: Anti-detection Scrapy middleware — proxy routing and browser rendering for web scraping
Author-email: Calyprium <hello@calyprium.com>
License: MIT
Project-URL: Homepage, https://calyprium.com
Project-URL: Documentation, https://docs.calyprium.com
Project-URL: Repository, https://github.com/Aarkc/scrapy-calyprium
Keywords: scrapy,anti-detection,proxy,browser,tls-fingerprint,web-scraping
Classifier: Development Status :: 4 - Beta
Classifier: Framework :: Scrapy
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: scrapy>=2.11.0
Requires-Dist: httpx>=0.27.0
Requires-Dist: boto3>=1.34.0
Provides-Extra: local
Requires-Dist: httpcloak>=0.4.0; platform_system == "Linux" and extra == "local"
Requires-Dist: curl_cffi>=0.7.0; extra == "local"
Provides-Extra: dev
Requires-Dist: pytest; extra == "dev"
Requires-Dist: pytest-asyncio; extra == "dev"
Requires-Dist: responses; extra == "dev"
Dynamic: license-file

# scrapy-calyprium

Anti-detection [Scrapy](https://scrapy.org) middleware for web scraping — proxy routing and stealth browser rendering powered by [Calyprium](https://calyprium.com).

## Install

```bash
pip install scrapy-calyprium
```

## Quick Start

```python
# settings.py
import scrapy_calyprium

scrapy_calyprium.configure(api_key="clp_your_key_here")
```

This auto-configures:
- **VeilProxyMiddleware** — routes requests through rotating proxies with TLS fingerprinting
- **MimicBrowserMiddleware** — renders JavaScript pages with stealth browser instances
- **S3 feed storage** — writes spider output to Calyprium storage using Scrapy's built-in `S3FeedStorage`

## Usage

### Automatic Configuration (recommended)

```python
# settings.py
import scrapy_calyprium

scrapy_calyprium.configure(
    api_key="clp_your_key_here",
    mimic_stealth_level="maximum",  # one of: basic, moderate, maximum
)
```

### Manual Configuration

```python
# settings.py
DOWNLOADER_MIDDLEWARES = {
    "scrapy_calyprium.VeilProxyMiddleware": 100,
    "scrapy_calyprium.MimicBrowserMiddleware": 200,
}

CALYPRIUM_API_KEY = "clp_your_key_here"
VEIL_USER_ID = "your-user-id"
```

### Saving Output to Calyprium Storage

To save spider output to Calyprium's S3-compatible storage, define a feed with Scrapy's built-in feed export:

```python
# settings.py
import scrapy_calyprium

scrapy_calyprium.configure(api_key="clp_your_key_here")

FEEDS = {
    "s3://calyprium/my-spider/%(time)s.jl": {
        "format": "jsonlines",
    },
}
```

`configure()` sets up the S3 credentials, so no additional setup is needed.
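The `%(time)s` placeholder in the feed URI is a printf-style parameter that Scrapy fills in when the feed is opened. The sketch below only approximates that expansion so you can preview the resulting object key. `expand_feed_uri` is a hypothetical helper, not part of Scrapy or scrapy-calyprium, and the timestamp format (ISO 8601 without microseconds, colons replaced by dashes) is an assumption that may differ between Scrapy versions.

```python
from datetime import datetime, timezone

# Feed URI template taken from the FEEDS example above.
FEED_URI = "s3://calyprium/my-spider/%(time)s.jl"


def expand_feed_uri(template: str, now: datetime) -> str:
    """Preview how a %(time)s feed URI might expand (hypothetical helper)."""
    # Assumption: the timestamp is ISO 8601 without microseconds or offset,
    # with ":" replaced by "-" so it is safe to use in an object key.
    stamp = now.replace(microsecond=0, tzinfo=None).isoformat().replace(":", "-")
    return template % {"time": stamp}


example = expand_feed_uri(
    FEED_URI, datetime(2025, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
)
print(example)
```

Because each run gets its own timestamp, successive crawls write to separate objects instead of overwriting one file.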

### Browser Rendering

Mark requests that need JavaScript rendering:

```python
import scrapy

class MySpider(scrapy.Spider):
    name = "example"

    def start_requests(self):
        # Regular request (proxy only)
        yield scrapy.Request("https://example.com")

        # Browser-rendered request
        yield scrapy.Request(
            "https://example.com/spa",
            meta={"mimic": True},
        )
```
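When only some pages need a browser, it can help to decide the `meta` value in one place. The following is a minimal sketch under stated assumptions: `JS_PATH_PREFIXES` and `request_meta` are hypothetical names introduced for illustration, and the only library behavior relied on is the `meta={"mimic": True}` flag shown above.

```python
from urllib.parse import urlparse

# Hypothetical path prefixes for pages that need JavaScript rendering.
JS_PATH_PREFIXES = ("/spa", "/app")


def request_meta(url: str, js_prefixes: tuple = JS_PATH_PREFIXES) -> dict:
    """Return the meta dict for a request, flagging JS-heavy paths."""
    path = urlparse(url).path
    # Simple prefix match on the URL path; adjust to your site's layout.
    if path.startswith(js_prefixes):
        return {"mimic": True}
    # No flag: the request is handled as a regular (proxy only) request.
    return {}
```

In a spider, this would be used as `scrapy.Request(url, meta=request_meta(url))`, keeping the rendering decision out of the individual callbacks.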

## Authentication

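Every middleware requires a valid API key. Set it in one of these ways:

1. `scrapy_calyprium.configure(api_key="clp_...")`
2. `CALYPRIUM_API_KEY` environment variable
3. `CALYPRIUM_API_KEY` setting in `settings.py`, as shown under Manual Configuration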
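To keep the key out of version control, one option is to read it from the environment and pass it on explicitly. This is a sketch, not the library's own lookup logic: `load_api_key` is a hypothetical helper, and failing fast on a missing key is a design choice made here for illustration.

```python
import os


def load_api_key(env=os.environ) -> str:
    """Read the Calyprium API key from an environment mapping (hypothetical helper)."""
    key = env.get("CALYPRIUM_API_KEY", "").strip()
    if not key:
        # Fail early with a clear message instead of at the first request.
        raise RuntimeError("CALYPRIUM_API_KEY is not set")
    return key


# In settings.py this could be combined with the documented entry point:
#   import scrapy_calyprium
#   scrapy_calyprium.configure(api_key=load_api_key())
```

Accepting the mapping as a parameter makes the helper easy to exercise in tests without touching the real environment.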

## Settings Reference

| Setting | Description | Default |
|---------|-------------|---------|
| `CALYPRIUM_API_KEY` | API key for all services | — |
| `VEIL_GATEWAY_URL` | Proxy gateway URL | `https://proxy.calyprium.com` |
| `VEIL_USER_ID` | User ID for proxy routing | — |
| `VEIL_PROFILE` | Proxy routing profile | — |
| `VEIL_PROXY_TYPE` | Proxy type: `datacenter`, `residential`, or `residential_rotating` | — |
| `MIMIC_SERVICE_URL` | Mimic browser service URL | `https://mimic.calyprium.com` |
| `MIMIC_STEALTH_LEVEL` | Stealth level: `basic`, `moderate`, or `maximum` | `moderate` |
| `MIMIC_BROWSER_ENGINE` | Specific browser engine | auto |
| `MIMIC_USE_PROXY` | Route browser through proxy | `False` |
| `MIMIC_ALL_REQUESTS` | Render all requests via browser | `False` |
| `MIMIC_USE_SPECTRE` | Use device fingerprints | `True` |
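
As an illustration, a `settings.py` fragment overriding a few of these defaults might look like the following. It assumes the names in the table are read as ordinary Scrapy settings, as the Manual Configuration example suggests. The values are placeholders chosen from the options listed above.

```python
# settings.py
CALYPRIUM_API_KEY = "clp_your_key_here"

# Route plain requests through residential proxies.
VEIL_PROXY_TYPE = "residential"

# Raise the browser stealth level from the default ("moderate").
MIMIC_STEALTH_LEVEL = "maximum"

# Send browser-rendered requests through the proxy as well (default: False).
MIMIC_USE_PROXY = True
```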

## License

MIT
