Metadata-Version: 2.4
Name: abrasio
Version: 0.1.8
Summary: Stealth web scraping SDK with optional cloud browser support
Project-URL: Homepage, https://github.com/Scrape-Technology/abrasio-sdk
Project-URL: Documentation, https://scrapetechnology.com/abrasio
Project-URL: Repository, https://github.com/Scrape-Technology/abrasio-sdk
Author-email: Scrape Technology <joao.sobhie@scrapetechnology.com>
License-Expression: MIT
License-File: LICENSE
Keywords: anti-detection,browser-automation,curl-cffi,fingerprint,ja3,patchright,scraping,stealth,tls-fingerprint,undetected,web-scraping
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.8
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Topic :: Internet :: WWW/HTTP :: Browsers
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Requires-Python: >=3.8
Requires-Dist: httpx>=0.26.0
Requires-Dist: patchright>=1.58.0
Provides-Extra: all
Requires-Dist: browserforge>=1.2.4; extra == 'all'
Requires-Dist: cryptography>=41.0.0; extra == 'all'
Requires-Dist: curl-cffi>=0.15.0; extra == 'all'
Provides-Extra: cert
Requires-Dist: cryptography>=41.0.0; extra == 'cert'
Provides-Extra: dev
Requires-Dist: black>=23.0.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.21.0; extra == 'dev'
Requires-Dist: pytest>=7.0.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Provides-Extra: fingerprint
Requires-Dist: browserforge>=1.2.4; extra == 'fingerprint'
Provides-Extra: tls
Requires-Dist: curl-cffi>=0.15.0; extra == 'tls'
Description-Content-Type: text/markdown

# Abrasio SDK

[![Python 3.8+](https://img.shields.io/badge/python-3.8+-blue.svg)](https://www.python.org/downloads/)
[![Version 0.1.8](https://img.shields.io/badge/version-0.1.8-blue.svg)]()

**Undetected web scraping SDK** inspired by [Patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright), with human-like behavior simulation and optional cloud browser support.

## Features

| Feature | Description |
|---------|-------------|
| **Undetected** | Patchright bypasses Runtime.enable and CDP detection |
| **Free Local Mode** | Run on your machine with anti-detection patches |
| **Cloud Mode** | Real fingerprints (paid) |
| **Human Behavior** | Bezier mouse movements, natural typing, smooth scrolling |
| **TLS Fingerprinting** | curl_cffi for JA3/JA4 TLS fingerprint matching |
| **Fingerprint Config** | Control WebGL, WebRTC, canvas/audio noise per session |
| **Client Certificates** | TLS Client Auth for sites that require a client cert (e.g. ICP-Brasil/gov.br logins); works in cloud mode via request interception |
| **Playwright API** | Same API you already know |

## Anti-Detection Status

| Technique | Status | Notes |
|-----------|--------|-------|
| Patchright (CDP leak) | Implemented | Protocol-level, not JS patches |
| navigator.webdriver | Implemented | `--disable-blink-features=AutomationControlled` |
| Headless User-Agent fix | Implemented | Auto-removes "HeadlessChrome" in headless mode |
| Human behavior (Bezier) | Implemented | Mouse, typing, scrolling |
| TLS fingerprinting (JA3/JA4) | Implemented | via curl_cffi for HTTP requests |
| Canvas noise | Implemented | Optional via `FingerprintConfig(canvas_noise=True)` |
| Audio noise | Implemented | Optional via `FingerprintConfig(audio_noise=True)` |
| WebRTC IP leak protection | Implemented | Via `FingerprintConfig(webrtc=False)` |
| WebGL control | Implemented | Enabled by default, disable via config |
| Timezone/Locale consistency | Auto-validated | Auto-configures from region or IP |
| Retry on rate limit | Implemented | Exponential backoff with Retry-After |

## Table of Contents

- [Installation](#installation)
- [Quick Start](#quick-start)
- [FingerprintConfig](#fingerprintconfig)
- [Human-Like Behavior](#human-like-behavior)
- [TLS Fingerprinting (HTTP)](#tls-fingerprinting-http)
- [Configuration](#configuration)
- [Client Certificates](#client-certificates)
- [Cloud Mode](#cloud-mode-paid)
- [Error Handling](#error-handling)
- [API Reference](#api-reference)
- [Best Practices](#best-practices)

## Installation

```bash
# Install SDK
pip install abrasio

# Install real Chrome (NOT Chromium) for maximum stealth
patchright install chrome

# Optional: TLS fingerprinting for HTTP requests
pip install abrasio[tls]

# Optional: fingerprint generation utilities
pip install abrasio[fingerprint]

# Optional: client certificates (PFX/PKCS12 -> PEM conversion)
pip install abrasio[cert]

# Install everything
pip install abrasio[all]
```

### Requirements

- Python 3.8+
- Chrome browser (installed via `patchright install chrome`)

## Quick Start

### Local Mode (Free)

```python
import asyncio
from abrasio import Abrasio

async def main():
    async with Abrasio(headless=False) as browser:
        page = await browser.new_page()
        await page.goto("https://example.com")
        print(await page.title())

asyncio.run(main())
```

### Cloud Mode (Paid)

```python
import asyncio
from abrasio import Abrasio

async def main():
    async with Abrasio(
        api_key="sk_live_xxx",
        region="BR",
        url="https://example.com.br",
    ) as browser:
        page = await browser.new_page()
        await page.goto("https://example.com.br")
        print(await page.title())

asyncio.run(main())
```

### Synchronous API

```python
from abrasio.sync_api import Abrasio

with Abrasio(headless=False) as browser:
    page = browser.new_page()
    page.goto("https://example.com")
    print(page.title())
```

## FingerprintConfig

Control browser fingerprint protections in **local mode only**. In cloud mode, the cloud browser handles all fingerprinting automatically.

```python
from abrasio import Abrasio, FingerprintConfig

async with Abrasio(
    headless=False,
    fingerprint=FingerprintConfig(
        webgl=True,          # Keep WebGL enabled (default). False blocks it.
        webrtc=False,        # Block WebRTC IP leak (recommended with proxy)
        canvas_noise=True,   # Add noise to canvas fingerprint
        audio_noise=True,    # Add noise to audio fingerprint
    ),
) as browser:
    page = await browser.new_page()
    await page.goto("https://example.com")
```

| Option | Default | Description |
|--------|---------|-------------|
| `webgl` | `True` | Enable WebGL APIs. Disabling is a strong bot signal. |
| `webrtc` | `True` | Enable WebRTC. Set `False` with proxy to prevent real IP leak. |
| `canvas_noise` | `False` | Add imperceptible noise to canvas reads. Randomizes fingerprint. |
| `audio_noise` | `False` | Add noise to AudioContext reads. Randomizes fingerprint. |

> **Cloud mode**: `FingerprintConfig` is completely ignored. The cloud browser uses real, collected fingerprints instead.

## Human-Like Behavior

Utilities for simulating realistic human behavior to bypass behavioral analysis.

### Mouse Movement (Bezier Curves)

```python
from abrasio.utils import human_move_to, human_click

# Move mouse with natural Bezier curve trajectory
await human_move_to(page, x=500, y=300)

# Click with natural movement and random offset
await human_click(page, "button#submit")
```
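
As background on what a Bezier trajectory is, here is a minimal, self-contained sketch of generating a curved path between two points. It is an illustration only and not the SDK's implementation: `bezier_path`, `steps`, and `spread` are hypothetical names, and the way the control points are jittered is an assumption.

```python
import random
from typing import List, Tuple

Point = Tuple[float, float]


def bezier_path(start: Point, end: Point, steps: int = 25, spread: float = 0.3) -> List[Point]:
    """Return steps + 1 points along a cubic Bezier curve from start to end.

    The two control points are jittered at random, so every call yields a
    slightly different arc (hypothetical scheme, for illustration only).
    """
    (x0, y0), (x3, y3) = start, end
    dx, dy = x3 - x0, y3 - y0

    # Control points sit roughly 1/3 and 2/3 of the way along the straight
    # line, pushed sideways by a random fraction of the travel distance.
    x1 = x0 + dx / 3 + random.uniform(-spread, spread) * dy
    y1 = y0 + dy / 3 + random.uniform(-spread, spread) * dx
    x2 = x0 + 2 * dx / 3 + random.uniform(-spread, spread) * dy
    y2 = y0 + 2 * dy / 3 + random.uniform(-spread, spread) * dx

    points: List[Point] = []
    for i in range(steps + 1):
        t = i / steps
        u = 1 - t
        # Standard cubic Bezier polynomial.
        x = u**3 * x0 + 3 * u**2 * t * x1 + 3 * u * t**2 * x2 + t**3 * x3
        y = u**3 * y0 + 3 * u**2 * t * y1 + 3 * u * t**2 * y2 + t**3 * y3
        points.append((x, y))
    return points
```

Each point could then be passed to Playwright's `page.mouse.move(x, y)` with a short pause between calls, which is presumably close in spirit to what `human_move_to` does for you.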

### Natural Typing

```python
from abrasio.utils import human_type

await human_type(
    page,
    "Hello, World!",
    selector="input#search",
    mistake_probability=0.02,
    think_pause_probability=0.05,
)
```

### Smooth Scrolling

```python
from abrasio.utils import human_scroll, simulate_reading

await human_scroll(page, "down", amount=400, smooth=True)
await simulate_reading(page, min_seconds=3, max_seconds=8)
```

### All Utilities

```python
from abrasio.utils import (
    human_move_to,      # Bezier curve mouse movement
    human_click,        # Natural click with movement
    human_type,         # Variable-speed typing with mistakes
    human_scroll,       # Smooth scrolling with momentum
    human_wait,         # Random wait (skewed distribution)
    random_delay,       # Simple random delay
    simulate_reading,   # Simulate page reading behavior
)
```
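
To illustrate what a "skewed distribution" means for waits, the sketch below draws delays that are mostly short with an occasional longer pause. It is one possible approach under stated assumptions, not the code behind `human_wait`: the function name `skewed_delay` and the choice of a Beta(2, 5) distribution are hypothetical.

```python
import random


def skewed_delay(min_seconds: float = 0.5, max_seconds: float = 2.0) -> float:
    """Pick a delay in [min_seconds, max_seconds], biased toward the short end."""
    # betavariate(2, 5) returns a value in [0, 1] with mean 2/7, so most
    # draws fall in the lower half of the range, with occasional long pauses.
    fraction = random.betavariate(2, 5)
    return min_seconds + fraction * (max_seconds - min_seconds)
```

A uniform delay spreads evenly across the range, while a skewed one clusters near the minimum, which is closer to how people tend to pause between actions.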

## TLS Fingerprinting (HTTP)

For HTTP requests **outside the browser**, use `StealthClient`, which matches real browser TLS fingerprints via [curl_cffi](https://github.com/yifeikong/curl_cffi).

```bash
pip install abrasio[tls]
```

```python
from abrasio.http import StealthClient

# Async
async with StealthClient() as client:
    response = await client.get("https://example.com")
    print(response.text)

# With region (auto-sets Accept-Language)
async with StealthClient(region="BR") as client:
    response = await client.get("https://example.com.br")

# With proxy
async with StealthClient(proxy="http://user:pass@host:8080") as client:
    response = await client.get("https://example.com")

# Rotate browser version on each request
urls = ["https://example.com", "https://example.org"]
async with StealthClient(rotate_impersonation=True) as client:
    for url in urls:
        response = await client.get(url)
```
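
For context on what a JA3 fingerprint is, the sketch below builds the JA3 string and hash from the fields of a TLS ClientHello. This is background on the JA3 format itself, not a description of how `StealthClient` or curl_cffi work internally; the function names and the sample field values are illustrative.

```python
import hashlib
from typing import Sequence


def ja3_string(
    version: int,
    ciphers: Sequence[int],
    extensions: Sequence[int],
    curves: Sequence[int],
    point_formats: Sequence[int],
) -> str:
    """Join the five ClientHello fields in JA3 order.

    Fields are comma-separated; values inside a field are dash-separated
    decimal numbers, kept in the order the client sent them.
    """
    fields = [
        str(version),
        "-".join(str(v) for v in ciphers),
        "-".join(str(v) for v in extensions),
        "-".join(str(v) for v in curves),
        "-".join(str(v) for v in point_formats),
    ]
    return ",".join(fields)


def ja3_hash(*fields) -> str:
    """The JA3 fingerprint is the MD5 hex digest of the JA3 string."""
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()


# Illustrative values only, not a real browser's ClientHello.
example = (771, [4865, 4866, 4867], [0, 23, 65281, 10, 11], [29, 23, 24], [0])
print(ja3_string(*example))
print(ja3_hash(*example))
```

Because the hash depends on the order of ciphers and extensions, a generic HTTP library and a real browser produce different fingerprints even when they send identical headers.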

## Configuration

### AbrasioConfig

```python
from abrasio import Abrasio, AbrasioConfig, FingerprintConfig

config = AbrasioConfig(
    # Mode: None = local (free), "sk_xxx" = cloud (paid)
    api_key=None,

    # Browser
    headless=False,                  # Visible = more stealthy
    proxy="http://user:pass@host:8080",
    timeout=30000,

    # Region (auto-configures locale/timezone)
    region="BR",

    # Fingerprint (local mode only)
    fingerprint=FingerprintConfig(
        webgl=True,
        webrtc=False,
        canvas_noise=True,
        audio_noise=True,
    ),

    # Profile persistence
    user_data_dir="./my_profile",

    # Cloud mode
    profile_id="my-profile",

    # Client certificates - local mode only (see Client Certificates section)
    client_certificates=None,

    # Advanced
    extra_args=[],
    debug=False,
)

async with Abrasio(config) as browser:
    ...
```

### Region Auto-Configuration

```python
config = AbrasioConfig(region="BR")
# locale="pt-BR", timezone="America/Sao_Paulo"

config = AbrasioConfig(region="JP")
# locale="ja-JP", timezone="Asia/Tokyo"
```

50+ regions supported. If you set a mismatched timezone, you'll get a warning:

```python
config = AbrasioConfig(region="BR", timezone="America/New_York")
print(config.region_warnings)
# ['Timezone mismatch: using America/New_York but region BR expects America/Sao_Paulo']
```
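
The consistency check above can be pictured as a lookup against a table of per-region defaults. The sketch below is a minimal illustration under stated assumptions: `REGION_DEFAULTS` and `check_region` are hypothetical names, and the table holds only the two regions shown in this README, not the SDK's full list.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical two-entry table: region -> (locale, timezone).
REGION_DEFAULTS: Dict[str, Tuple[str, str]] = {
    "BR": ("pt-BR", "America/Sao_Paulo"),
    "JP": ("ja-JP", "Asia/Tokyo"),
}


def check_region(region: str, timezone: Optional[str] = None) -> List[str]:
    """Return warnings when an explicit timezone contradicts the region."""
    expected = REGION_DEFAULTS.get(region)
    if expected is None or timezone is None:
        # Unknown region or no override: nothing to compare against.
        return []
    _, expected_tz = expected
    if timezone != expected_tz:
        return [
            f"Timezone mismatch: using {timezone} "
            f"but region {region} expects {expected_tz}"
        ]
    return []


print(check_region("BR", "America/New_York"))
```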

Without an explicit region, locale and timezone are auto-detected from your public IP.

### Environment Variables

| Variable | Description | Default |
|----------|-------------|---------|
| `ABRASIO_API_KEY` | API key for cloud mode | `None` |
| `ABRASIO_API_URL` | API base URL | `https://abrasio.scrapetechnology.com/` |
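
As a rough sketch of how these variables could combine with an explicit `api_key`, the helper below resolves the effective settings. It is hypothetical: `resolve_cloud_settings` is not part of the SDK, and the precedence shown (explicit argument first, then environment, then default) is an assumption.

```python
import os
from typing import Dict, Mapping, Optional

DEFAULT_API_URL = "https://abrasio.scrapetechnology.com/"


def resolve_cloud_settings(
    api_key: Optional[str] = None,
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Optional[str]]:
    """Resolve API key, API URL, and mode from arguments and environment."""
    # Assumed precedence: an explicit api_key wins over ABRASIO_API_KEY.
    key = api_key or env.get("ABRASIO_API_KEY")
    return {
        "api_key": key,
        "api_url": env.get("ABRASIO_API_URL", DEFAULT_API_URL),
        # No key means local (free) mode; a key means cloud (paid) mode.
        "mode": "cloud" if key else "local",
    }


print(resolve_cloud_settings(env={"ABRASIO_API_KEY": "sk_live_xxx"})["mode"])
```

Keeping the key in `ABRASIO_API_KEY` rather than in source code avoids committing credentials to version control.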

## Client Certificates

Some sites require **TLS Client Authentication** during login: the browser must present a
client certificate during the TLS handshake (e.g. ICP-Brasil digital certificates used to log
into gov.br services). There are two ways to do this, and which one works depends on the mode.
Both start by installing the `cert` extra (needed only for PFX/PKCS12 files) and building a
certificate object:

```bash
pip install abrasio[cert]   # cryptography, only needed to convert PFX/PKCS12 -> PEM
```

```python
from abrasio import build_client_certificate

cert = build_client_certificate(
    origin="https://sso.acesso.gov.br",   # exact origin the cert is valid for
    pfx_path="certificado.pfx",           # or cert_path= / key_path= for a PEM pair
    passphrase="minha-senha",
)
```

### Local mode: native `client_certificates`

Playwright/Patchright support `client_certificates` as a context option, applied when the
browser launches. Pass it to `AbrasioConfig`/`Abrasio(...)`:

```python
async with Abrasio(headless=False, client_certificates=[cert]) as browser:
    page = await browser.new_page()
    await page.goto("https://sso.acesso.gov.br/login")
```

**This only works in local mode.** Under the hood, Playwright applies it via a local SOCKS proxy
that the browser dials back into — that requires the browser and the Playwright driver to be on
the same machine. In cloud mode the browser runs on Abrasio's infrastructure, so that proxy is
unreachable and the certificate is silently never used.

### Cloud mode (and local too): `route_with_certificate`

Intercept the specific certificate-login request and replay it outside the browser using
`httpx` (which supports client certificates natively), then feed the real response back into
the browser. Since the interception always runs in the driver process — never inside the
browser itself — this works regardless of where the browser runs:

```python
async with Abrasio(api_key="sk_live_xxx", region="BR") as browser:
    page = await browser.new_page()
    await page.goto("https://sso.acesso.gov.br/login")

    certificate_button = page.locator("#login-certificate")
    form_action = await certificate_button.get_attribute("formaction")

    await browser.route_with_certificate(page, form_action, cert)
    await certificate_button.click()
```

`route_with_certificate` defaults the replay request's proxy to the session's configured
`proxy`, so the authenticated request leaves through the same exit IP as the rest of the
browser session — important, since an IP mismatch between normal navigation and the
certificate-authenticated request is exactly the kind of signal sites use to flag a session.

It also defaults `timeout` to the session's configured `timeout` (`AbrasioConfig.timeout`,
30s by default) instead of httpx's own 5s default — going through a proxy to a government
auth server can easily take longer than that. If you see the route time out (the page ends up
on `chrome-error://chromewebdata/`), pass a larger `timeout=` explicitly:

```python
await browser.route_with_certificate(page, form_action, cert, timeout=60)
```

See `examples/certificado.py` for a full working example.

## Cloud Mode (Paid)

With an API key, you get access to the Abrasio cloud infrastructure:

| Feature | Description |
|---------|-------------|
| **Real Fingerprints** | Fingerprints built from collected device data |
| **Geo-Targeting** | Target specific countries/regions |
| **Persistent Profiles** | Maintain cookies and history across sessions |
| **Session Recording** | Playwright trace recording for debugging |
| **Live View** | Real-time browser streaming via noVNC |
| **Automatic Retry** | SDK retries on rate limit (429) with backoff |

```python
from abrasio import Abrasio

async with Abrasio(
    api_key="sk_live_xxx",
    region="BR",
    url="https://target-site.com.br",
    profile_id="my-profile",
) as browser:
    page = await browser.new_page()

    # Live view URL (if enabled on server)
    if browser.live_view_url:
        print(f"Watch live: {browser.live_view_url}")

    await page.goto("https://target-site.com.br")
    print(await page.title())
```

## Error Handling

```python
from abrasio import (
    Abrasio,
    AbrasioError,           # Base error
    AuthenticationError,    # Invalid API key (401)
    InsufficientFundsError, # Not enough balance (402)
    RateLimitError,         # Too many sessions (429) - auto-retried
    SessionError,           # Session creation/management error
    BrowserError,           # Browser operation error
    TimeoutError,           # Operation timeout
    BlockedError,           # Target site blocked request
)

try:
    async with Abrasio(api_key="sk_live_xxx") as browser:
        page = await browser.new_page()
        await page.goto("https://example.com")
except AuthenticationError:
    print("Invalid API key")
except InsufficientFundsError as e:
    print(f"Add funds. Balance: ${e.balance:.2f}")
except RateLimitError as e:
    print(f"Rate limited. Retry after {e.retry_after}s")
except SessionError as e:
    print(f"Session error: {e.message}")
except AbrasioError as e:
    print(f"Error: {e.message}")
```

> **Note**: The SDK automatically retries on 429 (rate limit), 502, 503, 504 with exponential backoff up to 3 times.
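
The retry policy in the note can be sketched as follows. This is a minimal illustration, not the SDK's code: the function names, the 1-second base delay, the 30-second cap, and the 10% jitter are all assumptions. Only the status codes, the limit of 3 retries, and the use of `Retry-After` come from this README.

```python
import random
from typing import Optional

RETRYABLE_STATUSES = {429, 502, 503, 504}
MAX_RETRIES = 3


def should_retry(status: int, attempt: int) -> bool:
    """True if retry number `attempt` (1-based) is allowed for this status."""
    return status in RETRYABLE_STATUSES and attempt <= MAX_RETRIES


def backoff_delay(
    attempt: int,
    retry_after: Optional[float] = None,
    base: float = 1.0,
    cap: float = 30.0,
) -> float:
    """Seconds to wait before retry number `attempt` (1-based)."""
    if retry_after is not None:
        # A server-provided Retry-After value takes priority.
        return float(retry_after)
    # Exponential growth: base, 2 * base, 4 * base, ... up to the cap.
    delay = min(cap, base * 2 ** (attempt - 1))
    # Small random jitter so parallel clients do not retry in lockstep.
    return delay + random.uniform(0, delay * 0.1)


for attempt in range(1, MAX_RETRIES + 1):
    print(attempt, round(backoff_delay(attempt), 2))
```

If you catch `RateLimitError` yourself after the built-in retries are exhausted, `e.retry_after` is the natural value to pass as `retry_after`.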

## API Reference

### Abrasio

```python
class Abrasio:
    def __init__(
        self,
        config: Optional[AbrasioConfig] = None,
        *,
        api_key: Optional[str] = None,
        headless: bool = True,
        proxy: Optional[str] = None,
        stealth: bool = True,
        **kwargs,
    ): ...

    async def start(self) -> "Abrasio": ...
    async def close(self) -> None: ...
    async def new_page(self) -> Page: ...
    async def new_context(self, **kwargs) -> BrowserContext: ...
    async def route_with_certificate(self, target, url, certificate, *, proxy=None, timeout=None) -> None: ...

    @property
    def browser(self): ...       # Browser (cloud) or BrowserContext (local)
    @property
    def is_cloud(self) -> bool: ...
    @property
    def is_local(self) -> bool: ...
    @property
    def live_view_url(self) -> Optional[str]: ...  # Cloud mode only
```

### Human Utilities

```python
async def human_move_to(page, x, y, *, min_time=0.1, max_time=1.5): ...
async def human_click(page, selector=None, *, offset_range=5, move_first=True): ...
async def human_type(page, text, selector=None, *, mistake_probability=0.02): ...
async def human_scroll(page, direction="down", amount=None, *, smooth=True): ...
async def random_delay(min_ms=100, max_ms=500): ...
async def human_wait(min_seconds=0.5, max_seconds=2.0): ...
async def simulate_reading(page, min_seconds=2.0, max_seconds=8.0): ...
```

### StealthClient (TLS)

```python
from abrasio.http import StealthClient, BrowserImpersonation

class StealthClient:
    def __init__(self, impersonate=BrowserImpersonation.DEFAULT,
                 proxy=None, region=None, rotate_impersonation=False): ...

    async def get(self, url, **kwargs) -> StealthResponse: ...
    async def post(self, url, **kwargs) -> StealthResponse: ...
```

## Best Practices

1. **Headless mode is safe** — SDK automatically removes "HeadlessChrome" from User-Agent
2. **Don't set `user_agent`** — let the SDK handle it (auto-fixed in headless mode)
3. **Don't set `viewport`** — uses `no_viewport` for realistic behavior
4. **Add human behavior** between actions (`human_wait`, `human_click`)
5. **Use persistent profiles** with `user_data_dir` for cookie persistence
6. **Use `region`** to auto-configure locale/timezone consistently
7. **Set `webrtc=False`** when using a proxy (prevents IP leak)
8. **Test against bot detection sites** before production deployment

## Testing Anti-Detection

```python
import asyncio
from abrasio import Abrasio

async def test():
    async with Abrasio(headless=False) as browser:
        page = await browser.new_page()

        await page.goto("https://bot.sannysoft.com/")
        await page.screenshot(path="sannysoft.png")

        await page.goto("https://abrahamjuliot.github.io/creepjs/")
        await page.wait_for_timeout(5000)
        await page.screenshot(path="creepjs.png")

asyncio.run(test())
```

## Project Structure

```
abrasio-sdk/
├── abrasio/
│   ├── __init__.py          # Public API exports
│   ├── _api.py              # Abrasio class (local + cloud)
│   ├── _config.py           # AbrasioConfig, FingerprintConfig
│   ├── _exceptions.py       # Exception hierarchy
│   ├── local/
│   │   └── browser.py       # StealthBrowser (Patchright)
│   ├── cloud/
│   │   ├── browser.py       # CloudBrowser (API + CDP)
│   │   └── api_client.py    # HTTP client with retry
│   ├── http/
│   │   └── client.py        # StealthClient (curl_cffi TLS)
│   ├── sync_api/
│   │   └── _sync.py         # Synchronous wrapper
│   └── utils/
│       ├── human.py         # Human behavior simulation
│       ├── fingerprint.py   # Region config, validation
│       ├── geolocation.py   # IP-based locale detection
│       └── certificates.py  # Client certificates (TLS Client Auth)
├── examples/
│   ├── basic_local.py       # Local mode example
│   ├── basic_cloud.py       # Cloud mode example
│   ├── human_behavior.py    # Human behavior demo
│   ├── fingerprint_check.py # Fingerprint validation
│   ├── tls_fingerprint.py   # TLS fingerprinting
│   └── certificado.py       # Client certificate login (gov.br)
├── docs/                    # Documentation
└── pyproject.toml
```

## References

- [Patchright](https://github.com/Kaliiiiiiiiii-Vinyzu/patchright) - Undetected Playwright fork
- [curl_cffi](https://github.com/yifeikong/curl_cffi) - TLS fingerprinting HTTP client
- [BrowserForge](https://github.com/daijro/browserforge) - Fingerprint generation
- [Ghost Cursor](https://github.com/Xetera/ghost-cursor) - Bezier curve mouse movements

## Support & Community

| Channel | Link |
|---------|------|
| 💬 Discord | [discord.gg/GBSKsC8DvS](https://discord.gg/GBSKsC8DvS) |
| 📧 Email | [joao.sobhie@scrapetechnology.com](mailto:joao.sobhie@scrapetechnology.com) |
| 🌐 Docs | [scrapetechnology.com/abrasio/docs](https://scrapetechnology.com/abrasio/docs) |

For bug reports and feature requests, open a thread in the `#abrasio-feedback` channel on Discord.

## License

MIT - Scrape Technology. See the `LICENSE` file.
