Metadata-Version: 2.4
Name: scrapi-sdk
Version: 1.0.1
Summary: Official Python SDK for the ScrAPI web scraping service.
Author: DevEnterprise Software
License-Expression: MIT
Project-URL: Homepage, https://scrapi.tech
Project-URL: Documentation, https://scrapi.tech/docs
Project-URL: Repository, https://github.com/DevEnterpriseSoftware/scrapi-sdk-python
Project-URL: Issues, https://github.com/DevEnterpriseSoftware/scrapi-sdk-python/issues
Keywords: scraping,scraper,crawler,extractor,api,sdk
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: httpx<1.0,>=0.27
Provides-Extra: html
Requires-Dist: beautifulsoup4<5.0,>=4.12; extra == "html"
Provides-Extra: dev
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: pytest-asyncio>=0.23; extra == "dev"
Requires-Dist: twine>=5.1; extra == "dev"
Dynamic: license-file

![ScrAPI logo](https://raw.githubusercontent.com/DevEnterpriseSoftware/scrapi-sdk-dotnet/master/icon_small.png)

# ScrAPI SDK for Python

[![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
![PyPI downloads](https://img.shields.io/pypi/dm/scrapi-sdk)


This is the official Python SDK for [ScrAPI](https://scrapi.tech), a web scraping service. It provides synchronous and asynchronous clients for submitting scrape requests and reading the results.

- Website: https://scrapi.tech
- API docs: https://scrapi.tech/docs
- Source repository: https://github.com/DevEnterpriseSoftware/scrapi-sdk-python

## Table of contents

- [Installation](#installation)
- [Quick start (sync)](#quick-start-sync)
- [Quick start (async)](#quick-start-async)
- [Scrape request options](#scrape-request-options)
- [Browser commands](#browser-commands)
- [Scrape response data](#scrape-response-data)
- [Scrape request defaults](#scrape-request-defaults)
- [Lookups](#lookups)
- [Exceptions](#exceptions)
- [HTML helper utilities (optional)](#html-helper-utilities-optional)
- [Sample app](#sample-app)
- [Development](#development)
- [Build and publish](#build-and-publish)

## Installation

```bash
pip install scrapi-sdk
```

Install optional HTML helpers:

```bash
pip install "scrapi-sdk[html]"
```

## Quick start (sync)

```python
from scrapi_sdk import ScrapeRequest, ScrapiClient

with ScrapiClient("YOUR_API_KEY") as client:
    response = client.scrape(ScrapeRequest("https://deventerprise.com"))
    print(response.content if response else "No response")
```

## Quick start (async)

```python
import asyncio
from scrapi_sdk import AsyncScrapiClient


async def main() -> None:
    async with AsyncScrapiClient("YOUR_API_KEY") as client:
        response = await client.scrape("https://deventerprise.com")
        print(response.content if response else "No response")


asyncio.run(main())
```
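
Because the async client's `scrape` call is awaitable, several URLs can be scraped concurrently. The helper below is a minimal sketch and not part of the SDK: `scrape_all` and its `limit` parameter are hypothetical names, and it accepts any async callable so it can be tried without an API key. How many parallel requests your account allows is an assumption to verify.

```python
import asyncio
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def scrape_all(
    scrape: Callable[[str], Awaitable[T]],
    urls: list[str],
    limit: int = 5,
) -> list[T]:
    """Run `scrape` for every URL, with at most `limit` calls in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(url: str) -> T:
        async with semaphore:
            return await scrape(url)

    # gather() returns results in the same order as `urls`.
    return await asyncio.gather(*(bounded(url) for url in urls))
```

Inside `async with AsyncScrapiClient("YOUR_API_KEY") as client:` this could be called as `await scrape_all(client.scrape, urls)`.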

## Scrape request options

All options map to ScrAPI API fields while exposing Pythonic snake_case names.

| Python field | Type | Description |
|---|---|---|
| `url` | `str` | URL to scrape. Inputs without a scheme are normalized to `https://...`. |
| `response_format` | `ResponseFormat` | Must be `ResponseFormat.JSON` when using this SDK client. |
| `response_selector` | `str \| None` | CSS/XPath selector for response filtering. |
| `cookies` | `dict[str, str]` | Cookies sent with the request to the target site. |
| `headers` | `dict[str, str]` | Headers sent with the request to the target site. |
| `request_method` | `str` | HTTP method override (default `GET`). |
| `request_body_base64` | `str \| None` | Base64 request payload. |
| `proxy_type` | `ProxyType` | `NONE`, `FREE`, `RESIDENTIAL`, `DATACENTER`, `TOR`, `CUSTOM`. |
| `proxy_country` | `str \| None` | Three-letter country code, e.g. `USA`. |
| `proxy_city` | `str \| None` | City key (requires `proxy_country`). |
| `custom_proxy_url` | `str \| None` | Custom proxy URL. |
| `use_browser` | `bool` | Enable browser mode. |
| `solve_captchas` | `bool` | Automatically solve captchas (browser mode only). |
| `include_screenshot` | `bool` | Include screenshot URL in response (browser mode only). |
| `include_pdf` | `bool` | Include PDF URL in response (browser mode only). |
| `include_video` | `bool` | Include video URL in response (browser mode only). |
| `accept_dialogs` | `bool` | Accept browser dialogs/popups. |
| `session_id` | `str \| None` | Reuse session context across calls. |
| `callback_url` | `str \| None` | Webhook URL called when scrape completes. |
| `browser_commands` | `BrowserCommandList` | Ordered browser action commands. |

Example:

```python
from scrapi_sdk import ProxyType, ResponseFormat, ScrapeRequest

request = ScrapeRequest("https://deventerprise.com")
request.proxy_type = ProxyType.RESIDENTIAL
request.proxy_country = "USA"
request.use_browser = True
request.solve_captchas = True
request.include_screenshot = True
request.response_format = ResponseFormat.JSON
```
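
`request_body_base64` takes the request payload as a Base64 string. A minimal sketch of building that string for a JSON body with the standard library follows. `encode_json_body` is a hypothetical helper, not an SDK function, and it assumes the target endpoint expects UTF-8 JSON.

```python
import base64
import json


def encode_json_body(payload: dict) -> str:
    """Serialize `payload` to compact UTF-8 JSON and Base64-encode it."""
    raw = json.dumps(payload, separators=(",", ":")).encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


body = encode_json_body({"query": "laptops", "page": 1})
# Assumed usage with the fields from the table above:
# request.request_method = "POST"
# request.request_body_base64 = body
# request.headers["Content-Type"] = "application/json"
```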

## Browser commands

When `use_browser=True`, chain browser commands with `BrowserCommandList`:

```python
from scrapi_sdk import ScrapeRequest

request = ScrapeRequest("https://www.roboform.com/filling-test-all-fields")
request.use_browser = True
request.accept_dialogs = True

# Commands run in the order they are chained.
(
    request.browser_commands
    .input("input[name='01___title']", "Mr")
    .input("input[name='02frstname']", "Werner")
    .input("input[name='04lastname']", "van Deventer")
    .select("select[name='40cc__type']", "Discover")
    .wait(3000)
    .wait_for("input[type='reset']")
    .click("input[type='reset']")
    .wait(1000)
    .scroll(1000)
    .evaluate("console.log('any valid code...')")
)
```

## Scrape response data

`ScrapeResponse` includes all API response details.

```python
# Assumes `client` is an open ScrapiClient (see Quick start).
response = client.scrape("https://deventerprise.com")

if response:
    print(response.request_url)
    print(response.response_url)
    print(response.duration)
    print(response.attempts)
    print(response.credits_used)
    print(response.status_code)
    print(response.screenshot_url)
    print(response.pdf_url)
    print(response.video_url)
    print(response.content)
    print(response.content_hash)  # SHA-1 of the UTF-16LE-encoded content, for parity with the .NET SDK.

    for captcha_name, solved_count in response.captchas_solved.items():
        print(f"{captcha_name}: {solved_count}")

    for key, value in response.headers.items():
        print(f"{key}: {value}")

    for key, value in response.cookies.items():
        print(f"{key}: {value}")

    for message in response.error_messages or []:
        print(message)
```

If `beautifulsoup4` is installed, `response.html` returns a parsed `BeautifulSoup` object.
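
The comment on `content_hash` describes a SHA-1 digest over the UTF-16LE encoding of the content. A sketch of recomputing such a digest locally, for example to detect whether a page changed between scrapes, is shown below. `utf16le_sha1` is a hypothetical helper, and the digest's text format (lowercase hex here) is an assumption, so compare it against a real `response.content_hash` before relying on it.

```python
import hashlib


def utf16le_sha1(content: str) -> str:
    """Return the SHA-1 hex digest of `content` encoded as UTF-16LE."""
    # "utf-16-le" emits no byte order mark, so only the text itself is hashed.
    return hashlib.sha1(content.encode("utf-16-le")).hexdigest()
```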

## Scrape request defaults

`ScrapeRequestDefaults` applies defaults to every new `ScrapeRequest`.

```python
from scrapi_sdk import ProxyType, ScrapeRequest, ScrapeRequestDefaults

ScrapeRequestDefaults.proxy_type = ProxyType.RESIDENTIAL
ScrapeRequestDefaults.use_browser = True
ScrapeRequestDefaults.solve_captchas = True
ScrapeRequestDefaults.headers["Sample"] = "Custom-Value"

request = ScrapeRequest("https://deventerprise.com")
request.proxy_type = ProxyType.TOR  # explicit override

assert request.proxy_type == ProxyType.TOR
assert request.use_browser is True
assert request.solve_captchas is True
assert request.headers["Sample"] == "Custom-Value"
```

## Lookups

### Credit balance

```python
balance = client.get_credit_balance()
print(balance)
```

### Supported countries

```python
countries = client.get_supported_countries()
for country in countries:
    print(country.key, country.name, country.proxy_count)
```

### Supported cities

```python
cities = client.get_supported_cities("USA")
for city in cities:
    print(city.key, city.name, city.proxy_count)
```

## Exceptions

Client and API errors are raised as `ScrapiException`, which exposes the HTTP status code as `status_code`.

```python
from scrapi_sdk import ScrapeRequest, ScrapiClient, ScrapiException

with ScrapiClient("YOUR_API_KEY") as client:
    try:
        response = client.scrape(ScrapeRequest("https://deventerprise.com"))
    except ScrapiException as ex:
        print(f"Error ({ex.status_code}): {ex}")
        raise
```
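
Transient failures can be retried by catching the exception and inspecting `status_code`. The following is a minimal sketch rather than SDK behavior: `call_with_retry`, the set of retryable status codes, and the backoff timings are all assumptions. The exception type is passed in so the helper can be exercised without the SDK.

```python
import time
from collections.abc import Callable
from typing import TypeVar

T = TypeVar("T")

# Assumption: these statuses indicate a transient condition worth retrying.
RETRYABLE_STATUS = {429, 500, 502, 503, 504}


def call_with_retry(
    call: Callable[[], T],
    exc_type: type[Exception],
    attempts: int = 3,
    base_delay: float = 1.0,
) -> T:
    """Call `call`, retrying retryable `exc_type` errors with exponential backoff."""
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except exc_type as ex:
            status = getattr(ex, "status_code", None)
            # Give up on the last attempt or on a non-retryable status.
            if attempt == attempts or status not in RETRYABLE_STATUS:
                raise
            time.sleep(base_delay * 2 ** (attempt - 1))
    raise ValueError("attempts must be at least 1")
```

With the SDK this might be used as `call_with_retry(lambda: client.scrape(request), ScrapiException)`.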

## HTML helper utilities (optional)

Install the optional dependency first:

```bash
pip install "scrapi-sdk[html]"
```

Helpers exported from `scrapi_sdk`:

- `numbers_only(text, include_decimal_points=False, trim=True)`
- `html_with_no_script(html)`
- `next_element(node)`
- `is_visible(node, check_parent_nodes=True)`

Example:

```python
from scrapi_sdk import html_with_no_script, numbers_only

print(numbers_only("USD 1,299.95", include_decimal_points=True))
print(html_with_no_script("<p>safe</p><script>alert(1)</script>"))
```

## Sample app

A runnable sample app is included at [`examples/basic_scrape/main.py`](examples/basic_scrape/main.py).

It reads `SCRAPI_API_KEY` and scrapes `https://deventerprise.com`.
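
Reading the key from the environment keeps it out of source control. A minimal sketch of that pattern is shown below. It is not the sample app's actual code, and `load_api_key` is a hypothetical helper.

```python
import os
from collections.abc import Mapping


def load_api_key(env: Mapping[str, str] = os.environ) -> str:
    """Return the ScrAPI key from `SCRAPI_API_KEY`, failing early if it is unset."""
    key = env.get("SCRAPI_API_KEY", "").strip()
    if not key:
        raise RuntimeError("Set the SCRAPI_API_KEY environment variable first.")
    return key
```

The returned value can then be passed to `ScrapiClient(...)` in place of the `"YOUR_API_KEY"` placeholder.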

## Development

```bash
python -m venv .venv
. .venv/bin/activate  # Windows: .venv\Scripts\Activate.ps1 (PowerShell) or . .venv/Scripts/activate (Git Bash)
pip install -e ".[dev,html]"
pytest
```

## Build and publish

### Local build

```bash
python -m pip install --upgrade pip build twine
python -m build
python -m twine check dist/*
```

### Upload to TestPyPI

```powershell
# PowerShell
$env:TWINE_USERNAME="__token__"
$env:TWINE_PASSWORD="pypi-..."
python -m twine upload -r testpypi dist/*
```

### Upload to PyPI

```powershell
# PowerShell
$env:TWINE_USERNAME="__token__"
$env:TWINE_PASSWORD="pypi-..."
python -m twine upload dist/*
```
