Metadata-Version: 2.4
Name: webscraping_ai
Version: 4.0.0
Summary: Official Python client for the WebScraping.AI API.
Project-URL: Homepage, https://webscraping.ai
Project-URL: Documentation, https://webscraping.ai/docs
Project-URL: Source, https://github.com/webscraping-ai/webscraping-ai-python
Project-URL: Changelog, https://github.com/webscraping-ai/webscraping-ai-python/blob/master/CHANGELOG.md
Project-URL: Issues, https://github.com/webscraping-ai/webscraping-ai-python/issues
Author-email: "WebScraping.AI Support" <support@webscraping.ai>
License: MIT License
        
        Copyright (c) WebScraping.AI
        
        Permission is hereby granted, free of charge, to any person obtaining a copy
        of this software and associated documentation files (the "Software"), to deal
        in the Software without restriction, including without limitation the rights
        to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
        copies of the Software, and to permit persons to whom the Software is
        furnished to do so, subject to the following conditions:
        
        The above copyright notice and this permission notice shall be included in all
        copies or substantial portions of the Software.
        
        THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
        IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
        FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
        AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
        LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
        OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
        SOFTWARE.
License-File: LICENSE
Keywords: api,crawler,llm,scraping,webscraping
Classifier: Development Status :: 5 - Production/Stable
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.9
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.9
Requires-Dist: httpx<1.0,>=0.27
Description-Content-Type: text/markdown

# webscraping_ai

Official Python client for the [WebScraping.AI](https://webscraping.ai) API.

> **4.0 is a hard break from 3.x.** See [CHANGELOG.md](CHANGELOG.md) for the
> migration notes. If you cannot update your call sites yet, stay on
> `webscraping_ai == 3.2.1`.

## Install

```bash
pip install webscraping_ai
```

Requires Python 3.9 or newer.

## Quick start

```python
from webscraping_ai import Client

client = Client(api_key="YOUR_API_KEY")

# Page HTML
html = client.html("https://example.com")

# Visible text, optionally as a structured JSON response
text = client.text("https://example.com", text_format="json", return_links=True)

# CSS-selected HTML
heading = client.selected("https://example.com", selector="h1")
multiple = client.selected_multiple("https://example.com", selectors=["h1", "p"])

# LLM-powered helpers
answer = client.question("https://example.com", question="What is the page title?")
fields = client.fields(
    "https://example.com",
    fields={"title": "Main product title", "price": "Current product price"},
)

# Account quota
info = client.account()
```

The client is also a context manager, which closes the underlying connection
pool on exit:

```python
with Client(api_key="...") as client:
    client.html("https://example.com")
```

## Async usage

`AsyncClient` mirrors `Client` but uses `async def` methods backed by
`httpx.AsyncClient`:

```python
import asyncio
from webscraping_ai import AsyncClient

async def main():
    async with AsyncClient(api_key="YOUR_API_KEY") as client:
        html = await client.html("https://example.com")
        print(html)

asyncio.run(main())
```
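
When fetching many pages, it can help to cap the number of requests in flight, since the API rejects excess concurrency with a 429. The sketch below is one way to do that, not part of the library: `fetch_all` and `max_concurrency` are hypothetical names, and `fetch` is any coroutine function taking a URL, such as the `client.html` method of an `AsyncClient`.

```python
import asyncio
from typing import Awaitable, Callable, List


async def fetch_all(
    fetch: Callable[[str], Awaitable[str]],
    urls: List[str],
    max_concurrency: int = 5,  # hypothetical default; tune to your plan's limit
) -> List[str]:
    """Call fetch(url) for every URL with at most max_concurrency in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def fetch_one(url: str) -> str:
        # Each task waits here until a slot is free.
        async with semaphore:
            return await fetch(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(fetch_one(url) for url in urls))
```

Inside an `async with AsyncClient(...) as client:` block, this would be called as `pages = await fetch_all(client.html, urls)`.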

## Error handling

Every non-2xx response is mapped to a typed exception so you can `except` on
the situation you actually care about rather than parsing status codes:

```python
from webscraping_ai import (
    Client,
    AuthenticationError,
    RateLimitError,
    PaymentRequiredError,
    APITimeoutError,
    APIConnectionError,
)

client = Client(api_key="YOUR_API_KEY")

try:
    client.html("https://example.com")
except AuthenticationError:
    ...  # 403 — wrong or missing API key
except PaymentRequiredError:
    ...  # 402 — out of credits
except RateLimitError:
    ...  # 429 — too many concurrent requests
except APITimeoutError:
    ...  # request did not complete in time
except APIConnectionError:
    ...  # transport-level failure
```

All exceptions inherit from `WebScrapingAIError`, so you can catch everything
the client raises with a single `except` if you prefer. API errors expose the
parsed error envelope (`message`, `status`, `status_code`, `status_message`,
`body`, `response_body`).
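
Typed exceptions also make it straightforward to retry only the failures that are likely to be transient. The following is a minimal sketch under that assumption, not something the client provides: `call_with_retry` and its parameters are hypothetical, and the helper is deliberately generic so it works with any exception classes you pass in.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retry(
    func: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func(), retrying with exponential backoff on the given exceptions."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            # Out of attempts: let the last exception propagate to the caller.
            if attempt == attempts - 1:
                raise
            # Wait base_delay, then 2x, 4x, ... before the next attempt.
            sleep(base_delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

With the exceptions imported above, a call might look like `call_with_retry(lambda: client.html(url), retry_on=(RateLimitError, APITimeoutError, APIConnectionError))`. `AuthenticationError` and `PaymentRequiredError` are left out on purpose, because retrying does not fix a bad key or an empty balance.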

## Endpoint reference

| Method                          | HTTP route               | Returns                         |
| ------------------------------- | ------------------------ | ------------------------------- |
| `client.html(...)`              | `GET /html`              | `str` (page HTML)               |
| `client.text(...)`              | `GET /text`              | `str` or `dict` (JSON)          |
| `client.selected(...)`          | `GET /selected`          | `str`                           |
| `client.selected_multiple(...)` | `GET /selected-multiple` | `list[list[str]]`               |
| `client.question(...)`          | `GET /ai/question`       | `str`                           |
| `client.fields(...)`            | `GET /ai/fields`         | `dict` (wrapped under `result`) |
| `client.account()`              | `GET /account`           | `dict`                          |
Every page-fetch method accepts the full set of API parameters as keyword
arguments: `headers`, `timeout`, `js`, `js_timeout`, `wait_for`, `proxy`,
`country`, `custom_proxy`, `device`, `error_on_404`, `error_on_redirect`,
`js_script`, plus the per-endpoint extras (`return_script_result`, `format`,
`text_format`, `return_links`, `selector`, `selectors`, `question`, `fields`).
See the [API documentation](https://webscraping.ai/docs) for the full
parameter reference.
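
Because the shared parameters are plain keyword arguments, one options dict can be reused across several calls. This is a sketch only: `RENDER_OPTIONS` and `fetch_page_and_heading` are hypothetical names, and the values shown are illustrative assumptions, so check the API documentation for accepted values and limits.

```python
from typing import Any, Dict

# Illustrative values (assumptions); only the parameter names come from the
# list above.
RENDER_OPTIONS: Dict[str, Any] = {
    "js": True,
    "js_timeout": 2000,
    "timeout": 15000,
    "device": "mobile",
}


def fetch_page_and_heading(client: Any, url: str) -> Dict[str, str]:
    """Fetch the full HTML and the first-level heading with the same options."""
    return {
        "html": client.html(url, **RENDER_OPTIONS),
        # Per-endpoint extras such as `selector` sit alongside the shared ones.
        "heading": client.selected(url, selector="h1", **RENDER_OPTIONS),
    }
```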

### API response-shape notes

Two endpoints return shapes that differ from the OpenAPI spec examples. The
client returns the raw response unchanged, so:

- `/ai/fields` wraps the extracted fields under a `result` key:
  `{"result": {"title": "...", "price": "..."}}`.
- `/selected-multiple` returns `list[list[str]]`, not a flat `list[str]`.
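
If your code prefers the flatter shapes, a thin wrapper can normalize both responses. The helpers below are a hypothetical sketch, not part of the client. They assume only the shapes described in the notes above.

```python
from typing import Any, Dict, List


def unwrap_fields(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return the extracted fields from an /ai/fields response."""
    # Raises KeyError if the `result` wrapper is missing, which surfaces
    # an unexpected response shape instead of hiding it.
    return response["result"]


def flatten_selected(groups: List[List[str]]) -> List[str]:
    """Flatten a /selected-multiple response into a single list of fragments."""
    return [fragment for group in groups for fragment in group]
```

For example, `unwrap_fields(client.fields(url, fields={...}))` yields the inner dict directly, and `flatten_selected(client.selected_multiple(url, selectors=[...]))` yields a flat `list[str]`.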

## Development

```bash
mise install                    # or use python 3.13 from any source
python -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"
pytest
ruff check .
mypy src/webscraping_ai
```

## License

[MIT](LICENSE).
