Metadata-Version: 2.1
Name: webscraping-ai
Version: 3.2.1
Summary: WebScraping.AI
Home-page: https://webscraping.ai
Author: WebScraping.AI Support
Author-email: "WebScraping.AI Support" <support@webscraping.ai>
Project-URL: Repository, https://github.com/webscraping-ai/webscraping-ai-python
Keywords: OpenAPI,OpenAPI-Generator,WebScraping.AI
Requires-Python: >=3.9
Description-Content-Type: text/markdown
Requires-Dist: urllib3 <3.0.0,>=2.1.0
Requires-Dist: python-dateutil >=2.8.2
Requires-Dist: pydantic >=2.11
Requires-Dist: typing-extensions >=4.7.1

# webscraping-ai
WebScraping.AI scraping API provides LLM-powered tools with Chromium JavaScript rendering, rotating proxies, and built-in HTML parsing.

This Python package is automatically generated by the [OpenAPI Generator](https://openapi-generator.tech) project:

- API version: 3.2.1
- Package version: 3.2.1
- Generator version: 7.22.0
- Build package: org.openapitools.codegen.languages.PythonClientCodegen
For more information, please visit [https://webscraping.ai](https://webscraping.ai)

## Requirements.

Python 3.10+

## Installation & Usage
### pip install

If the python package is hosted on a repository, you can install directly using:

```sh
pip install git+https://github.com/webscraping-ai/webscraping-ai-python.git
```
(you may need to run `pip` with root permission: `sudo pip install git+https://github.com/webscraping-ai/webscraping-ai-python.git`)

Then import the package:
```python
import webscraping_ai
```

### Setuptools

Install via [Setuptools](http://pypi.python.org/pypi/setuptools).

```sh
python setup.py install --user
```
(or `sudo python setup.py install` to install the package for all users)

Then import the package:
```python
import webscraping_ai
```

### Tests

Execute `pytest` to run the tests.

## Getting Started

Please follow the [installation procedure](#installation--usage) and then run the following:

```python

import webscraping_ai
from webscraping_ai.rest import ApiException
from pprint import pprint

# Defining the host is optional and defaults to https://api.webscraping.ai
# See configuration.py for a list of all supported configuration parameters.
configuration = webscraping_ai.Configuration(
    host = "https://api.webscraping.ai"
)

# The client must configure the authentication and authorization parameters
# in accordance with the API server security policy.
# Examples for each auth method are provided below, use the example that
# satisfies your auth use case.

# Configure API key authorization: api_key
configuration.api_key['api_key'] = os.environ["API_KEY"]

# Uncomment below to setup prefix (e.g. Bearer) for API key, if needed
# configuration.api_key_prefix['api_key'] = 'Bearer'


# Enter a context with an instance of the API client
with webscraping_ai.ApiClient(configuration) as api_client:
    # Create an instance of the API class
    api_instance = webscraping_ai.AIApi(api_client)
    url = 'https://example.com' # str | URL of the target page.
    fields = {'key': '{\"title\":\"Main product title\",\"price\":\"Current product price\",\"description\":\"Full product description\"}'} # Dict[str, str] | Object describing fields to extract from the page and their descriptions
    headers = {'key': '{\"Cookie\":\"session=some_id\"}'} # Dict[str, str] | HTTP headers to pass to the target page. Can be specified either via a nested query parameter (...&headers[One]=value1&headers=[Another]=value2) or as a JSON encoded object (...&headers={\"One\": \"value1\", \"Another\": \"value2\"}). (optional)
    timeout = 10000 # int | Maximum web page retrieval time in ms. Increase it in case of timeout errors (10000 by default, maximum is 30000). (optional) (default to 10000)
    js = True # bool | Execute on-page JavaScript using a headless browser (true by default). (optional) (default to True)
    js_timeout = 2000 # int | Maximum JavaScript rendering time in ms. Increase it in case if you see a loading indicator instead of data on the target page. (optional) (default to 2000)
    wait_for = 'wait_for_example' # str | CSS selector to wait for before returning the page content. Useful for pages with dynamic content loading. Overrides js_timeout. (optional)
    proxy = datacenter # str | Type of proxy. Use `residential` if your site restricts traffic from datacenters, or `stealth` for the most heavily protected sites with advanced anti-bot detection (`datacenter` by default). Residential and stealth proxy requests are more expensive than datacenter, see the pricing page for details. (optional) (default to datacenter)
    country = us # str | Country of the proxy to use (US by default). (optional) (default to us)
    custom_proxy = 'custom_proxy_example' # str | Your own proxy URL to use instead of our built-in proxy pool in \"http://user:password@host:port\" format (<a target=\"_blank\" href=\"https://webscraping.ai/proxies/smartproxy\">Smartproxy</a> for example). (optional)
    device = desktop # str | Type of device emulation. (optional) (default to desktop)
    error_on_404 = False # bool | Return error on 404 HTTP status on the target page (false by default). (optional) (default to False)
    error_on_redirect = False # bool | Return error on redirect on the target page (false by default). (optional) (default to False)
    js_script = 'document.querySelector(\'button\').click();' # str | Custom JavaScript code to execute on the target page. (optional)

    try:
        # Extract structured data fields from a web page
        api_response = api_instance.get_fields(url, fields, headers=headers, timeout=timeout, js=js, js_timeout=js_timeout, wait_for=wait_for, proxy=proxy, country=country, custom_proxy=custom_proxy, device=device, error_on_404=error_on_404, error_on_redirect=error_on_redirect, js_script=js_script)
        print("The response of AIApi->get_fields:\n")
        pprint(api_response)
    except ApiException as e:
        print("Exception when calling AIApi->get_fields: %s\n" % e)

```

## Documentation for API Endpoints

All URIs are relative to *https://api.webscraping.ai*

Class | Method | HTTP request | Description
------------ | ------------- | ------------- | -------------
*AIApi* | [**get_fields**](docs/AIApi.md#get_fields) | **GET** /ai/fields | Extract structured data fields from a web page
*AIApi* | [**get_question**](docs/AIApi.md#get_question) | **GET** /ai/question | Get an answer to a question about a given web page
*AccountApi* | [**account**](docs/AccountApi.md#account) | **GET** /account | Information about your account calls quota
*HTMLApi* | [**get_html**](docs/HTMLApi.md#get_html) | **GET** /html | Page HTML by URL
*SelectedHTMLApi* | [**get_selected**](docs/SelectedHTMLApi.md#get_selected) | **GET** /selected | HTML of a selected page area by URL and CSS selector
*SelectedHTMLApi* | [**get_selected_multiple**](docs/SelectedHTMLApi.md#get_selected_multiple) | **GET** /selected-multiple | HTML of multiple page areas by URL and CSS selectors
*TextApi* | [**get_text**](docs/TextApi.md#get_text) | **GET** /text | Page text by URL


## Documentation For Models

 - [Account](docs/Account.md)
 - [Error](docs/Error.md)


<a id="documentation-for-authorization"></a>
## Documentation For Authorization


Authentication schemes defined for the API:
<a id="api_key"></a>
### api_key

- **Type**: API key
- **API key parameter name**: api_key
- **Location**: URL query string


## Author

support@webscraping.ai


