Metadata-Version: 2.4
Name: geonode-scraper-sdk
Version: 0.1.0
Summary: Python SDK for the Geonode Scraper API
Author: Geonode Team
License-Expression: MIT
Keywords: OpenAPI,OpenAPI-Generator,Scraper API
Classifier: Programming Language :: Python :: 3.10
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Typing :: Typed
Requires-Python: >=3.10
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: urllib3<3.0.0,>=2.1.0
Requires-Dist: python-dateutil>=2.8.2
Requires-Dist: pydantic>=2.11
Requires-Dist: typing-extensions>=4.7.1
Provides-Extra: dev
Requires-Dist: pytest>=7.2.1; extra == "dev"
Requires-Dist: pytest-cov>=2.8.1; extra == "dev"
Requires-Dist: mypy>=1.5; extra == "dev"
Requires-Dist: types-python-dateutil>=2.8.19.14; extra == "dev"
Requires-Dist: ruff>=0.12.11; extra == "dev"
Dynamic: license-file

# Geonode Scraper SDK

Python SDK for the Geonode Scraper API. It supports synchronous and asynchronous
content extraction, job polling, usage statistics, and service health checks.

## Requirements

- Python 3.10+

## Installation

```sh
pip install geonode-scraper-sdk
```

## Configuration And Authentication

Create a client configuration with your API base URL and API key.

```python
from geonode_scraper_sdk import Configuration

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)
```

If you do not set `host`, the generated client defaults to `http://localhost`.
You normally do not need `api_key_prefix` for this API.
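
Because an unset `host` falls back to `http://localhost`, it can help to fail early when settings are missing. The sketch below builds the `Configuration` keyword arguments from environment variables. The variable names `GEONODE_API_HOST` and `GEONODE_API_KEY` and the helper `load_client_settings` are assumptions made for this example, not part of the SDK.

```python
import os


def load_client_settings(env=None):
    """Return keyword arguments for Configuration, read from the environment.

    GEONODE_API_HOST and GEONODE_API_KEY are hypothetical variable names
    chosen for this sketch; the SDK itself does not read them.
    """
    env = os.environ if env is None else env
    host = env.get("GEONODE_API_HOST")
    api_key = env.get("GEONODE_API_KEY")
    if not host:
        # Fail loudly instead of silently talking to http://localhost.
        raise ValueError("GEONODE_API_HOST is not set")
    if not api_key:
        raise ValueError("GEONODE_API_KEY is not set")
    return {
        "host": host.rstrip("/"),
        "api_key": {"APIKeyHeader": api_key},
    }
```

The result can then be passed on as `Configuration(**load_client_settings())`.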

## Quick Start

This example performs a synchronous extraction and prints the markdown result.

```python
from geonode_scraper_sdk import (
    ApiClient,
    ApiException,
    Configuration,
    ExtractRequest,
    ExtractionApi,
    OutputFormat,
    ProcessingMode,
)

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = ExtractionApi(api_client)

    try:
        response = api.extract_v1_extract_post(
            ExtractRequest(
                url="https://example.com",
                formats=[OutputFormat.MARKDOWN],
                processing_mode=ProcessingMode.SYNC,
            )
        )
        print(response.data.markdown)
        print(response.tokens_charged)
    except ApiException as exc:
        print(exc.status)
        print(exc.body)
```

## Async Workflow

When `processing_mode=ProcessingMode.ASYNC`, the extract call returns an async
job response with a job ID and status URL.

```python
from geonode_scraper_sdk import (
    ApiClient,
    Configuration,
    ExtractRequest,
    ExtractionApi,
    ProcessingMode,
)

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = ExtractionApi(api_client)

    submit = api.extract_v1_extract_post(
        ExtractRequest(
            url="https://example.com",
            processing_mode=ProcessingMode.ASYNC,
        )
    )

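    # Fetch the job once. A newly submitted job may still be running, so
    # check the status before reading the result.
    job = api.get_job_result_v1_extract_job_id_get(submit.job_id)
    print(job.status)
    if job.data and job.data.markdown:
        print(job.data.markdown)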
```

Use `get_job_result_v1_extract_job_id_get(job_id)` to poll a single job, or
`list_jobs_v1_extract_jobs_get(...)` to inspect and filter job history.
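
Polling a single job usually means repeating the lookup until the job reaches a final state. The helper below is a minimal sketch of that loop, not part of the SDK. The name `wait_for_job`, the terminal status names `"completed"` and `"failed"`, and the default interval and timeout are assumptions; check the status values your API actually returns.

```python
import time


def wait_for_job(
    fetch_job,
    job_id,
    *,
    done_statuses=("completed", "failed"),  # assumed status names
    interval=2.0,
    timeout=120.0,
    sleep=time.sleep,
    clock=time.monotonic,
):
    """Call fetch_job(job_id) until the job reports a terminal status.

    fetch_job is any callable returning an object with a `status`
    attribute, for example api.get_job_result_v1_extract_job_id_get.
    """
    deadline = clock() + timeout
    while True:
        job = fetch_job(job_id)
        # Accept both plain strings and enum members with a `.value`.
        status = getattr(job.status, "value", job.status)
        if str(status).lower() in done_statuses:
            return job
        if clock() >= deadline:
            raise TimeoutError(f"job {job_id} not finished after {timeout}s")
        sleep(interval)
```

With the client from the example above, a call might look like `wait_for_job(api.get_job_result_v1_extract_job_id_get, submit.job_id)`.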

## Error Handling

Non-2xx responses raise `ApiException` or one of its subclasses.
The exception includes the HTTP status, response body, and any deserialized
error model in `exc.data`.

```python
from geonode_scraper_sdk import (
    ApiClient,
    ApiException,
    Configuration,
    ExtractionApi,
    ExtractRequest,
)

configuration = Configuration(
    host="https://api.example.com",
    api_key={"APIKeyHeader": "your-api-key"},
)

with ApiClient(configuration) as api_client:
    api = ExtractionApi(api_client)

    try:
        api.extract_v1_extract_post(ExtractRequest(url="https://example.com"))
    except ApiException as exc:
        print(exc.status)
        print(exc.body)
        print(exc.data)
```
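
Since the exception carries the HTTP status, a caller can decide which failures are worth retrying. The following is a rough sketch of retrying with exponential backoff, not a feature of the SDK. The helper name `call_with_retries` and the set of retryable status codes are assumptions. The sketch catches any exception that has a `status` attribute so that it stays self-contained; in real code you would likely catch `ApiException` instead.

```python
import time

# Assumed set of transient statuses; adjust to what the API documents.
RETRYABLE_STATUSES = frozenset({429, 500, 502, 503, 504})


def call_with_retries(call, *, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Run call() and retry on transient HTTP statuses with backoff."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return call()
        except Exception as exc:
            status = getattr(exc, "status", None)
            # Re-raise non-transient errors and the final failed attempt.
            if status not in RETRYABLE_STATUSES or attempt == attempts:
                raise
            # Delays grow as base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

A call such as `call_with_retries(lambda: api.extract_v1_extract_post(request))` would then retry rate-limit and server errors while letting other errors surface immediately.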

## Request Options

`ExtractRequest` supports the main extraction controls:

- `formats`: output formats to return; defaults to `[OutputFormat.HTML]`
- `render_js`: use a headless browser for JavaScript-rendered pages; defaults to `False`
- `processing_mode`: `ProcessingMode.SYNC` or `ProcessingMode.ASYNC`; defaults to `ProcessingMode.SYNC`
- `proxy`: optional `ProxySettings` for country and proxy type selection
- `headers`: optional request headers dictionary

Example with additional options:

```python
from geonode_scraper_sdk import (
    ExtractRequest,
    OutputFormat,
    ProcessingMode,
    ProxySettings,
    ProxyType,
)

request = ExtractRequest(
    url="https://example.com",
    formats=[OutputFormat.HTML, OutputFormat.MARKDOWN],
    render_js=True,
    processing_mode=ProcessingMode.SYNC,
    proxy=ProxySettings(country="US", type=ProxyType.RESIDENTIAL),
    headers={"User-Agent": "geonode-scraper-sdk-demo"},
)
```

## API Reference

- `ExtractionApi.extract_v1_extract_post(extract_request)`
- `ExtractionApi.get_job_result_v1_extract_job_id_get(job_id)`
- `ExtractionApi.list_jobs_v1_extract_jobs_get(job_id=None, url=None, status=None, output=None, start_date=None, end_date=None, page=None, page_size=None)`
- `StatisticsApi.get_statistics_v1_statistics_get(start_date=None, end_date=None)`
- `SystemApi.health_check_health_get()`
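
The `page` and `page_size` parameters of `list_jobs_v1_extract_jobs_get` suggest a simple way to walk the whole job history. The generator below is a sketch under several assumptions: pages are numbered from 1, a page shorter than `page_size` is the last one, and the response exposes its entries through an `items` attribute. None of these is confirmed by this README, and `iter_items` is a hypothetical helper name, so pass your own `items_of` if the response model differs.

```python
def iter_items(
    fetch_page,
    *,
    page_size=50,
    start_page=1,  # assumption: pages are 1-based
    items_of=lambda response: response.items,  # assumption: hypothetical field
    max_pages=1000,
):
    """Yield entries from a paginated endpoint, one page at a time.

    fetch_page is called as fetch_page(page=..., page_size=...).
    """
    page = start_page
    for _ in range(max_pages):
        response = fetch_page(page=page, page_size=page_size)
        items = list(items_of(response))
        yield from items
        # Assumption: a short (or empty) page means there is nothing left.
        if len(items) < page_size:
            return
        page += 1
```

For example, `iter_items(lambda page, page_size: api.list_jobs_v1_extract_jobs_get(page=page, page_size=page_size))` would iterate over all jobs if those assumptions hold for your API.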

## Advanced Usage

Each generated API method also exposes:

- `*_with_http_info()` to get the deserialized payload together with status and headers
- `*_without_preload_content()` to work with the raw HTTP response directly
