Metadata-Version: 2.4
Name: urlscope
Version: 0.1.3rc2
Summary: Python wrapper for the urlscan.io API.
Author: Jan Wychowaniak
License-Expression: MIT
License-File: LICENSE
Requires-Python: >=3.10
Requires-Dist: httpx
Requires-Dist: pydantic>=2
Provides-Extra: dev
Requires-Dist: mypy; extra == 'dev'
Requires-Dist: pytest; extra == 'dev'
Requires-Dist: pytest-asyncio; extra == 'dev'
Requires-Dist: respx; extra == 'dev'
Requires-Dist: ruff; extra == 'dev'
Description-Content-Type: text/markdown

# urlscope

`urlscope` is an async-first Python wrapper for the urlscan.io API. It provides typed Pydantic models for common API responses, automatic API key handling, built-in retry logic for rate limits, and a sync convenience wrapper for scripts.

## Installation

```bash
pip install urlscope
```

To install the current prerelease from PyPI instead:

```bash
pip install --pre urlscope
```

Set your API key before making requests:

```bash
export URLSCAN_API_KEY="your-api-key-here"
```
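
If a script should fail fast when the key is missing, a small pre-flight check can run before any client is created. This is a minimal sketch: `require_api_key` is a hypothetical helper, not part of `urlscope`, and it only assumes the `URLSCAN_API_KEY` environment variable shown above.

```python
import os


def require_api_key(env_var: str = "URLSCAN_API_KEY") -> str:
    """Return the API key from the environment or raise a clear error.

    Hypothetical helper for illustration; urlscope reads the key itself.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"{env_var} is not set; export it before creating a client")
    return key
```

Calling `require_api_key()` at the top of a script surfaces a missing key immediately rather than at the first request.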

## Quickstart

### Async submit and wait

```python
import asyncio
from urlscope import UrlscopeClient


async def main() -> None:
    async with UrlscopeClient() as client:
        result = await client.submit_and_wait(
            "https://example.com",
            visibility="public",
        )
        overall = result.verdicts.overall if result.verdicts else None
        print(result.task.uuid)
        print(result.page.url)
        print(overall.score if overall else None)


asyncio.run(main())
```

### Sync usage

```python
from urlscope import SyncClient


with SyncClient() as client:
    result = client.get_result("scan-uuid-here")
    print(result.page.url)
```

### Search

```python
import asyncio
from urlscope import UrlscopeClient


async def main() -> None:
    async with UrlscopeClient() as client:
        response = await client.search(
            "domain:example.com",
            size=10,
            datasource="scans",
        )
        print("total:", response.total, "took:", response.took)

        for item in response.results:
            page_url = item.page.get("url") if item.page else None
            print(item.id, page_url, item.result)

        # Cursor-based pagination: pass the sort key of the last item on
        # the current page as search_after to fetch the next page.
        if response.has_more and response.results and response.results[-1].sort:
            next_page = await client.search(
                "domain:example.com",
                size=10,
                search_after=response.results[-1].sort,
                collapse="page.domain.keyword",
            )
            print(len(next_page.results))


asyncio.run(main())
```

### Download artifacts

```python
import asyncio
from urlscope import UrlscopeClient


async def main() -> None:
    async with UrlscopeClient() as client:
        screenshot = await client.get_screenshot("scan-uuid-here")
        dom = await client.get_dom("scan-uuid-here")
        print(len(screenshot), len(dom))


asyncio.run(main())
```

### Check quotas

```python
import asyncio
from urlscope import UrlscopeClient


async def main() -> None:
    async with UrlscopeClient() as client:
        quotas = await client.get_quotas()
        print(quotas.scope)
        for q in quotas.quotas[:5]:
            print(q.scope, q.action, q.window, q.used, q.remaining, q.limit, q.reset)


asyncio.run(main())
```

The raw quotas response is also available via `QuotaInfo.limits`. It includes account metadata that is not flattened into `QuotaInfo.quotas`.

### Error handling

```python
import asyncio
from urlscope import RateLimitError, ScanTimeoutError, UrlscopeClient


async def main() -> None:
    async with UrlscopeClient() as client:
        try:
            await client.submit_and_wait("https://example.com", poll_timeout=120.0)
        except ScanTimeoutError as exc:
            print(exc.uuid)
        except RateLimitError as exc:
            print(exc.retry_after, exc.scope, exc.window)


asyncio.run(main())
```

`submit(..., override_safety=True)` is supported. The wrapper serializes the flag to the `overrideSafety` field in the wire format the live urlscan API currently expects.

## API Reference

Primary clients:

- `UrlscopeClient`: async interface for submit, result retrieval, polling, search, artifacts, and quotas
- `SyncClient`: sync wrapper with the same method surface for scripts and simple integrations
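
A sync facade over an async client is commonly built by driving the coroutines on a private event loop. The sketch below shows that general pattern only; it is not taken from the `urlscope` source, and `AsyncThing` and `SyncThing` are hypothetical names used for illustration.

```python
import asyncio
from typing import Any


class AsyncThing:
    """Hypothetical async client used only for this illustration."""

    async def get_result(self, uuid: str) -> dict[str, Any]:
        await asyncio.sleep(0)  # stands in for an HTTP request
        return {"uuid": uuid}


class SyncThing:
    """Blocking facade that drives AsyncThing on a private event loop."""

    def __init__(self) -> None:
        self._loop = asyncio.new_event_loop()
        self._inner = AsyncThing()

    def get_result(self, uuid: str) -> dict[str, Any]:
        # Run the coroutine to completion and hand back its result.
        return self._loop.run_until_complete(self._inner.get_result(uuid))

    def close(self) -> None:
        self._loop.close()

    def __enter__(self) -> "SyncThing":
        return self

    def __exit__(self, *exc_info: object) -> None:
        self.close()
```

A facade built this way cannot be called from a thread that is already running an event loop, which is one reason to use the async client directly inside async applications.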

Key response models:

- `SubmissionResponse`
- `ScanResult`, `TaskInfo`, `PageInfo`, `Verdicts`, `BrandMatch`, `ScanLists`, `CertificateInfo`
- `SearchResponse`, `SearchResultItem`
- `QuotaInfo`, `QuotaWindow`

`ScanResult.verdicts` follows the live urlscan structure with nested sections such as `overall`, `urlscan`, `engines`, and `community`. For example, use `result.verdicts.overall.score` for the top-level score.

`SearchResponse` includes `total`, `took`, `has_more`, and `results`. `SearchResultItem` exposes stable top-level fields such as `id`, `score`, `sort`, `page`, `task`, `stats`, `result`, and `screenshot`, while preserving less consistent live API sections as model extras. Search supports optional `datasource` and `collapse` parameters, and serializes `search_after` cursors in the comma-separated form expected by urlscan.
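
The comma-separated cursor form can be pictured with a short sketch. `serialize_search_after` is a hypothetical helper written for this illustration, not the wrapper's internal function, and the sort values are made up.

```python
from collections.abc import Sequence


def serialize_search_after(sort: Sequence[object]) -> str:
    """Join a result's sort values into a comma-separated cursor string."""
    return ",".join(str(value) for value in sort)


# Illustrative sort key: a timestamp followed by an identifier.
cursor = serialize_search_after([1700000000000, "0191c3e2-example"])
print(cursor)  # 1700000000000,0191c3e2-example
```

In normal use you pass `response.results[-1].sort` to `search_after` unchanged and let the client do this conversion.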

Search uses urlscan's searchable index. If you already have an exact scan UUID, prefer `get_result(uuid)`; a retrievable UUID is not guaranteed to appear in search results under every account plan or index state.

Key exceptions:

- `UrlscopeError`
- `AuthenticationError`
- `ValidationError`
- `NotFoundError`
- `ScanDeletedError`
- `RateLimitError`
- `ScanTimeoutError`
- `APIError`

## Development

```bash
uv sync --extra dev
.venv/bin/pytest tests/
.venv/bin/ruff check src/ tests/
.venv/bin/mypy src/
.venv/bin/python -m build
.venv/bin/python -m twine check dist/*
```

The package version is defined in `src/urlscope/__init__.py` and read dynamically by Hatchling during builds.
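
Reading a version dynamically generally means extracting a `__version__` assignment from the file's text. The sketch below illustrates the idea with a simplified regular expression; the pattern is an assumption for illustration and may differ from the one Hatchling actually uses, and `read_version` is a hypothetical helper.

```python
import re

# Simplified pattern (assumption); Hatchling's own pattern may differ.
VERSION_RE = re.compile(
    r"""^__version__\s*=\s*["'](?P<version>[^"']+)["']""",
    re.MULTILINE,
)


def read_version(source: str) -> str:
    """Extract the version string from the text of a Python module."""
    match = VERSION_RE.search(source)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group("version")


# Sample module text standing in for src/urlscope/__init__.py.
sample = '"""urlscope package."""\n\n__version__ = "0.1.3rc2"\n'
print(read_version(sample))  # 0.1.3rc2
```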

## License

MIT
