Metadata-Version: 2.4
Name: crawlerkit-core
Version: 0.2.0
Summary: Browserless crawler base: curl_cffi transport, TLS/AIA, identity, proxy, captcha, BaseCrawler/BaseParser.
Author-email: Lucas Caovilla <lucasgrisac@gmail.com>
License-Expression: MIT
Project-URL: Homepage, https://github.com/lucascaovilla/crawlerkit
Project-URL: Repository, https://github.com/lucascaovilla/crawlerkit
Project-URL: Documentation, https://github.com/lucascaovilla/crawlerkit#readme
Project-URL: Issues, https://github.com/lucascaovilla/crawlerkit/issues
Keywords: crawler,scraping,curl_cffi,tls,fingerprint,captcha,browserless
Classifier: Development Status :: 4 - Beta
Classifier: Intended Audience :: Developers
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.11
Description-Content-Type: text/markdown
License-File: LICENSE
Requires-Dist: curl_cffi>=0.7
Requires-Dist: browserforge>=1.2
Requires-Dist: cryptography>=42
Requires-Dist: certifi>=2024.0
Requires-Dist: selectolax>=0.3
Requires-Dist: lxml>=5.0
Requires-Dist: beautifulsoup4>=4.12
Requires-Dist: structlog>=24.1
Requires-Dist: tenacity>=8.2
Requires-Dist: weasyprint>=60
Provides-Extra: dev
Requires-Dist: pytest>=8.0; extra == "dev"
Requires-Dist: ruff>=0.5; extra == "dev"
Requires-Dist: build>=1.2; extra == "dev"
Requires-Dist: twine>=5.0; extra == "dev"
Requires-Dist: commitizen>=3.27; extra == "dev"
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.5; extra == "docs"
Requires-Dist: mkdocstrings[python]>=0.25; extra == "docs"
Dynamic: license-file

# crawlerkit-core

[![PyPI version](https://img.shields.io/pypi/v/crawlerkit-core.svg)](https://pypi.org/project/crawlerkit-core/)
[![Python versions](https://img.shields.io/pypi/pyversions/crawlerkit-core.svg)](https://pypi.org/project/crawlerkit-core/)
[![CI](https://github.com/lucascaovilla/crawlerkit/actions/workflows/ci.yml/badge.svg)](https://github.com/lucascaovilla/crawlerkit/actions/workflows/ci.yml)
[![License: MIT](https://img.shields.io/badge/License-MIT-blue.svg)](LICENSE)

A **standalone, browserless** crawler base (`crawlerkit.core`). It provides a fingerprinted **curl_cffi**
transport, per-host TLS with **AIA repair** and `.pfx` client certificates, **browserforge** identity (with
the User-Agent snapped to the impersonate target), proxy providers, a pluggable **captcha** registry, an
error taxonomy with retry and rotation, and the `BaseCrawler.flow()` / `BaseParser.parse()` hooks. Every
dependency is installable from PyPI, and `parse()` returns **your own type**, not one the library dictates.

## Install

```bash
pip install crawlerkit-core
```

## Use

```python
from crawlerkit.core import BaseCrawler, BaseParser, RawResponse, Transport, Profile
from crawlerkit.core.captcha import default_registry, McaptchaPowSolver, mcaptcha_hint
from crawlerkit.core.proxy import StaticProxyProvider, BrightDataProxyProvider
from crawlerkit.core.errors import BlockedError, TransientError, raise_for_block
```

**HTTP is curl_cffi only; `requests` is never used.** Runtime dependencies: curl_cffi, browserforge,
cryptography, certifi, selectolax, lxml, beautifulsoup4, weasyprint, structlog, tenacity.

## Logging

Logging is **opt-in and off by default** — crawlerkit emits nothing unless you ask. Set
`enable_logs = True` on your crawler or parser to turn on structlog events:

```python
class MyCrawler(BaseCrawler):
    enable_logs = True   # default is False
```

## Documentation

**Build a crawler:** [GETTING_STARTED.md](GETTING_STARTED.md). **Run the demos:**
[`examples/`](examples/) (`quotes.py`: a full crawl and parse; `fingerprint_demo.py`: identity proof).
**Reference:** [`docs/`](docs/) (identity, transport-tls, proxy, captcha, cracking-govbr-turnstile, errors,
api).

License: MIT.
