Metadata-Version: 2.4
Name: qcrawl
Version: 0.3.3
Summary: Fast async web crawler & scraping framework with deduplication and extensible middleware.
Project-URL: Homepage, https://www.qcrawl.org/
Project-URL: Repository, https://github.com/crawlcore/qcrawl
Project-URL: Issues, https://github.com/crawlcore/qcrawl/issues
Project-URL: Documentation, https://www.qcrawl.org/
Author-email: Vasiliy Kiryanov <vasiliy.kiryanov@gmail.com>
Maintainer-email: Vasiliy Kiryanov <vasiliy.kiryanov@gmail.com>
License: MIT
License-File: LICENSE
Keywords: async,asyncio,crawler,scraper,spider,web-scraping
Classifier: Development Status :: 5 - Production/Stable
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: aiofiles>=24.1.0
Requires-Dist: aiohttp>=3.10.0
Requires-Dist: charset-normalizer>=3.3.2
Requires-Dist: cssselect>=1.2.0
Requires-Dist: lxml>=5.3.0
Requires-Dist: msgspec>=0.18.0
Requires-Dist: orjson>=3.10.0
Requires-Dist: yarl>=1.13.0
Provides-Extra: dev
Requires-Dist: lxml-stubs>=0.5.1; extra == 'dev'
Requires-Dist: mypy>=1.10; extra == 'dev'
Requires-Dist: pre-commit>=3.5.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=0.23; extra == 'dev'
Requires-Dist: pytest-cov>=4.1.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: redis>=6.4.0; extra == 'dev'
Requires-Dist: ruff>=0.1.0; extra == 'dev'
Requires-Dist: tox>=4.32.0; extra == 'dev'
Requires-Dist: types-aiofiles>=23.2.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.6.23; extra == 'docs'
Provides-Extra: observability
Requires-Dist: opentelemetry-api>=1.27.0; extra == 'observability'
Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.27.0; extra == 'observability'
Requires-Dist: opentelemetry-sdk>=1.27.0; extra == 'observability'
Requires-Dist: prometheus-client>=0.20.0; extra == 'observability'
Provides-Extra: opentelemetry
Requires-Dist: opentelemetry-api>=1.27.0; extra == 'opentelemetry'
Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.27.0; extra == 'opentelemetry'
Requires-Dist: opentelemetry-sdk>=1.27.0; extra == 'opentelemetry'
Provides-Extra: prometheus
Requires-Dist: prometheus-client>=0.20.0; extra == 'prometheus'
Provides-Extra: redis
Requires-Dist: redis>=6.4.0; extra == 'redis'
Description-Content-Type: text/markdown

<img src="https://www.qcrawl.org/assets/crawl.svg" alt="qCrawl Logo" style="min-width:75%;" />

[![PyPI Version](https://img.shields.io/pypi/v/qcrawl.svg?style=for-the-badge)](https://pypi.org/project/qcrawl)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/qcrawl.svg?style=for-the-badge)](https://pypi.org/project/qcrawl)
[![Codecov](https://img.shields.io/codecov/c/github/crawlcore/qcrawl/main?style=for-the-badge)](https://codecov.io/gh/crawlcore/qcrawl)

[qcrawl](https://www.qcrawl.org) is a fast async web crawling & scraping framework for Python that extracts structured data from web pages.
It is cross-platform and easy to install via `pip`, `conda`, or OS packages.

See the [documentation](https://www.qcrawl.org/) for details.
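The package metadata above declares several optional extras (`redis`, `prometheus`, `opentelemetry`, `observability`, `dev`, `docs`). A typical install from PyPI, with or without extras, looks like:

```shell
# Core install
pip install qcrawl

# With optional extras, e.g. Redis queue backend plus
# Prometheus/OpenTelemetry observability support
pip install "qcrawl[redis,observability]"
```

Quoting the requirement string keeps shells like zsh from interpreting the square brackets.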


### Library comparison

| Attribute           | qCrawl ⭐                                                          | Scrapy                                                           | Playwright                                                             | Colly                                     |
|---------------------|-------------------------------------------------------------------|------------------------------------------------------------------|------------------------------------------------------------------------|-------------------------------------------|
| Language            | Python                                                            | Python                                                           | Node.js, Python, Java                                                  | Go                                        |
| Concurrency model   | Asyncio native with threads for I/O work                          | Evented (Twisted) with non‑blocking I/O                          | Isolated contexts within browser instance + multiple browser instances | Goroutines (lightweight threads)          |
| Queue               | Priority queue with FIFO tiebreak, memory, [disk,] redis backends | Priority queue with FIFO/LIFO tiebreak, memory and disk backends | No built-in crawl queue (user-managed)                                 | FIFO with memory and file backends        |
| Middleware & hooks  | Downloader + Spider middlewares; signal-driven lifecycle hooks    | Downloader + Spider middlewares; signal-driven lifecycle hooks   | Hooks and interception API; not pipeline-centric                       | Middleware-style callbacks                |
| Crawl throttling    | Per-domain concurrency with configurable delay                    | Per-domain concurrency with configurable delay                   | Controlled via browser sessions                                        | Per-host concurrency                      |
| Strengths           | Lightweight, high-throughput, easy to extend                      | Very mature ecosystem and community, easy to extend              | Real browser rendering, JS support, robust for SPA sites               | Extremely high throughput, low memory use |

