Metadata-Version: 2.4
Name: qcrawl
Version: 0.3.5
Summary: Fast async web crawler & scraping framework with deduplication and extensible middleware.
Project-URL: Homepage, https://www.qcrawl.org/
Project-URL: Repository, https://github.com/crawlcore/qcrawl
Project-URL: Issues, https://github.com/crawlcore/qcrawl/issues
Project-URL: Documentation, https://www.qcrawl.org/
Author-email: Vasiliy Kiryanov <vasiliy.kiryanov@gmail.com>
Maintainer-email: Vasiliy Kiryanov <vasiliy.kiryanov@gmail.com>
License: MIT
License-File: LICENSE
Keywords: async,asyncio,crawler,scraper,spider,web-scraping
Classifier: Development Status :: 5 - Production/Stable
Classifier: Framework :: AsyncIO
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Operating System :: OS Independent
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.11
Classifier: Programming Language :: Python :: 3.12
Classifier: Programming Language :: Python :: 3.13
Classifier: Programming Language :: Python :: 3.14
Classifier: Topic :: Internet :: WWW/HTTP
Classifier: Topic :: Software Development :: Libraries :: Application Frameworks
Classifier: Topic :: Software Development :: Libraries :: Python Modules
Classifier: Typing :: Typed
Requires-Python: >=3.11
Requires-Dist: aiofiles>=24.1.0
Requires-Dist: aiohttp>=3.10.0
Requires-Dist: charset-normalizer>=3.3.2
Requires-Dist: cssselect>=1.2.0
Requires-Dist: lxml>=5.3.0
Requires-Dist: msgspec>=0.20.0
Requires-Dist: orjson>=3.10.0
Requires-Dist: yarl>=1.13.0
Provides-Extra: camoufox
Requires-Dist: camoufox>=0.4.11; extra == 'camoufox'
Provides-Extra: dev
Requires-Dist: camoufox>=0.4.11; extra == 'dev'
Requires-Dist: lxml-stubs>=0.5.1; extra == 'dev'
Requires-Dist: mypy>=1.19.0; extra == 'dev'
Requires-Dist: pre-commit>=3.5.0; extra == 'dev'
Requires-Dist: pytest-asyncio>=1.3.0; extra == 'dev'
Requires-Dist: pytest-cov>=7.0.0; extra == 'dev'
Requires-Dist: pytest>=8.0; extra == 'dev'
Requires-Dist: redis>=6.4.0; extra == 'dev'
Requires-Dist: ruff>=0.14.0; extra == 'dev'
Requires-Dist: testcontainers>=4.13.0; extra == 'dev'
Requires-Dist: tox>=4.32.0; extra == 'dev'
Requires-Dist: types-aiofiles>=25.1.0; extra == 'dev'
Provides-Extra: docs
Requires-Dist: mkdocs-material>=9.7.0; extra == 'docs'
Provides-Extra: observability
Requires-Dist: opentelemetry-api>=1.27.0; extra == 'observability'
Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.27.0; extra == 'observability'
Requires-Dist: opentelemetry-sdk>=1.27.0; extra == 'observability'
Requires-Dist: prometheus-client>=0.20.0; extra == 'observability'
Provides-Extra: opentelemetry
Requires-Dist: opentelemetry-api>=1.27.0; extra == 'opentelemetry'
Requires-Dist: opentelemetry-exporter-otlp-proto-http>=1.27.0; extra == 'opentelemetry'
Requires-Dist: opentelemetry-sdk>=1.27.0; extra == 'opentelemetry'
Provides-Extra: prometheus
Requires-Dist: prometheus-client>=0.20.0; extra == 'prometheus'
Provides-Extra: redis
Requires-Dist: redis>=6.4.0; extra == 'redis'
Description-Content-Type: text/markdown

<img src="https://www.qcrawl.org/assets/crawl.svg" alt="qCrawl Logo" style="min-width:75%;" />

[![PyPI Version](https://img.shields.io/pypi/v/qcrawl.svg?style=for-the-badge)](https://pypi.org/project/qcrawl)
[![PyPI - Python Version](https://img.shields.io/pypi/pyversions/qcrawl.svg?style=for-the-badge)](https://pypi.org/project/qcrawl)
[![Codecov](https://img.shields.io/codecov/c/github/crawlcore/qcrawl/main?style=for-the-badge)](https://codecov.io/gh/crawlcore/qcrawl)

[qcrawl](https://www.qcrawl.org) is a fast, async web crawling and scraping framework for Python for extracting structured data from web pages.
It is cross-platform and installs easily via `pip` or `conda`.
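
A basic install pulls in the core dependencies; optional features are exposed as extras (the extras names below come from the package metadata):

```shell
# Core framework
pip install qcrawl

# With the Redis queue backend
pip install "qcrawl[redis]"

# With the Camoufox stealth-browser downloader
pip install "qcrawl[camoufox]"

# With Prometheus and OpenTelemetry support
pip install "qcrawl[observability]"
```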

See the [documentation](https://www.qcrawl.org/) for details.


### qCrawl features

1. Async architecture - High-performance concurrent crawling built on asyncio
2. Performance optimized - Redis queue backend with direct delivery, MessagePack serialization, connection pooling, and DNS caching
3. Powerful parsing - CSS and XPath selectors via lxml
4. Middleware system - Customizable request/response processing
5. Flexible export - Multiple output formats, including JSON, CSV, and XML
6. Flexible queue backends - In-memory or Redis-based (plus disk) schedulers for different scale requirements
7. Item pipelines - Data transformation, validation, and processing
8. Pluggable downloaders - HTTP (aiohttp) or Camoufox (stealth browser) for JavaScript rendering and anti-bot evasion
