Metadata-Version: 2.1
Name: scrapy-webarchive
Version: 0.4.1
Summary: A webarchive extension for Scrapy
Project-URL: Documentation, https://developers.thequestionmark.org/scrapy-webarchive/
Project-URL: Repository, https://github.com/q-m/scrapy-webarchive
Keywords: Scrapy,Webarchive,WARC,WACZ
Classifier: Development Status :: 4 - Beta
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python
Requires-Python: <3.13,>=3.7
Description-Content-Type: text/markdown
Provides-Extra: aws
Provides-Extra: gcs
Provides-Extra: all

# Scrapy Webarchive

[![Docs](https://github.com/q-m/scrapy-webarchive/actions/workflows/docs.yml/badge.svg)](https://github.com/q-m/scrapy-webarchive/actions/workflows/docs.yml)

Scrapy Webarchive is a plugin for Scrapy that allows users to capture and export web archives in the WARC and WACZ formats during crawling.

## Features

* Save web crawls in WACZ format to local or cloud storage.
* Crawl against WACZ format archives.
* Integrate seamlessly with Scrapy’s spider request and response cycle.
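
Because a WACZ file is a ZIP container, an exported archive can be inspected with the Python standard library alone. The following is a minimal sketch and not part of the scrapy-webarchive API: `summarize_wacz` and `build_demo_wacz` are hypothetical helper names, and the member names in the demo file are placeholders that mimic the usual WACZ layout (`datapackage.json`, `archive/`, `indexes/`, `pages/`).

```python
import io
import json
import zipfile


def summarize_wacz(source):
    """Group the members of a WACZ (ZIP) file by their top-level directory."""
    summary = {}
    with zipfile.ZipFile(source) as archive:
        for name in archive.namelist():
            if name.endswith("/"):
                continue  # skip explicit directory entries
            # Files at the root of the ZIP are grouped under "."
            top_level = name.split("/", 1)[0] if "/" in name else "."
            summary.setdefault(top_level, []).append(name)
    return summary


def build_demo_wacz():
    """Create a tiny in-memory ZIP that mimics the WACZ layout.

    The contents are empty placeholders, not real crawl data.
    """
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("datapackage.json", json.dumps({"profile": "data-package"}))
        archive.writestr("archive/data.warc.gz", b"")
        archive.writestr("indexes/index.cdx", "")
        archive.writestr("pages/pages.jsonl", "")
    buffer.seek(0)
    return buffer


demo_summary = summarize_wacz(build_demo_wacz())
for directory, members in sorted(demo_summary.items()):
    print(directory, members)
```

To inspect a real export, pass the path of the `.wacz` file to `summarize_wacz` instead of the in-memory demo buffer.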

## Compatibility

* Python 3.7, 3.8, 3.9, 3.10, 3.11 and 3.12

## Documentation

Documentation is available online at [developers.thequestionmark.org/scrapy-webarchive/](https://developers.thequestionmark.org/scrapy-webarchive/).
