Welcome to Scrapy Inline Requests’s documentation!¶
Contents:
Scrapy Inline Requests¶
A decorator for writing coroutine-like spider callbacks.
Requires Scrapy>=1.0 and supports Python 2.7+ and 3.4+.
- Free software: MIT license
- Documentation: https://scrapy-inline-requests.readthedocs.org.
Usage¶
The spider below shows a simple use case of scraping a page and following a few links:
from scrapy import Spider, Request
from inline_requests import inline_requests


class MySpider(Spider):
    name = 'myspider'
    start_urls = ['http://httpbin.org/html']

    @inline_requests
    def parse(self, response):
        urls = [response.url]
        for i in range(10):
            next_resp = yield Request(response.urljoin('?page=%d' % i))
            urls.append(next_resp.url)
        yield {'urls': urls}
See the examples/ directory for a more complex spider.
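Assuming the example above is saved as myspider.py (a filename chosen here for illustration), you can run it without a full Scrapy project and export the scraped items using Scrapy's runspider command:

$ scrapy runspider myspider.py -o urls.json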
Known Issues¶
- Middlewares can drop or ignore non-200 status responses, preventing the callback from continuing its execution. This can be overcome by using the handle_httpstatus_all flag; see the httperror middleware documentation and the sketch after this list.
- High concurrency and large responses can cause higher memory usage.
- This decorator assumes your method has the signature (self, response).
- The decorated method must return a generator instance.
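Below is a minimal sketch of the workaround for the first issue. The handle_httpstatus_all meta key tells Scrapy's httperror middleware to pass every response through to the callback regardless of status code; the spider name and URLs here are illustrative only.

from scrapy import Spider, Request
from inline_requests import inline_requests


class TolerantSpider(Spider):
    name = 'tolerantspider'  # hypothetical spider name
    start_urls = ['http://httpbin.org/html']

    @inline_requests
    def parse(self, response):
        # Ask the httperror middleware to deliver the response even on a
        # non-200 status, so the generator is always resumed.
        next_resp = yield Request('http://httpbin.org/status/404',
                                  meta={'handle_httpstatus_all': True})
        yield {'status': next_resp.status}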
Installation¶
Stable release¶
To install Scrapy Inline Requests, run this command in your terminal:
$ pip install scrapy-inline-requests
If you don’t have pip installed, this Python installation guide can walk you through the process.
From sources¶
The sources for Scrapy Inline Requests can be downloaded from the Github repo.
You can either clone the public repository:
$ git clone git://github.com/rolando/scrapy-inline-requests
Or download the tarball:
$ curl -OL https://github.com/rolando/scrapy-inline-requests/tarball/master
Once you have a copy of the source, you can install it with:
$ python setup.py install
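Alternatively, once you have the sources, pip can install directly from the checkout directory:

$ pip install .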
Reference¶
inline_requests.inline_requests(method_or_func)¶
A decorator to use coroutine-like spider callbacks.
Example:
from scrapy import Spider, Request
from inline_requests import inline_requests


class MySpider(Spider):

    @inline_requests
    def parse(self, response):
        next_url = response.urljoin('?next')
        try:
            next_resp = yield Request(next_url)
        except Exception:
            self.logger.exception("An error occurred.")
            return
        else:
            yield {"next_url": next_resp.url}
You must conform to the following conventions:

- The decorated method must be a spider method.
- The decorated method must use the yield keyword or return a generator.
- The decorated method must accept response as the first argument.
- The decorated method must yield Request objects with neither callback nor errback set.
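A minimal sketch of the last convention (the URLs below are made up): the decorator manages the request's callback itself, so a request that already carries one is not accepted.

@inline_requests
def parse(self, response):
    # Fine: no callback/errback, so the decorator routes the response
    # back into this generator.
    next_resp = yield Request(response.urljoin('?next'))
    yield {'next_url': next_resp.url}

    # Violates the conventions above: callback (or errback) must not
    # be set on yielded requests.
    # yield Request(response.urljoin('?other'), callback=self.parse)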
If your requests don’t come back to the generator, try setting the flag to handle all HTTP statuses:
request.meta['handle_httpstatus_all'] = True
History¶
0.3.0dev¶
- Backward incompatible change: Added more restrictions to the request object (no callback/errback).
- Cleanup callback/errback attributes before sending back the request to the generator.
- Simplified example spider.
0.2.0 (2016-06-23)¶
- Python 3 support.
0.1.2 (2016-05-22)¶
- Scrapy API and documentation updates.
0.1.1 (2013-02-03)¶
- Minor tweaks and fixes.
0.1.0 (2012-02-03)¶
- First release on PyPI.