Retrying a Scrapy Request even when receiving a 200 status code

There is a website that I scrape that sometimes returns a 200 response, but has no text in response.body (which raises an AttributeError when I try to parse it with Selector).

Is there an easy way to check that the body contains text and, if not, retry the request until it does? Here is some pseudocode describing what I'm trying to do:

def check_response(response):
    if response.body:  # body is bytes; an empty body is falsy
        return response
    else:
        # somehow reissue a copy of the original request
        return Request(copy_of_response.request,
                       callback=check_response)

Basically, is there a way to repeat the request with the same properties (method, url, payload, cookies, etc.)?

2 answers

Follow the EAFP principle:

Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.
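Applied to this question, the two styles look like this (a minimal sketch, assuming a response object is in scope; that Selector raises AttributeError on an empty body is taken from the question itself):

from scrapy.selector import Selector

# LBYL: look before you leap - check the body first
if response.body:
    sel = Selector(response)

# EAFP: just try to parse and handle the failure
try:
    sel = Selector(response)
except AttributeError:
    pass  # empty body - retry, as shown below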

Handle the problem and yield a new Request to the same URL with dont_filter=True:

dont_filter (boolean) – indicates that this request should not be filtered by the scheduler. This is used when you want to perform an identical request multiple times, to ignore the duplicates filter. Use it with care, or you will get into crawling loops. Default to False.

from scrapy import Request

def parse(self, response):
    try:
        # parsing logic here
        ...
    except AttributeError:
        # empty body: request the same URL again, bypassing the dupefilter
        yield Request(response.url, callback=self.parse, dont_filter=True)

You can also make a copy of the current request (not tested):

new_request = response.request.copy()
new_request.dont_filter = True
yield new_request

Or make a new request using replace():

new_request = response.request.replace(dont_filter=True)
yield new_request
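As the documentation quoted above warns, dont_filter=True bypasses the duplicates filter entirely, so a page that never returns a body would be re-requested forever. A minimal sketch of capping the retries through request.meta (the empty_retries key is my own invention, not a Scrapy feature):

def parse(self, response):
    if not response.body:
        retries = response.meta.get('empty_retries', 0)
        if retries < 3:
            # reissue the same request, carrying the counter along
            yield response.request.replace(
                dont_filter=True,
                meta={**response.request.meta, 'empty_retries': retries + 1},
            )
        return
    # normal parsing from here on
    ...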

What about injecting the _retry() method from the built-in retry middleware into the spider, so a request can be retried on your own conditions, even for a 200 response?

In your settings:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'scraper.middlewares.retry.RetryMiddleware': 550,
}

And the middleware itself (scraper/middlewares/retry.py, matching the path above):

from scrapy.downloadermiddlewares.retry import RetryMiddleware \
    as BaseRetryMiddleware


class RetryMiddleware(BaseRetryMiddleware):

    def process_response(self, request, response, spider):
        # inject the retry method so a request can be retried by some
        # condition from the spider itself, even on 200 responses
        if not hasattr(spider, '_retry'):
            spider._retry = self._retry
        return super(RetryMiddleware, self).process_response(
            request, response, spider)

Example, in a spider callback:

yield self._retry(response.request, ValueError, self)
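A more defensive variant (my sketch, not part of the answer): _retry() returns a fresh copy of the request, or None once the RETRY_TIMES limit is exhausted, so the None case is worth handling:

def parse(self, response):
    if not response.body:
        # returns a copy of the request, or None when RETRY_TIMES is exhausted
        retry_request = self._retry(response.request, ValueError('empty body'), self)
        if retry_request:
            yield retry_request
        return
    # normal parsing continues here
    ...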
