Retrying a Scrapy Request even when receiving a 200 status code

There is a website that I scrape that sometimes returns a 200 response, but has no text in response.body (which raises an AttributeError when I try to parse it with Selector).

Is there an easy way to check that the body contains text and, if not, retry the request until it does? Here is some pseudocode describing what I'm trying to do:

def check_response(response):
    if response.body:  # body is bytes; an empty body is falsy
        return response
    else:
        # somehow reissue a copy of the original request
        return Request(copy_of_response.request,
                       callback=check_response)

Basically, is there a way to repeat the request with the same properties (method, url, payload, cookies, etc.)?

2 answers

Follow the EAFP principle:

Easier to ask for forgiveness than permission. This common Python coding style assumes the existence of valid keys or attributes and catches exceptions if the assumption proves false. This clean and fast style is characterized by the presence of many try and except statements. The technique contrasts with the LBYL style common to many other languages such as C.
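Applied to this question, the two styles look like this (a minimal sketch, assuming a response object is in scope; that Selector raises AttributeError on an empty body is taken from the question itself):

from scrapy.selector import Selector

# LBYL: look before you leap - check the body first
if response.body:
    sel = Selector(response)

# EAFP: just try to parse and handle the failure
try:
    sel = Selector(response)
except AttributeError:
    pass  # empty body - retry, as shown below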

Handle the problem and yield a new Request to the same URL with dont_filter=True:

dont_filter (boolean) – indicates that this request should not be filtered by the scheduler. This is used when you want to perform an identical request multiple times, to ignore the duplicates filter. Use it with care, or you will get into crawling loops. Default to False.

from scrapy import Request

def parse(self, response):
    try:
        # parsing logic here
        ...
    except AttributeError:
        # empty body: request the same URL again, bypassing the dupefilter
        yield Request(response.url, callback=self.parse, dont_filter=True)

You can also make a copy of the current request (not tested):

new_request = response.request.copy()
new_request.dont_filter = True
yield new_request

Or make a new request using replace():

new_request = response.request.replace(dont_filter=True)
yield new_request
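As the documentation quoted above warns, dont_filter=True bypasses the duplicates filter entirely, so a page that never returns a body would be re-requested forever. A minimal sketch of capping the retries through request.meta (the empty_retries key is my own invention, not a Scrapy feature):

def parse(self, response):
    if not response.body:
        retries = response.meta.get('empty_retries', 0)
        if retries < 3:
            # reissue the same request, carrying the counter along
            yield response.request.replace(
                dont_filter=True,
                meta={**response.request.meta, 'empty_retries': retries + 1},
            )
        return
    # normal parsing from here on
    ...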

What about injecting the _retry() method from the built-in retry middleware into the spider, so a request can be retried on your own conditions, even for a 200 response?

In your settings:

DOWNLOADER_MIDDLEWARES = {
    'scrapy.downloadermiddlewares.retry.RetryMiddleware': None,
    'scraper.middlewares.retry.RetryMiddleware': 550,
}

And the middleware itself (scraper/middlewares/retry.py, matching the path above):

from scrapy.downloadermiddlewares.retry import RetryMiddleware \
    as BaseRetryMiddleware


class RetryMiddleware(BaseRetryMiddleware):

    def process_response(self, request, response, spider):
        # inject the retry method so a request can be retried by some
        # condition from the spider itself, even on 200 responses
        if not hasattr(spider, '_retry'):
            spider._retry = self._retry
        return super(RetryMiddleware, self).process_response(
            request, response, spider)

Example, in a spider callback:

yield self._retry(response.request, ValueError, self)
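A more defensive variant (my sketch, not part of the answer): _retry() returns a fresh copy of the request, or None once the RETRY_TIMES limit is exhausted, so the None case is worth handling:

def parse(self, response):
    if not response.body:
        # returns a copy of the request, or None when RETRY_TIMES is exhausted
        retry_request = self._retry(response.request, ValueError('empty body'), self)
        if retry_request:
            yield retry_request
        return
    # normal parsing continues here
    ...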
