Prevent response body loading in Python async HTTP requests

I want to "ping" the server, check the response headers to see whether the link is broken, and download the response body only if it is not.

Traditionally, with the synchronous requests module, you can send a GET request with stream=True and inspect the headers before the response body is loaded, then decide whether to drop the connection, for example on a 404 Not Found error.

My problem is that doing this with asynchronous libraries such as grequests or requests-futures has proved impossible with my limited knowledge.

I tried setting the stream parameter to True in requests-futures, but to no avail: it still loads the response body, without letting me step in as soon as the response headers arrive. And even if it did work, I would not be sure how to proceed.

Here is what I tried:

test.py

from requests_futures.sessions import FuturesSession

session = FuturesSession()
session.stream = True

future = session.get('http://www.google.com')
response = future.result()
print(response.status_code) # Here I would assume the response body hasn't been loaded

After debugging, I find that it loads the body of the response anyway.

I would appreciate any solution to the original problem, whether or not it follows my approach.

2 answers

I believe you need an HTTP HEAD request:

session.head('http://www.google.com')

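Per w3.org, "the HEAD method is identical to GET except that the server MUST NOT return a message-body in the response". If the status code and headers look fine, you can follow up with a normal GET.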
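As a rough sketch of the HEAD check outside of requests, here is the same idea using only the standard library's http.client. The helper name head_status and its timeout parameter are made up for illustration, and the sketch does not follow redirects.

```python
import http.client
from urllib.parse import urlsplit


def head_status(url, timeout=10.0):
    """Send a HEAD request and return (status, headers); no body is transferred."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        resp.read()  # a HEAD response carries no body, so this returns b""
        return resp.status, dict(resp.getheaders())
    finally:
        conn.close()
```

Calling head_status('http://www.google.com') returns the status code and a dict of headers, which you can inspect before deciding whether to issue the GET.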

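If you want to do this in a single request, you can drop down to raw sockets. Send the GET request, recv the first block, and then, depending on the headers it contains, either keep reading or close the connection.

For example: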

import socket

def fetch_on_header_condition(host, resource, condition, port=80):
    request =  'GET %s HTTP/1.1\r\n' % resource
    request += 'Host: %s\r\n' % host
    request += 'Connection: close\r\n'
    request += '\r\n'

    s = socket.socket()
    try:
        s.connect((host, port))
        # Sockets carry bytes, so encode the request; sendall() keeps
        # writing until the whole request has been sent.
        s.sendall(request.encode('ascii'))
        # The first block normally holds the status line and headers
        # (this assumes they fit in 4096 bytes and arrive in one recv).
        first_block = s.recv(4096)
        if not condition(first_block):
            # Closing the socket in `finally` abandons the rest of the response.
            return False, b''
        blocks = [first_block]
        while True:
            block = s.recv(4096)
            if not block:
                break
            blocks.append(block)
        return True, b''.join(blocks)
    finally:
        s.close()

if __name__ == '__main__':
    print(fetch_on_header_condition(
        host='www.jython.org',
        port=80,
        resource='/',
        condition=lambda s: b'Content-Type: text/xml' in s,
    ))
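If the raw socket feels too low-level, a similar single-request flow can be sketched with the standard library instead of requests-futures. This is only a sketch under stated assumptions, not what the libraries above do internally: http.client returns from getresponse() once the status line and headers are parsed, so the decision can be made before read() fetches the body, and a ThreadPoolExecutor runs several checks concurrently. The names fetch_if_ok, fetch_all, accept and max_workers are hypothetical, and redirects are not followed.

```python
import http.client
from concurrent.futures import ThreadPoolExecutor
from urllib.parse import urlsplit


def fetch_if_ok(url, accept=lambda resp: resp.status == 200, timeout=10.0):
    """GET url, but read the body only if accept(response) is true.

    Returns the body as bytes, or None if the response was rejected.
    """
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("GET", path)
        # getresponse() returns once the status line and headers are parsed.
        # A few body bytes may already be buffered, but the rest of the body
        # is not read until read() is called.
        resp = conn.getresponse()
        if not accept(resp):
            return None  # the close() below abandons the unread body
        return resp.read()
    finally:
        conn.close()


def fetch_all(urls, max_workers=8):
    """Run fetch_if_ok concurrently; returns {url: body or None}."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(fetch_if_ok, urls)))
```

fetch_all(['http://www.google.com', 'http://www.jython.org/']) would then map each URL to its body, or to None where the headers were rejected. The accept callback receives the http.client.HTTPResponse, so it can also look at resp.getheader('Content-Type') and similar fields.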

Send a HEAD request first, then a GET only if it succeeds:

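# Assumes `session` is a plain requests.Session; with a FuturesSession,
# call .result() on the returned future first.
header = session.head('https://google.com')

if header.ok:
    response = session.get('https://google.com')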

Source: https://habr.com/ru/post/1015953/

