Python requests: Peeking at the Head of the Response Content

Using python requests and python-magic, I would like to test the mime type of a web resource without fetching all of its content (especially if the resource turns out to be, for example, an ogg file or a PDF file). Based on the result, I could then decide whether to fetch the rest. However, accessing r.text after checking the mime type returns only what has not yet been consumed. How can I check the mime type without consuming the response content?

Below is my current code.

    import requests
    import magic

    r = requests.get("http://www.december.com/html/demo/hello.html", prefetch=False)
    mime = magic.from_buffer(r.iter_content(256).next(), mime=True)

    if mime == "text/html":
        print(r.text)  # I'd like r.text to give me the entire response content

Thanks!

2 answers

Note: When this question was asked, the correct way to fetch only the headers and stream the body was to pass prefetch=False. That parameter has since been renamed to stream and its Boolean value inverted, so you now want stream=True.

The following is the original answer.


Once you use iter_content() , you must continue to use it; .text indirectly uses the same interface under the hood (via .content ).

In other words, once you use iter_content() at all, you have to do the work of .text by hand:

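    import requests
    import magic
    from requests.compat import chardet

    # stream=True is the current spelling of the old prefetch=False
    r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
    peek = next(r.iter_content(256))  # read only the first 256 bytes
    mime = magic.from_buffer(peek, mime=True)

    if mime == "text/html":
        # Keep reading from the same stream and prepend what was already consumed
        contents = peek + b''.join(r.iter_content(10 * 1024))
        encoding = r.encoding
        if encoding is None:
            # No charset declared by the server: detect the encoding
            encoding = chardet.detect(contents)['encoding']
        try:
            textcontent = str(contents, encoding, errors='replace')
        except (LookupError, TypeError):
            # Unknown or undetected encoding: fall back to the default (UTF-8)
            textcontent = str(contents, errors='replace')
        print(textcontent)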

This assumes you are using Python 3.
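The peek-then-continue idea above does not depend on requests itself; it works on any iterator of byte chunks. Below is a minimal sketch using only the standard library. The helper names `peek_stream` and `decode_body` are hypothetical (they are not part of requests or python-magic), and the UTF-8 fallback simply mirrors the try/except in the answer rather than doing real encoding detection.

```python
import itertools


def peek_stream(chunks, size=256):
    """Return (head, body): up to `size` leading bytes and an iterator over the full body.

    `chunks` is any iterator of bytes objects. The chunks read for the peek
    are chained back in front of the remainder, so nothing is lost.
    """
    chunks = iter(chunks)
    consumed = []
    total = 0
    # Pull chunks until there are enough bytes for the peek (or the stream ends).
    for chunk in chunks:
        consumed.append(chunk)
        total += len(chunk)
        if total >= size:
            break
    head = b"".join(consumed)[:size]
    return head, itertools.chain(consumed, chunks)


def decode_body(data, encoding=None):
    """Decode bytes with the declared encoding, falling back to UTF-8."""
    try:
        return str(data, encoding, errors="replace")
    except (LookupError, TypeError):
        # Unknown or missing encoding: decode as UTF-8 with replacement.
        return str(data, errors="replace")
```

With requests, this could presumably be used as `head, body = peek_stream(r.iter_content(10 * 1024))` on a response opened with `stream=True`: pass `head` to `magic.from_buffer`, and only call `b"".join(body)` when the detected type is one you want.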

An alternative is to make two requests:

    import requests
    import magic

    url = "http://www.december.com/html/demo/hello.html"
    r = requests.get(url, stream=True)
    mime = magic.from_buffer(next(r.iter_content(256)), mime=True)

    if mime == "text/html":
        print(requests.get(url).text)  # the second request fetches the full body

Python 2 version:

    import requests
    import magic
    from requests.compat import chardet

    r = requests.get("http://www.december.com/html/demo/hello.html", stream=True)
    peek = next(r.iter_content(256))
    mime = magic.from_buffer(peek, mime=True)

    if mime == "text/html":
        contents = peek + ''.join(r.iter_content(10 * 1024))
        encoding = r.encoding
        if encoding is None:
            # detect encoding
            encoding = chardet.detect(contents)['encoding']
        try:
            textcontent = unicode(contents, encoding, errors='replace')
        except (LookupError, TypeError):
            textcontent = unicode(contents, errors='replace')
        print(textcontent)

If the Content-Type header is enough, you can issue an HTTP HEAD request instead of a GET to retrieve only the HTTP headers.

    import requests

    url = 'http://www.december.com/html/demo/hello.html'
    response = requests.head(url)
    print(response.headers['content-type'])
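One caveat with the header approach: the Content-Type value often carries parameters, for example `text/html; charset=UTF-8`, so comparing it directly with `"text/html"` can fail. It also reports what the server claims rather than what python-magic would detect from the bytes. The sketch below, using only the standard library, splits the header into media type and charset; `parse_content_type` is a hypothetical helper name, not part of requests.

```python
from email.message import Message


def parse_content_type(header_value):
    """Split a Content-Type header value into (media_type, charset or None)."""
    msg = Message()
    msg["Content-Type"] = header_value
    # get_content_type() drops parameters and lowercases the media type;
    # get_content_charset() returns the lowercased charset parameter, if any.
    return msg.get_content_type(), msg.get_content_charset()
```

The result of `parse_content_type(response.headers['content-type'])` can then be compared against `"text/html"` regardless of any charset suffix.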

Source: https://habr.com/ru/post/1443675/

