I'm trying to parse an HTML page that I fetched with pyCurl, but pyCurl's WRITEFUNCTION delivers the page as bytes, not a string, so I cannot parse it with BeautifulSoup.
Is there a way to convert io.BytesIO to io.StringIO?
Or is there another way to parse an HTML page?
I am using Python 3.3.2.
Naive approach:
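    # Assume bytes_io is a `BytesIO` object that has already been filled.
    # getvalue() returns the whole buffer regardless of the stream position;
    # read() would return b'' if the position is still at the end after writing.
    byte_str = bytes_io.getvalue()

    # Convert to a str (a "unicode" object in Python 2 terms)
    text_obj = byte_str.decode('UTF-8')  # or use the encoding you expect

    # Use text_obj how you see fit!
    # io.StringIO(text_obj) will get you a StringIO object if that's what you need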
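To see the naive approach end to end, here is a minimal sketch that stands in for the pyCurl transfer by writing chunks into a `BytesIO` directly. pyCurl itself is not used; `fake_write_callback` and the sample HTML are made up for illustration. It also shows why `read()` can come back empty right after writing: the stream position sits at the end of the buffer.

```python
import io

# Hypothetical stand-in for a WRITEFUNCTION callback: a callable that
# receives chunks of bytes as they arrive. Not a pyCurl API.
bytes_io = io.BytesIO()
fake_write_callback = bytes_io.write

# Made-up sample data; the two-byte UTF-8 sequence for 'é' is split across chunks.
for chunk in (b'<html><body><p>caf\xc3', b'\xa9</p></body></html>'):
    fake_write_callback(chunk)

# After writing, the position is at the end, so read() returns nothing.
leftover = bytes_io.read()

# getvalue() ignores the position and returns the whole buffer.
page_bytes = bytes_io.getvalue()

# Decode only once all chunks are in, since a character may span two chunks.
page_text = page_bytes.decode('utf-8')

# Wrap in StringIO only if an API insists on a text file object.
text_io = io.StringIO(page_text)

print(repr(leftover))
print(page_text)
```

Calling `bytes_io.seek(0)` before `read()` would work as well; `getvalue()` just avoids having to track the position.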
Alternative approach, wrapping the buffer in io.TextIOWrapper:
    import io

    # Initialize a read buffer
    raw = io.BytesIO(
        b'Initial value for read buffer with unicode characters ' +
        'ÁÇÊ'.encode('utf-8')
    )
    wrapper = io.TextIOWrapper(raw, encoding='utf-8')

    # Read from the buffer
    print(wrapper.read())
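If the `BytesIO` was filled by writes (as with a download) rather than created with initial content, the wrapper approach needs the position rewound first. A minimal sketch, using made-up sample data and assuming the page is UTF-8; iterating the wrapper decodes the buffer line by line:

```python
import io

raw = io.BytesIO()
# Sample data standing in for a downloaded page (an assumption for illustration).
raw.write('<p>ÁÇÊ</p>\n<p>second line</p>\n'.encode('utf-8'))

# Rewind; otherwise the wrapper would start reading at the end of the buffer.
raw.seek(0)

# errors='replace' substitutes U+FFFD for undecodable bytes instead of raising.
wrapper = io.TextIOWrapper(raw, encoding='utf-8', errors='replace')

# Iterating a text stream yields one decoded line at a time.
lines = [line.rstrip('\n') for line in wrapper]
print(lines)
```

Whether `errors='replace'` is appropriate depends on how much you trust the declared encoding; the default (`'strict'`) raises `UnicodeDecodeError` on bad bytes.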
Source: https://habr.com/ru/post/1547105/