Convert io.BytesIO to io.StringIO to parse an HTML page

I'm trying to parse an HTML page that I fetched with pyCurl, but the pyCurl WRITEFUNCTION callback delivers the page as bytes, not a string, so I cannot parse it with BeautifulSoup.

Is there a way to convert io.BytesIO to io.StringIO?

Or is there another way to parse an HTML page?

I am using Python 3.3.2.

2 answers

Naive approach:

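# Assume bytes_io is a `BytesIO` object.
# getvalue() returns the whole buffer regardless of the stream position;
# read() returns b'' if the position is already at the end (e.g. after writes).
byte_str = bytes_io.getvalue()

# Decode the bytes into a str
text_obj = byte_str.decode('UTF-8')  # Or use the encoding you expect

# Use text_obj however you see fit!
# io.StringIO(text_obj) will get you a StringIO object if that is what you need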
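
To make the stream-position caveat concrete, here is a minimal sketch. It assumes the buffer was filled by a write callback (for example, pyCurl's WRITEFUNCTION set to the buffer's `write` method); the HTML snippet and variable names are made up for illustration.

```python
import io

# Hypothetical stand-in for a buffer filled by a write callback.
bytes_io = io.BytesIO()
bytes_io.write(b"<html><body><p>caf\xc3\xa9</p></body></html>")

# After writing, the stream position sits at the end of the buffer,
# so a plain read() finds nothing left to return.
leftover = bytes_io.read()

# getvalue() ignores the position and returns everything written so far.
text_obj = bytes_io.getvalue().decode("utf-8")

# Wrap the decoded text if a file-like text object is required.
string_io = io.StringIO(text_obj)
first_chunk = string_io.read(6)
```

Calling `bytes_io.seek(0)` before `read()` would work as well; `getvalue()` just avoids having to track the position.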

Alternatively, wrap the byte stream in an io.TextIOWrapper, which decodes the data as it is read:

import io

# Initialize a read buffer (named `buffer` to avoid shadowing the built-in `input`)
buffer = io.BytesIO(
    b'Initial value for read buffer with unicode characters ' +
    'ÁÇÊ'.encode('utf-8')
)
wrapper = io.TextIOWrapper(buffer, encoding='utf-8')

# Read from the buffer; the bytes are decoded to str on the fly
print(wrapper.read())
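
As a sketch of how the wrapper could be used on a buffer that was written to first (as with a download callback), the example below rewinds the buffer and then iterates over decoded lines instead of reading everything at once. The sample content and variable names are assumptions made for illustration.

```python
import io

# Hypothetical buffer that was filled by writes rather than initialised with data.
raw = io.BytesIO()
raw.write("<p>Á</p>\n<p>Ç</p>\n".encode("utf-8"))

# Rewind first; otherwise the wrapper would start reading at the end of the buffer.
raw.seek(0)

wrapper = io.TextIOWrapper(raw, encoding="utf-8")

# Iterating over the wrapper yields decoded str lines one at a time.
lines = [line.rstrip("\n") for line in wrapper]
```

Note that closing the wrapper also closes the underlying `BytesIO`; calling `wrapper.detach()` first keeps the byte buffer usable.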

Source: https://habr.com/ru/post/1547105/

