Can't use read() for html2text?

I am writing a Python program that searches a webpage for a word. However, when I try to use:

website = urllib.request.urlopen(url)
content = website.read()
website.close()
test = html2text.html2text(content)
print(test)

I get this error:

test = html2text.html2text(content)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/html2text/__init__.py", line 840, in html2text
return h.handle(html)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/html2text/__init__.py", line 129, in handle
self.feed(data)
File "/Library/Frameworks/Python.framework/Versions/3.5/lib/python3.5/site-packages/html2text/__init__.py", line 125, in feed
data = data.replace("</' + 'script>", "</ignore>")
TypeError: a bytes-like object is required, not 'str'

I'm new to Python, so I'm not sure how to handle this error.
Python 3.5, Mac.

+4
1 answer

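read() returns bytes, but html2text expects a str. Call decode() on the contents, using the encoding the server sends in the charset parameter of the Content-Type header: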

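import urllib.request

resource = urllib.request.urlopen(url)
content = resource.read()  # bytes, not str
# get_content_charset() returns None when the header names no charset;
# falling back to UTF-8 in that case is an assumption
charset = resource.headers.get_content_charset() or "utf-8"
content = content.decode(charset)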

Works for me (Python 3.5, Mac OS).
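
For the original goal (searching a page for a word), the pieces might fit together roughly like this. This is only a sketch: the helper names decode_body, fetch_text and contains_word are made up for illustration, and both the UTF-8 fallback and errors="replace" are assumptions rather than anything html2text requires.

```python
import urllib.request


def decode_body(raw, charset=None, fallback="utf-8"):
    """Turn the bytes returned by read() into a str."""
    # Assumption: use `fallback` when the server declared no charset, and
    # substitute undecodable bytes instead of raising UnicodeDecodeError.
    return raw.decode(charset or fallback, errors="replace")


def fetch_text(url):
    """Download `url` and return its body as a str."""
    with urllib.request.urlopen(url) as resource:
        raw = resource.read()
        # None if the Content-Type header carries no charset parameter
        charset = resource.headers.get_content_charset()
    return decode_body(raw, charset)


def contains_word(text, word):
    """Case-insensitive substring check."""
    return word.lower() in text.lower()
```

With these in place, the str returned by fetch_text(url) can be passed to html2text.html2text() as in the question, and the result checked with contains_word(test, "python").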

+2

Source: https://habr.com/ru/post/1621829/

