Python web scraping error - TypeError: can't use a string pattern on a bytes-like object

I want to create a web scraper. I am currently learning Python. These are the very basics!

Python code

import urllib.request
import re

htmlfile = urllib.request.urlopen("http://basketball.realgm.com/")

htmltext = htmlfile.read()
title = re.findall('<title>(.*)</title>', htmltext)

print (htmltext)

Error:

  File "C:\Python33\lib\re.py", line 201, in findall
    return _compile(pattern, flags).findall(string)
TypeError: can't use a string pattern on a bytes-like object
2 answers

You must decode the data before matching against it. The website in question declares

charset=iso-8859-1

so use that encoding. In this case, utf-8 will not work.

htmltext = htmlfile.read().decode('iso-8859-1')
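If you would rather not hard-code the encoding, one option is to read it from the response. This is only a sketch: it assumes the server sends the charset in its Content-Type header, and the names decode_body and fetch_title, as well as the iso-8859-1 fallback, are made up for illustration.

```python
import re
import urllib.request


def decode_body(raw, declared_charset, fallback="iso-8859-1"):
    """Decode raw bytes with the declared charset, or with a fallback if none was declared."""
    # errors="replace" swaps undecodable bytes for U+FFFD instead of raising.
    return raw.decode(declared_charset or fallback, errors="replace")


def fetch_title(url):
    """Fetch a page and return the text of its first <title> element, or None."""
    with urllib.request.urlopen(url) as response:
        # Reads the charset parameter of the Content-Type header; None if absent.
        charset = response.headers.get_content_charset()
        html = decode_body(response.read(), charset)
    # Non-greedy match, so only the first title is captured.
    match = re.search(r"<title>(.*?)</title>", html, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None
```

Calling fetch_title("http://basketball.realgm.com/") should then return the page title as a str. If the page only declares its charset in a meta tag, the header lookup returns None and the fallback is used.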

Use a bytes literal as the pattern:

title = re.findall(b'<title>(.*)</title>', htmltext)

or decode the received data into a string:

title = re.findall('<title>(.*)</title>', htmltext.decode('utf-8'))

(replace utf-8 with the actual encoding of the document)
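To see the difference between the two options without hitting the network, here is a small sketch on a hand-made bytes sample. The sample HTML is invented for illustration and is assumed to be iso-8859-1 encoded.

```python
import re

# Invented sample: "é" is the single byte 0xE9 in iso-8859-1.
raw = b"<html><head><title>Caf\xe9 scores</title></head></html>"

# Option 1: bytes pattern against bytes data; the matches are bytes too.
titles_bytes = re.findall(b"<title>(.*?)</title>", raw)

# Option 2: decode first, then use an ordinary str pattern; the matches are str.
titles_text = re.findall("<title>(.*?)</title>", raw.decode("iso-8859-1"))

# Decoding with the wrong encoding fails on this sample,
# because a lone 0xE9 byte is not valid utf-8.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

print(titles_bytes)
print(titles_text)
print(utf8_ok)
```

With the bytes pattern you still have to decode each match before treating it as text, so decoding once up front is usually the more convenient route.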


Source: https://habr.com/ru/post/1545856/