BeautifulSoup HTMLParseError

I'm new to Python and have a simple question.

I'm trying to use BeautifulSoup to parse a series of pages:

    from bs4 import BeautifulSoup
    import urllib.request

    BeautifulSoup(urllib.request.urlopen('http://bit.ly/'))

Traceback ...

html.parser.HTMLParseError: expected name token at '<!=KN\x01...

I'm on 64-bit Windows 7 with Python 3.2.

Do I need to use mechanize (which would mean going back to Python 2.x)?

4 answers

If that URL is correct, you are asking why an HTML parser raises an error when it is handed an MP3 file. I believe the answer to that question is self-evident...
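
One way to catch this kind of mistake before the parser sees the data is to look at the Content-Type header of the response. The following is only a minimal sketch for Python 3; the helper names `looks_like_html` and `fetch_html` are made up for illustration, and the set of accepted media types is an assumption you may want to widen.

```python
import urllib.request


def looks_like_html(content_type):
    """Return True if a Content-Type header value names an HTML media type."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";", 1)[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def fetch_html(url):
    """Return the body as bytes, or None if the server says it is not HTML."""
    with urllib.request.urlopen(url) as response:
        if not looks_like_html(response.headers.get("Content-Type")):
            return None
        return response.read()
```

If `fetch_html(url)` returns bytes, they can be passed to `BeautifulSoup(html, 'html.parser')`; a `None` result means the URL points at something else (an MP3 would typically be served as `audio/mpeg`) and should be saved as a file instead.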


If you were trying to download that MP3, you could do something like this:

    import urllib2  # Python 2 module; Python 3 moved urlopen to urllib.request

    BLOCK_SIZE = 16 * 1024

    req = urllib2.urlopen("http://bit.ly/xg7enD")

    # Make sure to write as a binary file
    fp = open("someMP3.mp3", 'wb')
    try:
        while True:
            data = req.read(BLOCK_SIZE)
            if not data:
                break
            fp.write(data)
    finally:
        fp.close()
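
Since the question mentions Python 3.2, where `urllib2` does not exist, here is a rough Python 3 equivalent of the same chunked download. It is a sketch rather than a drop-in replacement: the function name `download` is invented for this example, and `shutil.copyfileobj` stands in for the explicit read/write loop.

```python
import shutil
import urllib.request

BLOCK_SIZE = 16 * 1024  # same chunk size as in the answer


def download(url, path, block_size=BLOCK_SIZE):
    """Stream the body of `url` into the file at `path` in fixed-size chunks."""
    # Open the output in binary mode so the bytes are written unchanged.
    with urllib.request.urlopen(url) as response, open(path, "wb") as out_file:
        # copyfileobj reads and writes block_size bytes at a time,
        # so the whole file never has to fit in memory.
        shutil.copyfileobj(response, out_file, block_size)
```

It would be called as `download("http://bit.ly/xg7enD", "someMP3.mp3")`; both `with` targets are closed automatically, which replaces the `try`/`finally` in the original.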

If you want to download the file in Python, you can also use this:

    import urllib  # Python 2

    urllib.urlretrieve("http://bit.ly/xg7enD", "myfile.mp3")

It will save the file in the current working directory under the name "myfile.mp3". It works for any file type.

Hope this helps!
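
On Python 3 the same function lives in `urllib.request`. A minimal sketch, assuming Python 3; the wrapper name `save_as` is hypothetical and only there to show the return value:

```python
import urllib.request


def save_as(url, filename):
    """Download `url` to `filename` and return the path that was written."""
    # urlretrieve returns a (filename, headers) tuple.
    saved_path, headers = urllib.request.urlretrieve(url, filename)
    return saved_path
```

For example, `save_as("http://bit.ly/xg7enD", "myfile.mp3")` would write the file next to the script's working directory, as described above.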


Instead of urllib.request, I suggest using the requests library and its get() function:

    from requests import get
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(
        get(url="http://www.google.com").content,
        'html.parser'
    )

Source: https://habr.com/ru/post/1403105/

