BeautifulSoup HTMLParseError

Question


I'm new to Python and have a simple question about a specific situation:

Trying to use BeautifulSoup to parse a series of pages.

    from bs4 import BeautifulSoup
    import urllib.request

    BeautifulSoup(urllib.request.urlopen('http://bit.ly/'))

Traceback ...

html.parser.HTMLParseError: expected name token at '<!=KN\x01...

I'm working on the 64-bit version of Windows 7 with Python 3.2.

Do I need to use mechanize (which would require Python 2.x)?

+4

python web-scraping beautifulsoup

Zack Mar 23 '12 at 15:17


4 answers

If you are trying to download this MP3, you can do something like this:

    import urllib2

    BLOCK_SIZE = 16 * 1024

    req = urllib2.urlopen("http://bit.ly/xg7enD")

    # Make sure to write as a binary file
    fp = open("someMP3.mp3", 'wb')
    try:
        while True:
            data = req.read(BLOCK_SIZE)
            if not data:
                break
            fp.write(data)
    finally:
        fp.close()
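The snippet above uses urllib2, which exists only in Python 2, while the question is about Python 3.2. A rough Python 3 sketch of the same chunked download might look like the following. The helper names copy_stream and download are made up for illustration and are not part of the original answer; urllib.request.urlopen is the standard-library replacement for urllib2.urlopen.

```python
import urllib.request

BLOCK_SIZE = 16 * 1024  # same chunk size as the answer above


def copy_stream(src, dst, block_size=BLOCK_SIZE):
    """Copy bytes from src to dst in fixed-size chunks; return bytes copied.

    Hypothetical helper: src and dst are any binary file-like objects.
    """
    total = 0
    while True:
        data = src.read(block_size)
        if not data:  # an empty bytes object signals end of stream
            break
        dst.write(data)
        total += len(data)
    return total


def download(url, path):
    """Stream the response body at url into a binary file at path."""
    # Both context managers close their resource even if an error occurs,
    # which replaces the explicit try/finally in the Python 2 version.
    with urllib.request.urlopen(url) as response, open(path, "wb") as fp:
        return copy_stream(response, fp)
```

Under these assumptions, calling download("http://bit.ly/xg7enD", "someMP3.mp3") would write the file in binary mode, just as the original answer does.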

+4

Chicobird Aug 30 '12 at 23:40


If you want to download the file in Python, you can also use urlretrieve:

    import urllib

    urllib.urlretrieve("http://bit.ly/xg7enD", "myfile.mp3")

It will save the file in the current working directory under the name "myfile.mp3". I can download all file types this way.

Hope this helps!

0

sumit Feb 07 '16 at 18:46


Instead of urllib.request, I suggest using requests, and from that library use get():

    from requests import get
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(
        get(url="http://www.google.com").content,
        'html.parser'
    )

0

Jcc. Sanabria Feb 01 '17 at 20:00


kindall · Accepted Answer · 2012-03-23T15:27:12+0000

If that URL is correct, you're asking why an HTML parser raises an error when it parses an MP3 file. I believe the answer to that question is self-evident ...
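One way to guard against this when crawling a series of pages is to look at the response's Content-Type header before handing the body to a parser. The sketch below is only an illustration: is_html and fetch_html are hypothetical helper names, and it assumes the server reports a correct Content-Type, which is not guaranteed.

```python
import urllib.request


def is_html(content_type):
    """Return True if a Content-Type header value names an HTML document."""
    if not content_type:
        return False
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()
    return media_type in ("text/html", "application/xhtml+xml")


def fetch_html(url):
    """Return the response body as bytes for HTML responses, else None."""
    with urllib.request.urlopen(url) as response:
        # Skip anything the server does not label as HTML (e.g. audio/mpeg).
        if not is_html(response.headers.get("Content-Type")):
            return None
        return response.read()
```

A non-None result could then be passed to BeautifulSoup(html, 'html.parser'), while a None result indicates something like the MP3 in the question, which should be saved as a binary file instead of parsed.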
