Is there any difference between the capabilities of the lxml and html5lib parsers in the context of BeautifulSoup? I am trying to learn BS4 and am using the following code:
    import requests
    from bs4 import BeautifulSoup

    ret = requests.get('http://www.olivegarden.com')
    soup = BeautifulSoup(ret.text, 'html5lib')
    for item in soup.find_all('a'):
        print(item['href'])
I started by using lxml as the parser, but noticed that for some sites the for loop is never entered, even though the page contains valid links. The same page works with the html5lib parser. Are there specific types of pages that may not work with lxml?
I'm on Ubuntu using python-lxml 2.3.2-1 with libxml2 2.7.8.dfsg-5.1ubunt and html5lib-1.0b3
EDIT: Updated to lxml 3.1.2 and I still see the same problem. On a Mac, though, with 3.0.x, the same page is handled correctly. The website in question is www.olivegarden.com
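To narrow this down, it may help to feed the same HTML to each parser and compare how many links each one finds. The following is only a minimal sketch: `count_links` is a hypothetical helper name, the sample markup is made up for illustration, and it assumes `bs4` is installed (parsers that are not installed are reported as `None` rather than raising).

```python
from bs4 import BeautifulSoup, FeatureNotFound


def count_links(html, parsers=("lxml", "html5lib", "html.parser")):
    """Return {parser name: number of <a> tags with an href}.

    A parser that is not installed maps to None. The helper name and
    the default parser list are assumptions made for this sketch.
    """
    counts = {}
    for name in parsers:
        try:
            soup = BeautifulSoup(html, name)
        except FeatureNotFound:
            # bs4 raises this when the requested parser is unavailable.
            counts[name] = None
            continue
        # href=True skips anchors without an href, which would otherwise
        # raise KeyError on item['href'].
        counts[name] = len(soup.find_all("a", href=True))
    return counts


# Made-up markup; in practice this would be ret.text from requests.get().
sample = "<html><body><p>Menu <a href='/menu'>here</a><a>no href</a></body></html>"
print(count_links(sample))
```

If the counts differ between parsers for the real page, that would suggest the markup itself is handled differently by each parser, rather than a problem in the loop.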