Python: difference between "lxml", "html.parser", and "html5lib" with Beautiful Soup?

Question


When using Beautiful Soup, what is the difference between "lxml", "html.parser", and "html5lib"? When would you use one over the other, and what are the advantages of each? Whenever I have used them they seemed interchangeable, but people here have corrected me and said I should be using a different one. I would like to strengthen my understanding of these. I have read a couple of posts about this here, but they hardly cover when to use which.

Example:

soup = BeautifulSoup(response.text, 'lxml')

+4

python beautifulsoup

duc hathaway Aug 3 '17 at 21:06


2 answers

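The parsers supported by BeautifulSoup differ mainly as follows:

html.parser - built in - no extra dependencies needed
html5lib - the most lenient - the better choice if the HTML is broken
lxml - the fastest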
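The trade-offs above can be turned into a simple preference order. The following is a minimal sketch, not part of the original answer: the helper name make_soup and the default ordering are assumptions chosen for illustration. It tries each parser in turn and falls back when an optional one is not installed, relying on the FeatureNotFound exception that bs4 raises for an unavailable parser.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Parse markup with the first parser in `preferred` that is installed.

    Hypothetical helper: the name and the default order are illustrative.
    Returns a (soup, parser_name) tuple.
    """
    for parser in preferred:
        try:
            return BeautifulSoup(markup, parser), parser
        except FeatureNotFound:
            # lxml and html5lib are optional third-party packages;
            # skip whichever one is missing and try the next parser.
            continue
    raise RuntimeError("none of the requested parsers is installed")


soup, used = make_soup("<p>Hello</p>")
print(used, soup.p.get_text())
```

Putting "html.parser" last means the call still succeeds on a machine with only the standard library parser available. If speed matters less than browser-like handling of broken pages, the order could start with "html5lib" instead.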

+5

alecxe Aug 3 '17 at 21:10

Vinícius Aguiar · Accepted Answer · 2017-08-03T21:26:40+0000

From the docs, a summary table of advantages and disadvantages:

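html.parser - BeautifulSoup(markup, "html.parser")
- Advantages: Batteries included, Decent speed, Lenient (as of Python 2.7.3 and 3.2)
- Disadvantages: Not very lenient (before Python 2.7.3 or 3.2.2)
lxml - BeautifulSoup(markup, "lxml")
- Advantages: Very fast, Lenient
- Disadvantages: External C dependency
html5lib - BeautifulSoup(markup, "html5lib")
- Advantages: Extremely lenient, Parses pages the same way a web browser does, Creates valid HTML5
- Disadvantages: Very slow, External Python dependency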
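The parsers tend to look interchangeable on well-formed pages; the differences show up on invalid markup. As a rough sketch (the function name parse_with_each and the sample fragment are assumptions for illustration), the same broken snippet can be fed to every installed parser and the resulting trees compared by eye:

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately invalid fragment: an opening <a> followed by a stray </p>.
BROKEN_MARKUP = "<a></p>"


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser_name: serialized tree}, or None for a parser that is missing."""
    results = {}
    for name in parsers:
        try:
            results[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            # lxml and html5lib are optional third-party packages.
            results[name] = None
    return results


results = parse_with_each(BROKEN_MARKUP)
for name, tree in results.items():
    print(f"{name:12} -> {tree if tree is not None else 'not installed'}")
```

Each parser repairs the fragment in its own way, for example in whether it wraps the result in html and body tags and in what it does with the stray closing tag. That is why code written against one parser's tree may not find the same elements under another.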
