Python: difference between "lxml", "html.parser", and "html5lib" with BeautifulSoup?

When using BeautifulSoup, what is the difference between "lxml", "html.parser", and "html5lib"? When would you use one over the other, and what are the advantages of each? Each time I have used them they seemed interchangeable, but people here keep correcting me and telling me I should use a different one. I would like to strengthen my understanding of these. I have read a couple of posts about this here, but they hardly go into when to use each one.

Example -

soup = BeautifulSoup(response.text, 'lxml')
2 answers

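From the docs, a summary of the advantages and disadvantages of each parser:

  • html.parser - BeautifulSoup(markup, "html.parser")

    • Advantages: batteries included, decent speed, lenient (as of Python 2.7.3 and 3.2.)

    • Disadvantages: not very lenient (before Python 2.7.3 or 3.2.2)

  • lxml - BeautifulSoup(markup, "lxml")

    • Advantages: very fast, lenient

    • Disadvantages: external C dependency

  • html5lib - BeautifulSoup(markup, "html5lib")

    • Advantages: extremely lenient, parses pages the same way a web browser does, creates valid HTML5

    • Disadvantages: very slow, external Python dependency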
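
To see that the parsers are not really interchangeable, it can help to feed the same invalid markup to each one and compare the trees they build. The following is a minimal sketch, not taken from the answer itself: the helper name parse_with_each and the sample string are made up for illustration, and it assumes bs4 is installed while lxml and html5lib may or may not be.

```python
from bs4 import BeautifulSoup, FeatureNotFound

# Deliberately invalid markup: an opening <a> followed by a stray </p>.
BROKEN_MARKUP = "<a></p>"


def parse_with_each(markup, parsers=("html.parser", "lxml", "html5lib")):
    """Return {parser name: serialized tree}, or None for parsers that are not installed.

    Hypothetical helper for illustration; it is not part of BeautifulSoup.
    """
    results = {}
    for name in parsers:
        try:
            results[name] = str(BeautifulSoup(markup, name))
        except FeatureNotFound:
            # lxml and html5lib are third-party packages and may be missing.
            results[name] = None
    return results


if __name__ == "__main__":
    for name, tree in parse_with_each(BROKEN_MARKUP).items():
        print(f"{name:12} {tree if tree is not None else '(not installed)'}")
```

On valid HTML the three trees usually look alike; the differences show up in how each parser repairs broken markup and in whether it wraps a fragment in html and body tags.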


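From the BeautifulSoup documentation, the key points are:

  • html.parser - built-in - no extra dependencies needed
  • html5lib - the most lenient - better to use it if the HTML is broken
  • lxml - the fastest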
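
One way to act on these points is to try parsers in order of preference and fall back to the built-in one. This is only a sketch under assumptions: make_soup and its default ordering are hypothetical, not something BeautifulSoup provides, and the right ordering depends on whether speed or leniency matters more for your pages.

```python
from bs4 import BeautifulSoup, FeatureNotFound


def make_soup(markup, preferred=("lxml", "html5lib", "html.parser")):
    """Parse markup with the first parser in `preferred` that is installed.

    Hypothetical helper: returns (soup, parser_name) so the caller knows
    which parser was actually used.
    """
    for name in preferred:
        try:
            return BeautifulSoup(markup, name), name
        except FeatureNotFound:
            # This parser is not installed; try the next one.
            continue
    raise RuntimeError("none of the requested parsers is available")


if __name__ == "__main__":
    soup, used = make_soup("<p>Hello<b>world")
    print(used, soup.find("b").get_text())
```

Because different parsers can build different trees from the same broken page, a fallback like this trades reproducibility for convenience; pinning a single parser explicitly is the safer choice when results must be identical across machines.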

Source: https://habr.com/ru/post/1682978/

