No more BeautifulSoup

Question


I use BeautifulSoup, but as I understand it, the library is no longer supported. What should I use instead? I have heard about XPath, but what else is there?

+3

python parsing

Peter Nielsen Jul 14 '10 at 8:05


4 answers

Take a look at lxml. There is also html5lib, which parses HTML, including malformed real-world HTML.

If you like the BeautifulSoup API, html5lib can build a Beautiful Soup tree for you:

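import html5lib
from html5lib import treebuilders

# Needs an html5lib release that ships the "beautifulsoup" tree builder.
parser = html5lib.HTMLParser(tree=treebuilders.getTreeBuilder("beautifulsoup"))

# The with block closes the file once parsing is done.
with open("mydocument.html") as f:
    soup = parser.parse(f)  # a BeautifulSoup tree, not a minidom document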

+6

fmark Jul 14 '10 at 8:34

lxml lib: http://codespeak.net/lxml/
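Since the question mentions XPath, here is a minimal sketch of what that looks like with lxml. It is not part of the original answer: the sample markup, the example.com URLs, and the helper name extract_links are made up for illustration, and it assumes lxml is installed.

```python
import lxml.html

# Hypothetical, deliberately sloppy markup (unclosed <p> and <li> tags).
SAMPLE_HTML = """
<html><body>
<p>Links:
<ul>
<li><a href="http://example.com/a">First</a>
<li><a href="http://example.com/b">Second</a>
</ul>
</body></html>
"""


def extract_links(markup):
    """Return (text, href) pairs for every <a> element that has an href."""
    root = lxml.html.fromstring(markup)
    # The XPath expression selects all <a> elements carrying an href attribute.
    return [
        (a.text_content().strip(), a.get("href"))
        for a in root.xpath("//a[@href]")
    ]


links = extract_links(SAMPLE_HTML)

if __name__ == "__main__":
    for text, href in links:
        print(text, href)
```

The lxml.html parser is lenient about unclosed tags, so the XPath query runs on the repaired tree rather than on the raw text.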

+4

Roki Jul 14 '10 at 8:08

If you are not tied to Python, there is TagSoup, which is written in Java. Tidy is another option for cleaning up HTML.

0

Borealid Jul 14 '10 at 8:07

Nick bastin · Accepted Answer · 2010-07-14T08:27:36+0000

A bugfix release came out in April, so I don't know where you got the idea that it is no longer supported. Even if that were the case, BeautifulSoup still offers a lot of functionality, and I don't see the current implementation breaking in the near future. You may run into problems with HTML 5 in the next two years (although it has far fewer quirks, so it is easier to parse, at least so far), but there is no particular reason not to use BeautifulSoup. The community is still active and offers support in the Google group, and the source code is available to you if you need it.
