Python lxml.html.soupparser.fromstring causes an annoying warning

Question

My code ...

foo = fromstring(my_html)

It triggers this warning ...

UserWarning: No parser was explicitly specified, so I'm using the best available HTML parser for this system ("html.parser"). This usually isn't a problem, but if you run this code on another system, or in a different virtual environment, it may use a different parser and behave differently.

To get rid of this warning, change this:

 BeautifulSoup([your markup])

to this:

 BeautifulSoup([your markup], "html.parser")

  markup_type=markup_type))

I tried passing it the string 'html.parser', but that does not work: it fails with an error saying that the string is not callable. So I tried html.parser, and then I looked through the lxml module to see if I could find another parser, and could not. I also looked at the Python stdlib and saw that version 2.7 has one called HTMLParser, so I imported it and passed beautifulsoup=HTMLParser, and that did not work either.

Where in fromstring do I specify which parser it should use?

EDIT added attempted solutions:

from lxml.html.soupparser import fromstring
wiktionary_page = fromstring(wiktionary_page.read(), features="html.parser" )

and this one

from lxml.html.soupparser import BeautifulSoup, fromstring
wiktionary_page = fromstring(wiktionary_page.read(), beautifulsoup=lambda s: BeautifulSoup(s, "html.parser"))

python lxml beautifulsoup

deltaskelta Aug 15 '16 at 3:52

1 answer

Padraic Cunningham · Accepted Answer · 2016-08-15T07:20:00+0000

Pass the parser name as a string using the features keyword:

import lxml.html.soupparser
tree = lxml.html.soupparser.fromstring("<p>foo</p>", features="html.parser")

fromstring hands its arguments to _parse. I think the problem is the line bsargs['features'] = ['html.parser'], which sets a list; it should be the string bsargs['features'] = 'html.parser':

def _parse(source, beautifulsoup, makeelement, **bsargs):
    if beautifulsoup is None:
        beautifulsoup = BeautifulSoup
    if hasattr(beautifulsoup, "HTML_ENTITIES"):  # bs3
        if 'convertEntities' not in bsargs:
            bsargs['convertEntities'] = 'html'
    if hasattr(beautifulsoup, "DEFAULT_BUILDER_FEATURES"):  # bs4
        if 'features' not in bsargs:
            bsargs['features'] = ['html.parser']  # use Python html parser
    tree = beautifulsoup(source, **bsargs)
    root = _convert_tree(tree, makeelement)
    # from ET: wrap the document in a html root element, if necessary
    if len(root) == 1 and root[0].tag == "html":
        return root[0]
    root.tag = "html"
    return root

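You can also pass a callable via the beautifulsoup keyword. A lambda has neither HTML_ENTITIES nor DEFAULT_BUILDER_FEATURES, so _parse adds no keyword arguments of its own and simply calls it with the source: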

from lxml.html.soupparser import BeautifulSoup
import lxml.html.soupparser

tree = lxml.html.soupparser.fromstring("<p>foo</p>", beautifulsoup=lambda s: BeautifulSoup(s, "html.parser"))
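To see what each form actually hands to BeautifulSoup, the keyword handling from the _parse listing above can be replayed against stand-in objects. This is only a sketch of those few lines, not lxml's code path as a whole: FakeSoup and resolve_bsargs are hypothetical names invented for the illustration, and neither lxml nor bs4 is needed to run it.

```python
# Replays only the keyword-argument handling from _parse.
# FakeSoup and resolve_bsargs are hypothetical names for this sketch.


class FakeSoup:
    """Carries only the attribute _parse checks to detect bs4."""

    DEFAULT_BUILDER_FEATURES = ()  # the value is irrelevant here


def resolve_bsargs(beautifulsoup, **bsargs):
    """Return the keyword arguments _parse would pass to `beautifulsoup`."""
    if hasattr(beautifulsoup, "HTML_ENTITIES"):  # bs3
        bsargs.setdefault("convertEntities", "html")
    if hasattr(beautifulsoup, "DEFAULT_BUILDER_FEATURES"):  # bs4
        bsargs.setdefault("features", ["html.parser"])
    return bsargs


# No features given: the default is filled in as a list.
default_args = resolve_bsargs(FakeSoup)
# features given as a string: it is passed through untouched.
explicit_args = resolve_bsargs(FakeSoup, features="html.parser")
# A lambda has neither marker attribute, so nothing is added.
lambda_args = resolve_bsargs(lambda s: FakeSoup())

print(default_args)   # {'features': ['html.parser']}
print(explicit_args)  # {'features': 'html.parser'}
print(lambda_args)    # {}
```

Only the first case ends up with a list for features, which is the value the answer suspects of triggering the warning.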

Both approaches run without the warning:

In [13]: from lxml.html import soupparser

In [14]: tree = soupparser.fromstring("<p>foo</p>", features="html.parser")

In [15]: from lxml.html.soupparser import BeautifulSoup

In [16]: import lxml.html.soupparser

In [17]: tree = lxml.html.soupparser.fromstring("<p>foo</p>", beautifulsoup=lambda s: BeautifulSoup(s, "html.parser"))
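If you would rather check for the warning programmatically than watch the console, one option is to record warnings with the standard library's warnings module. The sketch below is a minimal illustration: emitted_warnings is a hypothetical helper, and noisy_parse is a made-up stand-in for a parser call so that the example runs without lxml or bs4 installed.

```python
import warnings


def emitted_warnings(func, *args, **kwargs):
    """Call func and return (result, list of warning messages it emitted)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # do not let earlier filters hide repeats
        result = func(*args, **kwargs)
    return result, [str(w.message) for w in caught]


# Hypothetical stand-in for a parser call that warns when no parser is named.
def noisy_parse(markup, features=None):
    if features is None:
        warnings.warn("No parser was explicitly specified", UserWarning)
    return markup


_, without_features = emitted_warnings(noisy_parse, "<p>foo</p>")
_, with_features = emitted_warnings(noisy_parse, "<p>foo</p>", features="html.parser")

print(len(without_features))  # 1
print(len(with_features))     # 0
```

With lxml and bs4 installed, the same helper should be usable around the real call, for example emitted_warnings(soupparser.fromstring, "<p>foo</p>", features="html.parser"), where an empty list would mean no warning was raised.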
