lxml html5parser ignores the "namespaceHTMLElements=False" option

The lxml html5parser seems to ignore any namespaceHTMLElements=False option I pass to it. It puts all the elements I give it into the HTML namespace instead of the (expected) void namespace.

Here is a simple case that reproduces the problem:

echo "<p>" | python -c "from sys import stdin; \
  from lxml.html import html5parser as h5, tostring; \
  print tostring(h5.parse(stdin, h5.HTMLParser(namespaceHTMLElements=False)))"

The output from that is this:

<html:html xmlns:html="http://www.w3.org/1999/xhtml"><html:head></html:head><html:body><html:p>
</html:p></html:body></html:html>

As you can see, the html element and all the other elements are in the HTML namespace.
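
To check this programmatically rather than by eye, the serialized output can be parsed again and each element's qualified tag name inspected. This is only a minimal sketch using the standard library's xml.etree.ElementTree on the output string shown above; the helper name namespaces_in is made up for illustration and is not part of lxml or html5lib.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# The serialized output from the reproduction case above.
actual = (
    '<html:html xmlns:html="http://www.w3.org/1999/xhtml">'
    "<html:head></html:head><html:body><html:p>\n"
    "</html:p></html:body></html:html>"
)


def namespaces_in(markup):
    """Return the set of namespace URIs used by elements ('' means no namespace).

    Hypothetical helper for illustration only.
    """
    found = set()
    for el in ET.fromstring(markup).iter():
        # ElementTree spells a namespaced tag as "{uri}local".
        if el.tag.startswith("{"):
            found.add(el.tag[1:].split("}", 1)[0])
        else:
            found.add("")
    return found


print(namespaces_in(actual))
```

For the output above, the only namespace reported is the XHTML one; for the expected output it would be the empty string alone.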

Expected result instead:

<html><head></head><body><p>
</p></body></html>
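
For what it's worth, the expected form can be obtained by post-processing the namespaced result. The sketch below is a workaround, not a fix for the option being ignored. It assumes the serialized output is well-formed XML (as it is here), it only rewrites element names and leaves any namespaced attributes alone, and strip_namespaces is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def strip_namespaces(markup):
    """Re-serialize markup with every element moved into no namespace.

    Hypothetical helper for illustration; attributes are left untouched.
    """
    root = ET.fromstring(markup)
    for el in root.iter():
        # ElementTree spells a namespaced tag as "{uri}local"; keep only "local".
        if el.tag.startswith("{"):
            el.tag = el.tag.split("}", 1)[1]
    # method="html" writes <head></head> rather than the XML form <head />.
    return ET.tostring(root, encoding="unicode", method="html")


# The namespaced output produced by the reproduction case.
namespaced = (
    '<html:html xmlns:html="http://www.w3.org/1999/xhtml">'
    "<html:head></html:head><html:body><html:p>\n"
    "</html:p></html:body></html:html>"
)

print(strip_namespaces(namespaced))
```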

I understand that namespaceHTMLElements is an html5lib option, not a native lxml option; lxml itself does nothing with it directly. Instead, lxml is supposed to just call html5lib and pass the option through, so that html5lib applies it as expected.


Update 2016-02-17

lxml html5parser namespaceHTMLElements. , , html5lib , :

echo "<p>" | python -c "from sys import stdin; \
import html5lib; from lxml import html; \
doc = html5lib.parse(stdin, treebuilder='lxml', namespaceHTMLElements=False); \
print html.tostring(doc)"

, :

  • html5lib HTML, , html HTML - html5lib
  • html5lib namespaceHTMLElements=False " html HTML".
  • html5lib ( lxml), namespaceHTMLElements=False, , , html void.
  • printf html5lib, , :

    • lxml html5lib namespaceHTMLElements=False
    • , , lxml html5lib : namespaceHTMLElements, namespaceHTMLElements=False

,

, , lxml html5lib. , lxml html5lib, , , - XHTMLParser, , Im , HTMLParser.

, , , html5lib, html5lib "" namespaceHTMLElements=True, , namespaceHTMLElements=False, .

, , , lxml html5lib, API html5lib , .

, , lxml, html5lib, , html5lib.

, Im , , - - , , , .


, lxml params html5lib. * kws, . html5 , .

( , , - .)

, , 2018 html5lib - , - lxml .

( : crappy html xpath.)


Source: https://habr.com/ru/post/1608615/

