For me,
print (tostring(e, encoding=str))
returns:
>>> print (tostring(e, encoding=str))
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib/python2.7/dist-packages/lxml/html/__init__.py", line 1493, in tostring
    encoding=encoding)
  File "lxml.etree.pyx", line 2836, in lxml.etree.tostring (src/lxml/lxml.etree.c:53416)
TypeError: descriptor 'upper' of 'str' object needs an argument
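The traceback suggests that, under Python 2, lxml treats encoding=str as an ordinary encoding name and calls .upper() on it, which fails because str is the byte-string type there. lxml's documented way to get text back is encoding="unicode" (or unicode in Python 2, str in Python 3). The sketch below uses Python 3 and the standard library's xml.etree.ElementTree instead of lxml, so it runs without extra packages; ElementTree follows the same encoding="unicode" convention. The tiny tree is made up for illustration.

```python
import xml.etree.ElementTree as ET

# A tiny illustrative tree (not the one from the question).
root = ET.Element("html")
ET.SubElement(root, "body")

# Default encoding: the result is bytes.
as_bytes = ET.tostring(root)
# encoding="unicode": the result is a str, with no XML declaration.
as_text = ET.tostring(root, encoding="unicode")

print(type(as_bytes).__name__, as_bytes)
print(type(as_text).__name__, as_text)
```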
I can't explain the discrepancy, but I suggest setting the pretty_print argument to True:
>>> etree.tostring(e, pretty_print=True)
'<html>\n  <head>\n    <link href="/comments.css" rel="stylesheet" type="text/css"/>\n    <link href="/index.css" rel="stylesheet" type="text/css"/>\n  </head>\n  <body>\n    <span/>\n    <span/>\n  </body>\n</html>\n'
You will need to import etree: from lxml import etree
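If you want to see the effect of pretty_print without lxml installed, a rough stand-in from the standard library is xml.etree.ElementTree.indent (Python 3.9+), which inserts the same kind of newline-plus-indentation whitespace, two spaces per level by default. This is a swapped-in technique rather than lxml's API; the sketch rebuilds the tree from the output above. One visible difference: ElementTree writes empty elements as <span /> with a space, where lxml writes <span/>.

```python
import xml.etree.ElementTree as ET

# Rebuild the tree shown in the pretty-printed output.
root = ET.Element("html")
head = ET.SubElement(root, "head")
for href in ("/comments.css", "/index.css"):
    ET.SubElement(head, "link", {"href": href, "rel": "stylesheet", "type": "text/css"})
body = ET.SubElement(root, "body")
ET.SubElement(body, "span")
ET.SubElement(body, "span")

# indent() mutates the tree in place, putting newline + indentation
# into each element's text/tail (two spaces per level by default).
ET.indent(root)
pretty = ET.tostring(root, encoding="unicode")
print(pretty)
```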
When writing to an output file, the spaces and newlines are preserved. The same is true when you print the result:
>>> print(etree.tostring(e, pretty_print=True))
<html>
  <head>
    <link href="/comments.css" rel="stylesheet" type="text/css"/>
    <link href="/index.css" rel="stylesheet" type="text/css"/>
  </head>
  <body>
    <span/>
    <span/>
  </body>
</html>
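A minimal sketch of writing pretty-printed markup to an output file and reading it back, again using the standard library's ElementTree (Python 3.9+) in place of lxml; the file name output.html is hypothetical. The line breaks added by indent() are ordinary characters in the string, so they survive the round trip through the file.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

root = ET.Element("html")
ET.SubElement(root, "head")
ET.SubElement(root, "body")
ET.indent(root)  # add newlines and two-space indentation

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "output.html")  # hypothetical file name
    with open(path, "w", encoding="utf-8") as outfile:
        outfile.write(ET.tostring(root, encoding="unicode"))

    # Read the file back to confirm the whitespace was kept.
    with open(path, encoding="utf-8") as f:
        written = f.read()

print(written)
```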
I'm sure you checked the API documentation, but in case you haven't, it documents tostring() and its arguments. I also assume you saw the tutorial on the lxml website. I would like to see more “good” resources: I am new to lxml myself, and anything new and good to read is welcome.
UPDATE
You said you would fall back on sed if you couldn't find a good Python solution.
With sed, this should do it:
sed -i '1,2d;' input.html; sed -i '1 i\<html><head>' input.html
Two sed commands are run: the first deletes the first two lines, and the second inserts <html><head> as the new first line.
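For comparison, the same two steps in plain Python, assuming the pretty-printed markup is already in a string. replace_first_two_lines is a hypothetical helper name; like the sed commands, it assumes <html> and <head> sit on the first two lines.

```python
def replace_first_two_lines(text, new_first_line="<html><head>"):
    """Mimic the two sed commands: delete lines 1-2,
    then insert new_first_line as the new line 1."""
    lines = text.splitlines(keepends=True)
    return new_first_line + "\n" + "".join(lines[2:])


pretty = (
    "<html>\n"
    "  <head>\n"
    '    <link href="/index.css" rel="stylesheet" type="text/css"/>\n'
    "  </head>\n"
    "</html>\n"
)
result = replace_first_two_lines(pretty)
print(result, end="")
```

Unlike sed -i, this returns a new string instead of editing a file in place; write the result back yourself if you need that.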
UPDATE #2
I should have thought about this more: you can do it in Python by removing only the first newline and the indentation that follows it:
>>> import re
>>> newString = re.sub(r'\n\s*', '', etree.tostring(e, encoding=unicode, pretty_print=True), count=1)
>>> print(newString)
<html><head>
    <link href="/comments.css" rel="stylesheet" type="text/css"/>
    <link href="/index.css" rel="stylesheet" type="text/css"/>
  </head>
  <body>
    <span/>
    <span/>
  </body>
</html>
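The count=1 approach above removes the first newline wherever it falls. A slightly more targeted variant, sketched here with a hypothetical helper join_html_head, only collapses the whitespace between <html> and <head> and leaves the markup unchanged otherwise. It assumes both tags appear without attributes (it would not match <html lang="en">, for example).

```python
import re


def join_html_head(markup):
    # Replace "<html>", any run of whitespace, "<head>" with "<html><head>";
    # if that exact sequence is absent, the markup is returned unchanged.
    return re.sub(r"<html>\s+<head>", "<html><head>", markup, count=1)


pretty = '<html>\n  <head>\n    <link href="/index.css"/>\n  </head>\n</html>\n'
print(join_html_head(pretty), end="")
```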