Get the best parsing error message from ElementTree

If I try to parse broken XML, the exception shows the line number. Is there a way to show the XML context?

I want to see the XML tags before and after the broken part.

Example:

    import xml.etree.ElementTree as ET
    tree = ET.fromstring('<a><b></a>')

An exception:

    Traceback (most recent call last):
      File "tmp/foo.py", line 2, in <module>
        tree = ET.fromstring('<a><b></a>')
      File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1300, in XML
        parser.feed(text)
      File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1642, in feed
        self._raiseerror(v)
      File "/usr/lib/python2.7/xml/etree/ElementTree.py", line 1506, in _raiseerror
        raise err
    xml.etree.ElementTree.ParseError: mismatched tag: line 1, column 8

Something like this would be nice:

    ParseError:
    <a><b></a>
    =====^
2 answers

You can make a helper function for this:

    import sys
    import io
    import itertools as IT
    import xml.etree.ElementTree as ET

    PY2 = sys.version_info[0] == 2
    StringIO = io.BytesIO if PY2 else io.StringIO

    def myfromstring(content):
        try:
            tree = ET.fromstring(content)
        except ET.ParseError as err:
            lineno, column = err.position
            # lineno is 1-based: skip lineno - 1 lines to reach the failing one
            line = next(IT.islice(StringIO(content), lineno - 1, None)).rstrip('\r\n')
            # Right-align '^' in a field of width `column`, padded with '='
            caret = '{:=>{}}'.format('^', column)
            err.msg = '{}\n{}\n{}'.format(err, line, caret)
            raise
        return tree

    myfromstring('<a><b></a>')

gives

    xml.etree.ElementTree.ParseError: mismatched tag: line 1, column 8
    <a><b></a>
    =======^
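The question also asks for the tags before and after the broken part. A minimal sketch of one way to do that is below; `describe_parse_error` and its `context` parameter are hypothetical names made up for this example, the input is assumed to be a text string (not bytes), and the caret placement assumes the reported column is a 0-based offset into the line.

```python
import xml.etree.ElementTree as ET


def describe_parse_error(content, err, context=1):
    """Return the error message, nearby source lines, and a caret line.

    Hypothetical helper for illustration; `context` is the number of
    lines to show on each side of the failing line.
    """
    lineno, column = err.position  # lineno is 1-based
    lines = content.splitlines()
    first = max(lineno - 1 - context, 0)
    last = min(lineno + context, len(lines))
    out = [str(err)]
    for index in range(first, last):
        out.append(lines[index])
        if index == lineno - 1:
            # Assumption: column is a 0-based offset into this line.
            out.append(' ' * column + '^')
    return '\n'.join(out)


xml_text = '<root>\n  <a>\n    <b>\n  </a>\n</root>'
try:
    ET.fromstring(xml_text)
except ET.ParseError as err:
    report = describe_parse_error(xml_text, err)
    print(report)
```

Raising `context` shows more of the surrounding document, which helps when the unclosed tag is several lines above the position the parser reports.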

This is not the best option, but it is simple: extract the line and column from the ParseError message, then use them to show where the problem is.

    import xml.etree.ElementTree as ET
    from xml.etree.ElementTree import ParseError

    my_string = '<a><b><c></b></a>'
    try:
        tree = ET.fromstring(my_string)
    except ParseError as e:
        # str(e) has the form "<reason>: line N, column M"
        formatted_e = str(e)
        line = int(formatted_e[formatted_e.find("line ") + 5: formatted_e.find(",")])
        column = int(formatted_e[formatted_e.find("column ") + 7:])
        split_str = my_string.split("\n")
        # Print the failing line, then dashes up to the column and a caret
        print("{}\n{}^".format(split_str[line - 1], len(split_str[line - 1][0:column]) * "-"))

Note: splitting on \n is only for this example; you need to split on whatever line separator your input actually uses.
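As a hedged variation on the same idea, the line and column can be read from the exception's `position` attribute instead of being parsed out of the message, and `str.splitlines()` copes with `\n`, `\r\n` and `\r` without hardcoding a separator. This is only a sketch; the variable names are illustrative and the input is assumed to be a text string.

```python
import xml.etree.ElementTree as ET

# Windows-style line endings, to show that no separator is hardcoded.
my_string = '<a>\r\n<b><c></b>\r\n</a>'
try:
    ET.fromstring(my_string)
except ET.ParseError as e:
    # position is a (line, column) tuple; line is 1-based.
    line, column = e.position
    source_line = my_string.splitlines()[line - 1]
    # Dashes up to the reported column (clamped to the line length), then a caret.
    pointer = '-' * min(column, len(source_line)) + '^'
    message = '{}\n{}'.format(source_line, pointer)
    print(message)
```

Reading `position` avoids depending on the exact wording of the error message, which the string-slicing approach relies on.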


Source: https://habr.com/ru/post/1210400/

