Parsing Unicode XML with Python SAX in App Engine

I am using xml.sax with unicode XML strings as input, originally entered from a web form. On my local machine (Python 2.5, using the default expat xmlreader, running under App Engine), it works fine. However, the same code and input strings fail on the production App Engine servers with a "not well-formed" error. For example, this happens with the code below:

from xml import sax
class MyHandler(sax.ContentHandler):
  pass

handler = MyHandler()
# Both of these unicode strings raise 'not well-formed'
# on App Engine, but work locally
sax.parseString(u"<a>b</a>", handler)
sax.parseString(u"<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>", handler)

# Both of these byte strings work, but output unicode
sax.parseString("<a>b</a>", handler)
sax.parseString("<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>", handler)

On App Engine, the unicode calls lead to this error:

  File "<string>", line 1, in <module>
  File "/base/python_dist/lib/python2.5/xml/sax/__init__.py", line 49, in parseString
    parser.parse(inpsrc)
  File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
    xmlreader.IncrementalParser.parse(self, source)
  File "/base/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
    self.feed(buffer)
  File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 211, in feed
    self._err_handler.fatalError(exc)
  File "/base/python_dist/lib/python2.5/xml/sax/handler.py", line 38, in fatalError
    raise exception
SAXParseException: <unknown>:1:1: not well-formed (invalid token)

Is there any reason why the App Engine parser, which also uses Python 2.5 and expat, would fail when given unicode input?

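Encode the string as UTF-8 before passing it to the parser. A unicode string holding XML is not a serialized XML document in the XML 1.0 sense, because the parser expects encoded bytes. Convert the Unicode to UTF-8 first, and then parse it.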
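
The advice to encode to UTF-8 before parsing could look roughly like the sketch below. It is only an illustration, not a confirmed fix for the App Engine behaviour: the names TextCollector and parse_unicode_xml are hypothetical helpers introduced here, and the sketch assumes the input carries no XML declaration naming a different encoding.

```python
from xml import sax


class TextCollector(sax.ContentHandler):
    """Hypothetical handler that gathers character data for inspection."""

    def __init__(self):
        sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # expat may deliver text in several pieces, so collect them all
        self.chunks.append(content)


def parse_unicode_xml(text, handler):
    # Hypothetical helper: encode the unicode string to UTF-8 bytes
    # so the parser receives bytes rather than a unicode object.
    sax.parseString(text.encode("utf-8"), handler)


handler = TextCollector()
parse_unicode_xml(u"<a>caf\u00e9</a>", handler)
collected = u"".join(handler.chunks)
```

If the form input may start with an XML declaration such as encoding="ISO-8859-1", the declared encoding and the actual bytes would disagree after this conversion, so that case would need separate handling.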

Source: https://habr.com/ru/post/1740975/

