I am using xml.sax with unicode XML strings as input, originally entered from a web form. On my local machine (Python 2.5, using the default expat xmlreader, running through the application engine), it works fine. However, the same code and input strings fail on the production servers with a "not well-formed" error. For example, this happens with the code below:
from xml import sax

class MyHandler(sax.ContentHandler):
    pass

handler = MyHandler()
# unicode input, without and with a DOCTYPE
sax.parseString(u"<a>b</a>", handler)
sax.parseString(u"<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>", handler)
# byte-string input, without and with a DOCTYPE
sax.parseString("<a>b</a>", handler)
sax.parseString("<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>", handler)
On the production servers this leads to the following error:
File "<string>", line 1, in <module>
File "/base/python_dist/lib/python2.5/xml/sax/__init__.py", line 49, in parseString
parser.parse(inpsrc)
File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 107, in parse
xmlreader.IncrementalParser.parse(self, source)
File "/base/python_dist/lib/python2.5/xml/sax/xmlreader.py", line 123, in parse
self.feed(buffer)
File "/base/python_dist/lib/python2.5/xml/sax/expatreader.py", line 211, in feed
self._err_handler.fatalError(exc)
File "/base/python_dist/lib/python2.5/xml/sax/handler.py", line 38, in fatalError
raise exception
SAXParseException: <unknown>:1:1: not well-formed (invalid token)
Is there any reason why the application engine's parser, which also uses Python 2.5 and expat, would fail when given unicode input?
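
A possible way to narrow this down, sketched under the assumption (not confirmed above) that the failure is tied to passing a unicode object rather than a byte string: encode the text to UTF-8 before handing it to sax.parseString. The names CollectingHandler and parse_unicode_xml are hypothetical helpers introduced only for this illustration.

```python
from xml import sax


class CollectingHandler(sax.ContentHandler):
    """Hypothetical handler that records character data so the parse can be checked."""

    def __init__(self):
        sax.ContentHandler.__init__(self)
        self.chunks = []

    def characters(self, content):
        # Called by the parser for each run of text inside an element.
        self.chunks.append(content)


def parse_unicode_xml(text, handler):
    # Assumption: the parser is more reliable with UTF-8 bytes than with
    # a unicode object, so encode explicitly before parsing.
    sax.parseString(text.encode("utf-8"), handler)


handler = CollectingHandler()
parse_unicode_xml(u"<a>b</a>", handler)
parse_unicode_xml(u"<!DOCTYPE a[<!ELEMENT a (#PCDATA)> ]><a>b</a>", handler)
```

If the encoded form parses on both machines while the unicode form fails only in production, that would point at how the input string is converted before it reaches expat. Note that this sketch only makes sense when the document has no XML declaration naming a different encoding, since the bytes would then disagree with the declaration.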