I have a problem with SAX in Java.
I am parsing the DBLP digital library XML file, which lists journals, conferences, and other documents. The file is very large (> 700 MB).
My problem is that when the characters() callback is invoked on text that contains entities, I do not get the whole string. I only get the part that starts at the last entity found.
For example, the original author name, enclosed in <author> tags, is: Rüdiger Mecke
The result I get is: üdiger Mecke
(The string is built from the arguments of characters(char[] ch, int start, int length).)
I'd like to know:
- How can I prevent the parser from automatically resolving entities?
- How to solve the truncated character problem described earlier?
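
For context on the second point: the SAX API allows a parser to deliver one run of text in several characters() calls, and parsers often split at entity references. A handler that keeps only the most recent call would then see just the tail of the name. Below is a minimal sketch of a handler that accumulates the pieces instead, assuming that this is the cause here. The class name AuthorHandler, the helper parseAuthors, and the sample input (which uses a numeric character reference so that no DTD is needed) are all hypothetical and not taken from my actual code.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical handler: collects the full text of every <author> element.
public class AuthorHandler extends DefaultHandler {
    // Buffer for the text of the element currently being read.
    private final StringBuilder text = new StringBuilder();
    private final List<String> authors = new ArrayList<>();

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if ("author".equals(qName)) {
            text.setLength(0); // start a fresh buffer for this author
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // May be called several times per element, so append, never overwrite.
        text.append(ch, start, length);
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if ("author".equals(qName)) {
            authors.add(text.toString()); // the element is complete only here
        }
    }

    public List<String> getAuthors() {
        return authors;
    }

    // Convenience helper (an assumption for this sketch): parse an XML string.
    public static List<String> parseAuthors(String xml) throws Exception {
        AuthorHandler handler = new AuthorHandler();
        SAXParserFactory.newInstance().newSAXParser()
                .parse(new InputSource(new StringReader(xml)), handler);
        return handler.getAuthors();
    }
}
```

Calling AuthorHandler.parseAuthors("<dblp><author>R&#252;diger Mecke</author></dblp>") should collect the whole name rather than only the part after the entity. For the real file, the same handler could be passed to SAXParser.parse together with a File or InputStream.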