Scenario: I receive a huge XML file over an extremely slow network, so I want to start processing as early as possible. Because of this, I decided to use SAXParser.
I expected to receive an event as soon as a tag had been read completely.
The following test shows what I mean:
    @Test
    public void sax_parser_read_much_things_before_returning_events() throws Exception {
        String xml = "<a>"
                   + " <b>..</b>"
                   + " <c>..</c>"
                   // much more ...
                   + "</a>";

        // wrapper to show what is read
        InputStream is = new InputStream() {
            InputStream is = new ByteArrayInputStream(xml.getBytes());

            @Override
            public int read() throws IOException {
                int val = is.read();
                System.out.print((char) val);
                return val;
            }
        };

        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        parser.parse(is, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) throws SAXException {
                System.out.print("\nHandler start: " + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName)
                    throws SAXException {
                System.out.print("\nHandler end: " + qName);
            }
        });
    }
I wrapped the input stream to see what is being read and when the events occur.
I was expecting something like this:
    <a>                  <- output from read()
    Handler start: a
    <b>                  <- output from read()
    Handler start: b
    </b>                 <- output from read()
    Handler end: b
    ...
Unfortunately, the result was as follows:
    <a> <b>..</b> <c>..</c></a>    <- output from read()
    Handler start: a
    Handler start: b
    Handler end: b
    Handler start: c
    Handler end: c
    Handler end: a
Where is my mistake and how can I get the expected result?
Edit:
- First of all, the parser tries to detect the document version, which makes it scan everything. When the document declares a version, the reading is split up in between (but not where I expect).
- It is a problem that the parser "wants" to read, for example, 1000 bytes and blocks until it has them, because the stream may not contain that much at a given moment.
- I found these buffer sizes in XMLEntityManager:
  - public static final int DEFAULT_BUFFER_SIZE = 8192;
  - public static final int DEFAULT_XMLDECL_BUFFER_SIZE = 64;
  - public static final int DEFAULT_INTERNAL_BUFFER_SIZE = 1024;
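One thing worth noting about the test wrapper: it only overrides read(), and the inherited InputStream.read(byte[], int, int) calls read() repeatedly until the requested length is filled or the stream ends. So a bulk read from the parser always drains as much as the buffer can take. Below is a minimal sketch of a wrapper that returns short reads instead, so the interleaving of reads and events can be observed. The names ShortReadDemo, ShortReadStream, maxChunk and the "read:"/"start:"/"end:" log entries are made up for illustration, and I am only assuming (not asserting) that the parser will fire events between short reads.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

class ShortReadDemo {

    /** Hypothetical wrapper: hands out at most maxChunk bytes per bulk read. */
    static class ShortReadStream extends InputStream {
        private final InputStream in;
        private final int maxChunk;
        private final List<String> log;

        ShortReadStream(InputStream in, int maxChunk, List<String> log) {
            this.in = in;
            this.maxChunk = maxChunk;
            this.log = log;
        }

        @Override
        public int read() throws IOException {
            int val = in.read();
            if (val >= 0) {
                log.add("read:1");
            }
            return val;
        }

        // Overriding the bulk read is the point: the inherited version would
        // loop over read() until len bytes are filled or the stream ends.
        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            if (len == 0) {
                return 0;
            }
            int n = in.read(b, off, Math.min(len, maxChunk));
            if (n > 0) {
                log.add("read:" + n);
            }
            return n;
        }
    }

    /** Parses xml and returns one log with reads and handler events in order. */
    static List<String> parse(String xml, int maxChunk) throws Exception {
        List<String> log = new ArrayList<>();
        InputStream in = new ShortReadStream(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
                maxChunk, log);
        SAXParserFactory.newInstance().newSAXParser().parse(in, new DefaultHandler() {
            @Override
            public void startElement(String uri, String localName, String qName,
                                     Attributes attributes) {
                log.add("start:" + qName);
            }

            @Override
            public void endElement(String uri, String localName, String qName) {
                log.add("end:" + qName);
            }
        });
        return log;
    }

    public static void main(String[] args) throws Exception {
        // Print reads and events in the order they happened.
        for (String entry : parse("<a><b>..</b><c>..</c></a>", 4)) {
            System.out.println(entry);
        }
    }
}
```

Running main prints the "read:" entries and the handler events in the order they occurred. Where the events land between the reads depends on the parser implementation and its internal buffering, so the log has to be inspected rather than assumed.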