Is there a way to accurately collect byte offsets of XML tags using XMLStreamReader?
I have a large XML file that I need random access to. Instead of loading it all into a database, I would like to scan it once with XMLStreamReader to collect the byte offsets of the meaningful tags, and then use RandomAccessFile to retrieve the contents of a tag later.
XMLStreamReader does not seem to be able to track character offsets. Instead, people recommend attaching the XMLStreamReader to an input stream that keeps track of how many bytes have been read (e.g. CountingInputStream from Apache Commons IO).
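e.g.:
import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;
import org.apache.commons.io.input.CountingInputStream;

// Wrap the file stream so the number of bytes consumed so far can be queried.
CountingInputStream countingReader = new CountingInputStream(new FileInputStream(xmlFile));
XMLInputFactory xmlStreamFactory = XMLInputFactory.newInstance();
XMLStreamReader xmlStreamReader = xmlStreamFactory.createXMLStreamReader(countingReader, "UTF-8");
while (xmlStreamReader.hasNext()) {
    int eventCode = xmlStreamReader.next();
    if (eventCode == XMLStreamConstants.END_ELEMENT) {
        // Bytes pulled from the stream so far, which is not necessarily where this tag ends.
        System.out.println(xmlStreamReader.getLocalName() + " @" + countingReader.getByteCount());
    }
}
xmlStreamReader.close();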
Is there a way to get accurate byte offsets for tags in the xml file (or another way to get random access into XML)?
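For anyone who wants to reproduce this without Apache Commons IO, here is a minimal self-contained sketch of the same counting approach. `OffsetProbe`, `ByteCounter` and `endElementCounts` are hypothetical names made up for this example; `ByteCounter` is only a stand-in for CountingInputStream, and the tiny in-memory document is an assumption chosen so the true offsets are easy to work out by hand (`</b>` ends at byte 11, `</c>` at byte 19, `</a>` at byte 23).

```java
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

public class OffsetProbe {

    /** Hypothetical stand-in for CountingInputStream: counts bytes handed to the caller. */
    static class ByteCounter extends FilterInputStream {
        long count = 0;

        ByteCounter(InputStream in) {
            super(in);
        }

        @Override
        public int read() throws IOException {
            int b = super.read();
            if (b >= 0) count++;
            return b;
        }

        @Override
        public int read(byte[] buf, int off, int len) throws IOException {
            int n = super.read(buf, off, len);
            if (n > 0) count += n;
            return n;
        }
    }

    /** Returns the byte count observed at each END_ELEMENT event, in document order. */
    static List<Long> endElementCounts(byte[] xml) throws XMLStreamException {
        ByteCounter counter = new ByteCounter(new ByteArrayInputStream(xml));
        XMLStreamReader reader =
                XMLInputFactory.newInstance().createXMLStreamReader(counter, "UTF-8");
        List<Long> counts = new ArrayList<>();
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.END_ELEMENT) {
                // This is how far the parser has read, not where the tag ends.
                counts.add(counter.count);
            }
        }
        reader.close();
        return counts;
    }

    public static void main(String[] args) throws XMLStreamException {
        byte[] xml = "<a><b>x</b><c>y</c></a>".getBytes(StandardCharsets.UTF_8);
        System.out.println("document length: " + xml.length);
        for (long c : endElementCounts(xml)) {
            System.out.println("END_ELEMENT @" + c);
        }
    }
}
```

Comparing the printed counts with the hand-computed offsets shows whether the counter can be trusted: any count larger than the true end of its tag means the parser had already pulled more bytes from the stream than it had reported as events. What the numbers look like will depend on the StAX implementation and its internal buffer size, so treat this as a probe rather than a fix.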