If performance matters and/or the document is large (both of which appear to be the case here), the key difference between an event parser (e.g., SAX or StAX) and the JDK's built-in XPath implementation is that the latter builds a W3C DOM Document before evaluating the XPath expression. [It is worth noting that all Java XML object model implementations, such as the W3C DOM or Axiom, use an event processor (SAX or StAX) to build their in-memory representation, so if you can get by with the event processor alone, you save both the memory and the time it takes to build the DOM.]
As I mentioned, the XPath implementation in the JDK operates on a W3C DOM Document. You can see this in the JDK source code by looking at com.sun.org.apache.xpath.internal.jaxp.XPathImpl, where, before the expression is evaluated in evaluate(), the parser must first parse the source:
Document document = getParser().parse( source );
After that, your 10GB of XML will be held in memory (plus whatever overhead the DOM adds), which is probably not what you want. While you may want a more "general" solution, both your XPath example and your XML markup look relatively simple, so there doesn't seem to be a strong justification for XPath (except perhaps programming elegance). The same applies to the XProc suggestion: it would also build a DOM. If you truly need a DOM, you could use Axiom rather than the W3C DOM. Axiom has a much friendlier API and builds its DOM over StAX, so it is fast, and it uses Jaxen for its XPath implementation. Jaxen requires some kind of DOM (W3C DOM, DOM4J, or JDOM). This will be true of all XPath implementations, so if you don't truly need XPath, sticking with just the event parser is recommended.
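For comparison, the DOM-backed route described above looks roughly like the sketch below. It uses the standard javax.xml.xpath API; the class name DomXPathDemo, the sample markup, and the "type" attribute are assumptions made up for illustration, not taken from the question. The point is only that evaluate() is handed a raw InputSource, so the whole document is parsed into a DOM before the expression runs.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

// Hypothetical demo class; not part of the original answer.
class DomXPathDemo {

    // Evaluates a numeric XPath expression against an XML string.
    // Because the source is an InputSource, the JDK parses the entire
    // document into a DOM before evaluating the expression.
    static double evaluateNumber(String xml, String expression)
            throws XPathExpressionException {
        XPath xpath = XPathFactory.newInstance().newXPath();
        InputSource source = new InputSource(new StringReader(xml));
        return (Double) xpath.evaluate(expression, source, XPathConstants.NUMBER);
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample markup: <txn> elements with a "type" attribute.
        String xml = "<txns>"
                + "<txn id=\"1\" type=\"debit\"/>"
                + "<txn id=\"2\" type=\"credit\"/>"
                + "</txns>";
        System.out.println(evaluateNumber(xml, "count(//txn[@type='debit'])"));
    }
}
```

This is fine for small inputs, but for a multi-gigabyte file the intermediate DOM is exactly the cost the streaming approach avoids.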
SAX is the older streaming API; StAX is newer and much faster. Using either the JDK's built-in StAX implementation (javax.xml.stream) or the Woodstox StAX implementation (which is significantly faster, in my experience), I'd recommend creating an XML event filter that first matches on the element type name (to capture your <txn> elements). This will yield small bursts of events (element, attribute, text) that can be checked against your matching values. On a suitable match, you can either pull the necessary information from the events or pipe the bounded events into a mini-DOM, if you find the result easier to navigate. But that sounds like it might be overkill if the markup is simple.
This would likely be the simplest and fastest approach, and it avoids the memory overhead of building a DOM. If you passed the element and attribute names to the filter (so that your matching algorithm is configurable), you could make it relatively generic.
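A minimal sketch of that idea, using the JDK's StAX event API, might look like the following. The class name TxnScanner and the "type" and "id" attributes are hypothetical, since the actual markup is not shown here; only the <txn> element name comes from the question. The loop keeps nothing in memory beyond the current event, and matches are handed to a callback rather than collected.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.namespace.QName;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper; attribute names "type" and "id" are assumptions.
class TxnScanner {
    private static final QName TYPE = new QName("type");
    private static final QName ID = new QName("id");

    // Streams the document and reports the "id" of every <txn> start tag
    // whose "type" attribute equals wantedType.
    static void scan(Reader source, String wantedType, Consumer<String> onMatch)
            throws XMLStreamException {
        XMLEventReader reader = XMLInputFactory.newInstance().createXMLEventReader(source);
        try {
            while (reader.hasNext()) {
                XMLEvent event = reader.nextEvent();
                if (!event.isStartElement()) {
                    continue; // only start tags carry the name and attributes
                }
                StartElement start = event.asStartElement();
                if (!"txn".equals(start.getName().getLocalPart())) {
                    continue; // first filter: element type name
                }
                Attribute type = start.getAttributeByName(TYPE);
                if (type != null && wantedType.equals(type.getValue())) {
                    Attribute id = start.getAttributeByName(ID);
                    onMatch.accept(id == null ? "" : id.getValue());
                }
            }
        } finally {
            reader.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<txns>"
                + "<txn id=\"1\" type=\"debit\"/>"
                + "<txn id=\"2\" type=\"credit\"/>"
                + "</txns>";
        scan(new StringReader(xml), "debit", id -> System.out.println("matched txn " + id));
    }
}
```

For a real 10GB file the Reader would wrap a file stream instead of a string; the scanning loop itself would not need to change.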
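One way to make the matching configurable, sketched under assumptions, is to implement the standard javax.xml.stream.EventFilter interface and let XMLInputFactory.createFilteredReader do the skipping. The class name AttributeMatchFilter, the forEachMatch helper, and the sample element and attribute names are all hypothetical; only EventFilter and createFilteredReader are actual JDK API.

```java
import java.io.Reader;
import java.io.StringReader;
import java.util.function.Consumer;
import javax.xml.namespace.QName;
import javax.xml.stream.EventFilter;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.Attribute;
import javax.xml.stream.events.StartElement;
import javax.xml.stream.events.XMLEvent;

// Hypothetical filter: accepts only start tags of the named element whose
// named attribute has the expected value.
class AttributeMatchFilter implements EventFilter {
    private final String elementName;
    private final QName attributeName;
    private final String expectedValue;

    AttributeMatchFilter(String elementName, String attributeName, String expectedValue) {
        this.elementName = elementName;
        this.attributeName = new QName(attributeName);
        this.expectedValue = expectedValue;
    }

    @Override
    public boolean accept(XMLEvent event) {
        if (!event.isStartElement()) {
            return false;
        }
        StartElement start = event.asStartElement();
        if (!elementName.equals(start.getName().getLocalPart())) {
            return false;
        }
        Attribute attr = start.getAttributeByName(attributeName);
        return attr != null && expectedValue.equals(attr.getValue());
    }

    // Streams the source and passes every accepted start tag to the callback.
    static void forEachMatch(Reader source, EventFilter filter, Consumer<StartElement> onMatch)
            throws XMLStreamException {
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLEventReader filtered =
                factory.createFilteredReader(factory.createXMLEventReader(source), filter);
        try {
            while (filtered.hasNext()) {
                onMatch.accept(filtered.nextEvent().asStartElement());
            }
        } finally {
            filtered.close();
        }
    }

    public static void main(String[] args) throws Exception {
        String xml = "<log>"
                + "<txn id=\"1\" type=\"debit\"/>"
                + "<order id=\"9\" status=\"open\"/>"
                + "</log>";
        forEachMatch(new StringReader(xml),
                new AttributeMatchFilter("order", "status", "open"),
                start -> System.out.println("matched " + start.getName().getLocalPart()));
    }
}
```

Because the element name, attribute name, and expected value are constructor arguments, the same filter can be reused for different markup without touching the streaming loop.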