Can SAX Parsers use XPath in Java?

I'm trying to port one of my classes, which uses DOM parsing with lots of XPath expressions, to SAX parsing. DOM parsing worked well for me, but some of the files I am trying to process are too large and cause server timeouts. I would like to keep using XPath with SAX parsing, but I'm not sure whether this is possible. If it is not, could you help me? I don't know what the following code would look like using only SAX:

    Document doc = bpsXml.getDocument();
    String supplierName = BPSXMLUtils.getXpathString(doc, "/Invoice/InvoiceHeader/Party[@stdValue='SU']/Name/Name1");
    String language = BPSXMLUtils.getXpathString(doc, "/Invoice/InvoiceHeader/InvoiceLanguage/@stdValue");
3 answers

Using a SAX parser alone does not build an in-memory representation of your XML tree (which is why SAX is more memory efficient). It only fires "events" as it reads the file, for example when an element starts or ends or when character data is encountered. You need to keep the context yourself (often a stack of parent elements) to "know" where you are in the tree.
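
A minimal sketch of that context tracking, using only the JDK's built-in SAX API. The class name `SaxPathTracker` and its `collectPaths` helper are hypothetical; the handler keeps a stack of open element names and records the XPath-like location of every start tag as the parser streams past it:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical handler: tracks open elements so each event knows its position in the tree. */
class SaxPathTracker extends DefaultHandler {
    private final Deque<String> stack = new ArrayDeque<>(); // open elements, root first
    private final List<String> visited = new ArrayList<>();  // path of every start tag, in document order

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        stack.addLast(qName);
        visited.add(currentPath());
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        stack.removeLast(); // leaving this element: pop it off the context
    }

    /** Builds an XPath-like location such as "/Invoice/InvoiceHeader". */
    String currentPath() {
        return "/" + String.join("/", stack);
    }

    static List<String> collectPaths(String xml) {
        SaxPathTracker handler = new SaxPathTracker();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler.visited;
    }
}
```

The stack is the only "tree" the handler ever has: once an element is closed, nothing about it remains unless you copy it out yourself.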

Since there is no tree in memory, you cannot use XPath. You can only check the current "context" (your manually managed stack) to query your document. Remember that the SAX parser makes only a single pass over your file, so the order of elements in the file matters.
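
To make the "check the context" idea concrete, here is a sketch of how the two lookups from the question could be emulated with SAX alone. The class `InvoiceSaxExtractor` is hypothetical and assumes a namespace-free document; unlike `BPSXMLUtils.getXpathString` (whose behavior is not shown in the question), it returns `null` rather than an empty string when nothing matches. Note how the single pass shapes the code: the `[@stdValue='SU']` predicate has to be decided in `startElement`, because that is the only moment the `Party` attributes are visible.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical SAX handler emulating the two XPath lookups from the question. */
class InvoiceSaxExtractor extends DefaultHandler {
    private static final String PARTY = "/Invoice/InvoiceHeader/Party";
    private static final String NAME1 = PARTY + "/Name/Name1";
    private static final String LANGUAGE = "/Invoice/InvoiceHeader/InvoiceLanguage";

    private final Deque<String> path = new ArrayDeque<>();
    private boolean inSupplierParty; // true while inside Party[@stdValue='SU']
    private StringBuilder text;      // non-null only while collecting Name1 text

    String supplierName; // first match, or null
    String language;     // first match, or null

    private String currentPath() {
        return "/" + String.join("/", path);
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        path.addLast(qName);
        String p = currentPath();
        if (p.equals(PARTY)) {
            // Attributes are only available in this event, so evaluate the predicate now.
            inSupplierParty = "SU".equals(atts.getValue("stdValue"));
        } else if (language == null && p.equals(LANGUAGE)) {
            language = atts.getValue("stdValue");
        } else if (inSupplierParty && supplierName == null && p.equals(NAME1)) {
            text = new StringBuilder();
        }
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        // One text node may arrive in several chunks, so append rather than assign.
        if (text != null) {
            text.append(ch, start, length);
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        String p = currentPath();
        if (text != null && p.equals(NAME1)) {
            supplierName = text.toString();
            text = null;
        } else if (p.equals(PARTY)) {
            inSupplierParty = false;
        }
        path.removeLast();
    }

    static InvoiceSaxExtractor extract(String xml) {
        InvoiceSaxExtractor handler = new InvoiceSaxExtractor();
        try {
            SAXParserFactory.newInstance().newSAXParser()
                    .parse(new InputSource(new StringReader(xml)), handler);
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return handler;
    }
}
```

Each XPath expression turns into a path comparison plus some state, which is why a port with "lots of XPath expressions" grows quickly.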

Fortunately, there are other approaches, for example VTD-XML, a library that builds an in-memory representation of only the XML structure; it does not extract the actual content from the file, and content is extracted only as needed. This is much more memory efficient than a DOM parser, and it still supports XPath. I personally use this library at work to parse ~700 MB XML files with XPath (yes, it's insane, but it works, and it's very fast).

IMHO the easiest way to process XML is StAX, the Streaming API for XML. It combines benefits of DOM and SAX (and offers you an easier migration). As with SAX, you only have a pointer to the current XML element, but here your own code moves the cursor forward instead of the parser pushing events at you. The big advantage is that the XML processing code is much more readable. It also solves the memory problem, since only the current XML element needs to be held in memory. Here is also a good tutorial.
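
For comparison, a sketch of the same two lookups with the JDK's StAX cursor API (`javax.xml.stream.XMLStreamReader`). The class `InvoiceStaxExtractor` is hypothetical, the document is assumed to be namespace-free, and `getElementText()` assumes `Name1` contains only text (it throws if it meets a child element):

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.Deque;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/** Hypothetical pull-parsing version of the two lookups from the question. */
class InvoiceStaxExtractor {
    static final class Result {
        final String supplierName; // null if no matching Name1 was found
        final String language;     // null if no InvoiceLanguage/@stdValue was found

        Result(String supplierName, String language) {
            this.supplierName = supplierName;
            this.language = language;
        }
    }

    static Result extract(String xml) {
        String supplierName = null;
        String language = null;
        boolean inSupplierParty = false;
        Deque<String> path = new ArrayDeque<>();
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            try {
                while (r.hasNext()) {
                    int event = r.next(); // our code decides when to advance the cursor
                    if (event == XMLStreamConstants.START_ELEMENT) {
                        path.addLast(r.getLocalName());
                        String p = "/" + String.join("/", path);
                        if (p.equals("/Invoice/InvoiceHeader/Party")) {
                            inSupplierParty = "SU".equals(r.getAttributeValue(null, "stdValue"));
                        } else if (language == null
                                && p.equals("/Invoice/InvoiceHeader/InvoiceLanguage")) {
                            language = r.getAttributeValue(null, "stdValue");
                        } else if (inSupplierParty && supplierName == null
                                && p.equals("/Invoice/InvoiceHeader/Party/Name/Name1")) {
                            // Reads the text and leaves the cursor on </Name1>, so the loop
                            // never sees that END_ELEMENT: pop it from the path here.
                            supplierName = r.getElementText();
                            path.removeLast();
                        }
                    } else if (event == XMLStreamConstants.END_ELEMENT) {
                        path.removeLast();
                    }
                }
            } finally {
                r.close();
            }
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
        return new Result(supplierName, language);
    }
}
```

All the control flow lives in one loop that the caller drives, rather than being spread across handler callbacks, which is where the readability gain comes from.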

To answer your original question: a quick Google search showed me that there is no simple, generally accepted way to do this, which probably means that any custom solutions are unreliable, unsupported and untested.

Switching to SAX (or StAX) parsing will require a complete change in your approach. It seems that you have not fully appreciated how much work this will be. To give any advice that makes sense, we need to know how big the file is and what processing you want to do with the data. For example, if you are filtering data, then an XQuery implementation that uses document projection might be a good answer (it will automatically use SAX behind the scenes to build a tree containing only the subset of the data you are really interested in).
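
XQuery document projection itself needs an XQuery processor, which the JDK does not include. As a rough, hand-rolled stand-in for the same idea (not an XQuery implementation), the hypothetical `ProjectingSaxBuilder` below streams the file with SAX, copies only the root element and one chosen child subtree (here `InvoiceHeader`) into a small DOM, and then evaluates the question's unchanged XPath expressions against that reduced tree with `javax.xml.xpath`. It assumes a namespace-free document and that everything the XPaths need lives under that one child:

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical projection: SAX reads everything, but only the root and one child subtree reach the DOM. */
class ProjectingSaxBuilder extends DefaultHandler {
    private final Document doc;
    private final String keptChild; // name of the root's child whose subtree is kept
    private Node current;           // DOM node receiving the next kept element or text
    private int depth;              // 1 = root element
    private int skipDepth;          // > 0 while inside a discarded subtree

    ProjectingSaxBuilder(Document doc, String keptChild) {
        this.doc = doc;
        this.keptChild = keptChild;
        this.current = doc;
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        depth++;
        if (skipDepth > 0 || (depth == 2 && !qName.equals(keptChild))) {
            skipDepth++; // discard this element and everything below it
            return;
        }
        // Assumes a namespace-free document, as the question's XPaths suggest.
        Element e = doc.createElementNS(null, qName);
        for (int i = 0; i < atts.getLength(); i++) {
            e.setAttributeNS(null, atts.getQName(i), atts.getValue(i));
        }
        current.appendChild(e);
        current = e;
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (skipDepth == 0 && depth >= 2) { // drop text sitting directly under the root
            current.appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        depth--;
        if (skipDepth > 0) {
            skipDepth--;
        } else {
            current = current.getParentNode();
        }
    }

    static Document project(String xml, String keptChild) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
            SAXParserFactory.newInstance().newSAXParser().parse(
                    new InputSource(new StringReader(xml)), new ProjectingSaxBuilder(doc, keptChild));
            return doc;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Could not parse XML", e);
        }
    }

    /** Stand-in for the question's getXpathString: evaluates an expression to a string. */
    static String xpathString(Document doc, String expression) {
        try {
            return XPathFactory.newInstance().newXPath().evaluate(expression, doc);
        } catch (XPathExpressionException e) {
            throw new IllegalArgumentException("Bad XPath: " + expression, e);
        }
    }
}
```

Memory then scales with the size of the kept subtree rather than the whole file, although the parser still has to read every byte of it.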

Source: https://habr.com/ru/post/988690/
