Performance Difference Between StAX and DOM Parsing

I have been using DOM for a long time, and DOM parsing performance was pretty good. Even with XML documents of about 4-7 MB, parsing was quick. The problem we face with DOM is that memory consumption becomes huge as soon as we start working with large XML documents.

Recently, I tried moving to StAX (the streaming pull parser for XML), which is supposed to be a second-generation parser (everything I read about StAX calls it the fastest parser). When I tried the StAX parser on the same large XML of about 4 MB, memory consumption did drop drastically, but the time to parse the whole XML and create Java objects from it increased almost 5 times compared to DOM.

I used the sjsxp.jar implementation of StAX.

To some extent I can accept that performance may suffer a little due to the streaming nature of the parser, but a 5x slowdown (for example, DOM takes about 8 seconds to build the objects for this XML, while StAX parsing took about 40 seconds on average) is definitely not acceptable.

Am I missing some point here completely? I cannot come to terms with these performance numbers.

4 answers
```java
package parsers;

import java.io.File;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

/**
 * @author Arthur Kushman
 */
public class DOMTest {

    public static void main(String[] args) {
        long time1 = System.currentTimeMillis();
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            DocumentBuilder db = dbf.newDocumentBuilder();
            Document doc = db.parse(new File("/Users/macpro/Desktop/myxml.xml"));
            doc.getDocumentElement().normalize();

            NodeList nodeList = doc.getElementsByTagName("input");
            for (int s = 0; s < nodeList.getLength(); s++) {
                Node firstNode = nodeList.item(s);
                if (firstNode.getNodeType() == Node.ELEMENT_NODE) {
                    Element firstElement = (Element) firstNode;
                    NodeList firstNameElementList = firstElement.getElementsByTagName("href");
                    Element firstNameElement = (Element) firstNameElementList.item(0);
                    NodeList firstName = firstNameElement.getChildNodes();
                    // item(0), not item(s): we want the text node of this <href>
                    System.out.println("First Name: " + firstName.item(0).getNodeValue());
                }
            }
        } catch (Exception ex) {
            System.out.println(ex.getMessage());
            System.exit(1);
        }
        long time2 = System.currentTimeMillis() - time1;
        System.out.println(time2);
    }
}
```

45 ms

```java
package parsers;

import java.io.File;
import java.io.FileInputStream;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

/**
 * @author Arthur Kushman
 */
public class StAXTest {

    public static void main(String[] args) throws Exception {
        long time1 = System.currentTimeMillis();
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(
                new FileInputStream(new File("/Users/macpro/Desktop/myxml.xml")));
        try {
            int inElement = 0;
            for (int event = reader.next();
                 event != XMLStreamConstants.END_DOCUMENT;
                 event = reader.next()) {
                switch (event) {
                    case XMLStreamConstants.START_ELEMENT:
                        if (isElement(reader.getLocalName(), "href")) {
                            inElement++;
                        }
                        break;
                    case XMLStreamConstants.END_ELEMENT:
                        if (isElement(reader.getLocalName(), "href")) {
                            inElement--;
                            if (inElement == 0) System.out.println();
                        }
                        break;
                    case XMLStreamConstants.CHARACTERS:
                    case XMLStreamConstants.CDATA:
                        // print text only while inside an <href> element
                        if (inElement > 0) System.out.println(reader.getText());
                        break;
                }
            }
            reader.close();
        } catch (XMLStreamException ex) {
            System.out.println(ex.getMessage());
            System.exit(1);
        }
        long time2 = System.currentTimeMillis() - time1;
        System.out.println(time2);
    }

    public static boolean isElement(String name, String element) {
        return name.equals(element);
    }
}
```

23 ms

StAX wins =)


Although some details are missing from this question, I'm pretty sure the answer is that it is not the parsing itself that is slow (DOM is not a parser anyway; DOM trees are usually built on top of SAX or StAX parsers), but rather the way the code above it constructs its objects.

There are efficient automatic data-binding libraries, including JAXB (and, with the appropriate settings, XStream), that can help. They are faster than DOM, because the main problem with DOM (and JDOM, dom4j and XOM alike) is that tree models are expensive compared to POJOs: they are essentially glorified HashMaps with many pointers for convenient untyped traversal, which hurts especially in memory usage.
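To illustrate the data-binding approach this answer recommends, here is a minimal JAXB sketch. The `Inputs`/`Input` POJOs and the `<inputs>` wrapper element are hypothetical, modeled loosely on the `input`/`href` element names from the question's code; it assumes Java 8 or the `jaxb-api` dependency on the classpath (the `javax.xml.bind` package was removed from the JDK in Java 11).

```java
import java.io.StringReader;
import java.util.List;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// Hypothetical POJOs matching <inputs><input><href>...</href></input></inputs>
@XmlRootElement(name = "inputs")
class Inputs {
    @XmlElement(name = "input")
    public List<Input> input;
}

class Input {
    @XmlElement(name = "href")
    public String href;
}

public class JaxbExample {

    // Unmarshal an XML string straight into plain objects: no tree model,
    // no hand-written event loop.
    static Inputs parse(String xml) throws Exception {
        JAXBContext ctx = JAXBContext.newInstance(Inputs.class);
        Unmarshaller um = ctx.createUnmarshaller();
        return (Inputs) um.unmarshal(new StringReader(xml));
    }

    public static void main(String[] args) throws Exception {
        Inputs parsed = parse("<inputs><input><href>a.html</href></input>"
                            + "<input><href>b.html</href></input></inputs>");
        for (Input in : parsed.input) {
            System.out.println(in.href);
        }
    }
}
```

Because the result is ordinary typed fields rather than generic nodes, both traversal code and per-element memory overhead shrink compared to any tree model.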

Regarding parsers, Woodstox is a faster StAX parser than Sjsxp; and Aalto is faster still, if raw speed is of the essence. But I doubt the main problem here is parser speed.
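Swapping in a different StAX implementation such as Woodstox needs no code changes, since `XMLInputFactory.newInstance()` picks whichever provider is found on the classpath. A small sketch (printing the provider class is just for inspection; the commented-out system property shows one way to force Woodstox, assuming its jar is present):

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

public class FactorySelection {
    public static void main(String[] args) throws Exception {
        // Whichever StAX provider is first on the classpath wins
        // (sjsxp is the JDK's bundled default).
        XMLInputFactory factory = XMLInputFactory.newInstance();
        System.out.println("Provider: " + factory.getClass().getName());

        // To force a specific implementation, e.g. Woodstox
        // (requires the woodstox jar on the classpath):
        // System.setProperty("javax.xml.stream.XMLInputFactory",
        //         "com.ctc.wstx.stax.WstxInputFactory");

        // The parsing code itself is identical regardless of provider.
        XMLStreamReader reader = factory.createXMLStreamReader(
                new StringReader("<a><href>x</href></a>"));
        while (reader.hasNext()) {
            if (reader.next() == XMLStreamConstants.CHARACTERS) {
                System.out.println("Text: " + reader.getText());
            }
        }
        reader.close();
    }
}
```

This is why the question's benchmark can be rerun against Woodstox or Aalto without touching the StAXTest code at all.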


A classic case of the speed/memory trade-off, in my humble opinion. There is not much you can do but try SAX (or JDOM) and measure again.


Try creating XML of 2000 MB, and then compare the numbers. I think a DOM-based approach will work faster on smaller data. StAX (or any SAX-based approach) will win out as the data grows.

(We are dealing with 3 GB and larger files .. with DOM the application does not even start.)
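For anyone reproducing the memory side of these comparisons, heap usage can be approximated from inside the JVM; a rough sketch (the figures are only indicative, since `System.gc()` is a hint and the JVM may allocate for other reasons):

```java
public class MemCheck {
    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        System.gc(); // a hint only; readings are approximate
        long before = rt.totalMemory() - rt.freeMemory();

        // ... parse the document here (DOM or StAX) and keep a
        // reference to the result so it is not collected ...

        long after = rt.totalMemory() - rt.freeMemory();
        System.out.println("Approx. bytes retained: " + (after - before));
    }
}
```

Run with a fixed heap (e.g. `-Xms` equal to `-Xmx`) to make the before/after readings more comparable between the DOM and StAX runs.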


Source: https://habr.com/ru/post/1305184/
