Best way to handle large XML in PHP

I need to parse large XML files in PHP; one of them is 6.5 MB, and they can be even larger. From what I've read, the SimpleXML extension loads the entire file into an object, which may not be very efficient. In your experience, what would be the best way?

+24
xml php parsing large-files simplexml
Jul 22 '09 at 17:56
7 answers

For a large file, you want to use a SAX parser, not a DOM parser.

A DOM parser reads the entire file and loads it into a tree of objects in memory. A SAX parser reads the file sequentially and calls user-defined callback functions to handle the data (start tags, end tags, CDATA, etc.).

With a SAX parser you need to maintain your own state (for example, which tag you are currently inside), which makes it a little more complicated, but for a large file it will be much more memory efficient.
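
For reference, a minimal sketch of what SAX-style parsing looks like with PHP's built-in xml extension (illustrative only, not the answerer's code; the file name and chunk size are arbitrary):

    <?php
    $depth = 0;

    function startElement($parser, $name, $attrs) {
        global $depth;
        // Called for every opening tag, in document order.
        echo str_repeat('  ', $depth) . "<$name>\n";
        $depth++;
    }

    function endElement($parser, $name) {
        global $depth;
        $depth--;
    }

    $parser = xml_parser_create();
    xml_set_element_handler($parser, 'startElement', 'endElement');

    // Feed the file to the parser in small chunks so memory use stays flat.
    $fp = fopen('myLargeXmlFile.xml', 'r');
    while (!feof($fp)) {
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($parser, fread($fp, 8192), feof($fp))) {
            die(sprintf("XML error: %s at line %d",
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)));
        }
    }
    fclose($fp);
    xml_parser_free($parser);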

+21
Jul 22 '09 at 17:58

My contribution:

https://github.com/prewk/XmlStreamer

A simple class that extracts all children of the root XML element while streaming the file. Tested on a 108 MB XML file from pubmed.com.

    class SimpleXmlStreamer extends XmlStreamer {
        public function processNode($xmlString, $elementName, $nodeIndex) {
            $xml = simplexml_load_string($xmlString);

            // Do something with your SimpleXML object

            return true;
        }
    }

    $streamer = new SimpleXmlStreamer("myLargeXmlFile.xml");
    $streamer->parse();
+11
Nov 23 '11

A SAX parser, as Eric Petroelje recommends, would be better for large XML files. A DOM parser loads the entire XML file and allows you to run XPath queries; a SAX (Simple API for XML) parser reads the document sequentially and gives you callback hooks for processing.

+6
Jul 22 '09 at 18:14

When using DOMDocument with large XML files, remember to pass the LIBXML_PARSEHUGE flag as the options argument to load(). (The same applies to the other load methods of DOMDocument.)

    $checkDom = new \DOMDocument('1.0', 'UTF-8');
    $checkDom->load($filePath, LIBXML_PARSEHUGE);

(Works with a 120 MB XML file.)

+6
Jan 23 '14 at 17:24

It really depends on what you want to do with the data. Do you need it all in memory in order to work with it effectively?

6.5 MB is not that much by today's standards. You could, for example, raise the limit with ini_set('memory_limit', '128M');

However, if your data can be processed as a stream, you may want to use a SAX parser. It really depends on your usage needs.

+3
Jul 22 '09 at 18:00

A SAX parser is the way to go, but I have found that SAX parsing can get messy if you don't stay organized.

I use an STX (Streaming Transformations for XML) approach to parse large XML files: SAX methods build a SimpleXML object that tracks the data in the current context (i.e. only the nodes between the root and the current node), and other functions then process that SimpleXML document.
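
A minimal sketch of that state-tracking idea (not the answerer's actual code; the element names and file name are hypothetical), keeping the chain of open elements from the root down to the current node:

    <?php
    $path = array();

    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$path) {
            $path[] = $name; // descend into the new element
        },
        function ($parser, $name) use (&$path) {
            array_pop($path); // climb back out when the element closes
        }
    );

    // Dispatch on the current location; by default the parser upper-cases
    // element names, hence 'ROOT', 'RECORD', 'TITLE' (hypothetical names).
    xml_set_character_data_handler($parser, function ($parser, $data) use (&$path) {
        if ($path === array('ROOT', 'RECORD', 'TITLE')) {
            echo trim($data), "\n";
        }
    });

    $fp = fopen('myLargeXmlFile.xml', 'r');
    while (!feof($fp)) {
        xml_parse($parser, fread($fp, 8192), feof($fp));
    }
    fclose($fp);
    xml_parser_free($parser);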

+1
Jul 22 '09 at 18:26

I needed to parse a large XML file that had an element on each line (the Stack Overflow data dump). In this specific case it was enough to read the file one line at a time and parse each line using SimpleXML. For me, this had the advantage that I did not need to learn anything new.
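
A minimal sketch of that approach, assuming (as with the Stack Overflow data dump) that every record is a complete, self-contained element on its own line; the file name and the Id attribute are illustrative:

    <?php
    $fp = fopen('posts.xml', 'r');
    while (($line = fgets($fp)) !== false) {
        $line = trim($line);
        // Skip the document's opening/closing wrapper tags.
        if (strpos($line, '<row') !== 0) {
            continue;
        }
        // Each line is a tiny, self-contained document for SimpleXML.
        $row = simplexml_load_string($line);
        if ($row !== false) {
            echo $row['Id'], "\n";
        }
    }
    fclose($fp);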

+1
Mar 10 '10 at 9:41


