Best way to handle large XML in PHP

I need to parse large XML files in PHP; one of them is 6.5 MB, and they can be even larger. From what I've read, the SimpleXML extension loads the entire file into an object, which may not be very efficient. In your experience, what would be the best way?

+24
xml php parsing large-files simplexml
Jul 22 '09 at 17:56
7 answers

For a large file, you want to use a SAX parser, not a DOM parser.

A DOM parser reads the entire file and loads it into a tree of objects in memory. A SAX parser reads the file sequentially and calls user-defined callback functions to handle the data (start tags, end tags, CDATA, etc.).

With a SAX parser you need to maintain your own state (for example, which tag you are currently inside), which makes it a little more complicated, but for a large file it will be much more memory efficient.
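
For reference, a minimal sketch of what SAX-style parsing looks like with PHP's built-in xml extension (illustrative only, not the answerer's code; the file name and chunk size are arbitrary):

    <?php
    $depth = 0;

    function startElement($parser, $name, $attrs) {
        global $depth;
        // Called for every opening tag, in document order.
        echo str_repeat('  ', $depth) . "<$name>\n";
        $depth++;
    }

    function endElement($parser, $name) {
        global $depth;
        $depth--;
    }

    $parser = xml_parser_create();
    xml_set_element_handler($parser, 'startElement', 'endElement');

    // Feed the file to the parser in small chunks so memory use stays flat.
    $fp = fopen('myLargeXmlFile.xml', 'r');
    while (!feof($fp)) {
        // The third argument tells the parser whether this is the last chunk.
        if (!xml_parse($parser, fread($fp, 8192), feof($fp))) {
            die(sprintf("XML error: %s at line %d",
                xml_error_string(xml_get_error_code($parser)),
                xml_get_current_line_number($parser)));
        }
    }
    fclose($fp);
    xml_parser_free($parser);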

+21
Jul 22 '09 at 17:58

My contribution:

https://github.com/prewk/XmlStreamer

A simple class that extracts all children of the root XML element while streaming the file. Tested on a 108 MB XML file from pubmed.com.

    class SimpleXmlStreamer extends XmlStreamer {
        public function processNode($xmlString, $elementName, $nodeIndex) {
            $xml = simplexml_load_string($xmlString);

            // Do something with your SimpleXML object

            return true;
        }
    }

    $streamer = new SimpleXmlStreamer("myLargeXmlFile.xml");
    $streamer->parse();
+11
Nov 23 '11

A SAX parser, as Eric Petroelje recommends, would be better for large XML files. A DOM parser loads the entire XML file and allows you to run XPath queries; a SAX (Simple API for XML) parser reads the document sequentially and gives you callback hooks for processing.

+6
Jul 22 '09 at 18:14

When using DOMDocument with large XML files, remember to pass the LIBXML_PARSEHUGE flag as the options argument to load(). (The same applies to the other load methods of DOMDocument.)

    $checkDom = new \DOMDocument('1.0', 'UTF-8');
    $checkDom->load($filePath, LIBXML_PARSEHUGE);

(Works with a 120 MB XML file.)

+6
Jan 23 '14 at 17:24

It really depends on what you want to do with the data. Do you need it all in memory in order to work with it effectively?

6.5 MB is not that much by today's standards. You could, for example, raise the limit with ini_set('memory_limit', '128M');

However, if your data can be processed as a stream, you may want to use a SAX parser. It really depends on your usage needs.

+3
Jul 22 '09 at 18:00

A SAX parser is the way to go, but I have found that SAX parsing can get messy if you don't stay organized.

I use an STX (Streaming Transformations for XML) approach to parse large XML files: SAX methods build a SimpleXML object that tracks the data in the current context (i.e. only the nodes between the root and the current node), and other functions then process that SimpleXML document.
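
A minimal sketch of that state-tracking idea (not the answerer's actual code; the element names and file name are hypothetical), keeping the chain of open elements from the root down to the current node:

    <?php
    $path = array();

    $parser = xml_parser_create();

    xml_set_element_handler(
        $parser,
        function ($parser, $name, $attrs) use (&$path) {
            $path[] = $name; // descend into the new element
        },
        function ($parser, $name) use (&$path) {
            array_pop($path); // climb back out when the element closes
        }
    );

    // Dispatch on the current location; by default the parser upper-cases
    // element names, hence 'ROOT', 'RECORD', 'TITLE' (hypothetical names).
    xml_set_character_data_handler($parser, function ($parser, $data) use (&$path) {
        if ($path === array('ROOT', 'RECORD', 'TITLE')) {
            echo trim($data), "\n";
        }
    });

    $fp = fopen('myLargeXmlFile.xml', 'r');
    while (!feof($fp)) {
        xml_parse($parser, fread($fp, 8192), feof($fp));
    }
    fclose($fp);
    xml_parser_free($parser);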

+1
Jul 22 '09 at 18:26

I needed to parse a large XML file that had an element on each line (the Stack Overflow data dump). In this specific case it was enough to read the file one line at a time and parse each line using SimpleXML. For me, this had the advantage that I did not need to learn anything new.
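
A minimal sketch of that approach, assuming (as with the Stack Overflow data dump) that every record is a complete, self-contained element on its own line; the file name and the Id attribute are illustrative:

    <?php
    $fp = fopen('posts.xml', 'r');
    while (($line = fgets($fp)) !== false) {
        $line = trim($line);
        // Skip the document's opening/closing wrapper tags.
        if (strpos($line, '<row') !== 0) {
            continue;
        }
        // Each line is a tiny, self-contained document for SimpleXML.
        $row = simplexml_load_string($line);
        if ($row !== false) {
            echo $row['Id'], "\n";
        }
    }
    fclose($fp);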

+1
Mar 10 '10 at 9:41


