Quick version:
What is the standard (innovative? any?) way of catching and handling errors generated by XMLReader due to a malformed file, in particular unescaped characters? Preprocessing with Tidy (etc.) is not a super attractive option. Does anyone know a way to simply skip an offending node and move right along?
Descriptive Version:
We all know that it isn't XML if it isn't well formed, but let's be honest: it happens. A client regularly provides massive (50-100 MB+) XML files that need to be read into MySQL. XMLReader is the obvious choice, and we have written a wrapper that suits our needs well.
Every now and then an error is hit and read() fails, killing the import. Drat! It's almost always an unescaped character (such as "&") that gums everything up. In most cases we just have to call the client's data supplier and demand that they fix their malformed file. Unfortunately, data suppliers are not always obliging and/or timely. It would be great if we could simply catch the error and skip straight to the next node.
I've spent quite a lot of time trying to read/hack my way through this one and can't find anything noteworthy. Am I missing something?
This SO question was suggestive, but it simply didn't yield any results. Passing 1 seems like it should ask the reader to recover, but we just don't see any recovery attempts, different error messages, etc. Here is the relevant code outlining that approach:
$xml->open($file, null, LIBXML_NOERROR | LIBXML_NOWARNING | 1);
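$xml = new XMLReader();
$xml->open($file);
while ($xml->read()) {
    // Process the current node here. read() returns false at the end of
    // the input and also when a parse error stops the reader.
}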
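For what it's worth, the failure described above can at least be captured and inspected rather than left to surface as a bare failed read(). The following is only a minimal sketch, assuming libxml's internal error buffer (libxml_use_internal_errors / libxml_get_errors) is acceptable here. The function name importWithErrorCapture and the returned array shape are hypothetical, not part of our wrapper. It does not skip the offending node; it only records where the parser gave up.

```php
<?php
// Hypothetical helper (not from the original wrapper): walks a file with
// XMLReader and returns whatever libxml errors were recorded along the way.
function importWithErrorCapture(string $file): array
{
    // Ask libxml to buffer its errors so they can be read back afterwards.
    $previous = libxml_use_internal_errors(true);
    libxml_clear_errors();

    $elements = 0;
    $xml = new XMLReader();
    if ($xml->open($file)) {
        // read() returns false at end of input or when parsing fails.
        while ($xml->read()) {
            if ($xml->nodeType === XMLReader::ELEMENT) {
                $elements++; // real code would hand the node to the importer
            }
        }
        $xml->close();
    }

    // Each entry is a LibXMLError with message, line, column, code, level.
    $errors = libxml_get_errors();
    libxml_clear_errors();
    libxml_use_internal_errors($previous); // restore the caller's setting

    return ['elements' => $elements, 'errors' => $errors];
}
```

After a run, each entry in $result['errors'] exposes ->line, ->column and ->message, which is enough to tell the data supplier exactly where their file breaks, or to feed a targeted fix (for example escaping a stray "&" on that line) before retrying.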