Reading Malformed XML (Unencoded XML Entities) with PHP

I am having trouble parsing invalid XML in PHP. Specifically, I am calling a third-party web service that returns XML without encoding XML entities in the actual data. For example, one of the elements contains an ASCII heart, "<3" (without the quotes), which the XML parser treats as the start of a tag. It should be "&lt;3".

Right now, I'm just passing the XML string to SimpleXMLElement, which, as expected, fails in these cases. I looked around a bit and it seems that the PHP Tidy extension could help me, but the number of configuration options is huge :(

So I'm wondering whether anyone else has run into this problem and, if so, how they solved it.

Thanks!

+3
2 answers

Try tidy::repairString with the "input-xml" option:

php > $tidy = new tidy();
php > $repaired = $tidy->repairString("<foo>I <3 Philadelphia</foo>", array("input-xml"=>1));
php > print($repaired);
<foo>I &lt;3 Philadelphia</foo>
php > $el = new SimpleXMLElement($repaired);
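
The same idea as a script, as a minimal sketch: parse strictly first and fall back to Tidy only when that fails. loadLenientXml() is a hypothetical helper name, and the sketch assumes the tidy extension may or may not be installed, so it returns null when the input cannot be repaired.

```php
<?php
declare(strict_types=1);

/**
 * Parse XML strictly; if that fails and Tidy is available, repair and retry.
 * Hypothetical helper for illustration, not a library function.
 */
function loadLenientXml(string $xml): ?SimpleXMLElement
{
    // Collect libxml errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    try {
        $el = simplexml_load_string($xml);
        if ($el === false && extension_loaded('tidy')) {
            libxml_clear_errors();
            // "input-xml" makes Tidy treat the input as XML rather than HTML.
            $repaired = tidy_repair_string(
                $xml,
                ['input-xml' => true, 'output-xml' => true],
                'utf8'
            );
            $el = is_string($repaired) ? simplexml_load_string($repaired) : false;
        }
        return $el === false ? null : $el;
    } finally {
        // Restore the caller's libxml error handling mode.
        libxml_clear_errors();
        libxml_use_internal_errors($previous);
    }
}

$el = loadLenientXml('<foo>I <3 Philadelphia</foo>');
echo $el === null ? "could not parse\n" : (string) $el . "\n";
```

Trying the strict parse first means well-formed responses never pass through Tidy, so it cannot alter them.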
+5
  • Read the contents into a string.
  • Strip control characters and escape special characters: htmlspecialchars(preg_replace('/[\x00-\x08\x0b-\x0c\x0e-\x1f]/','',$string))
  • Load the converted string into SimpleXMLElement.

This has worked for me so far.
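
One caveat with the steps above: htmlspecialchars applied to a whole document also escapes the real tags. A variant that strips control characters but escapes only a "<" or "&" that cannot start a tag or entity might look like the sketch below. escapeStrayMarkup() is a hypothetical helper, and the two patterns are heuristics rather than a full XML grammar.

```php
<?php
declare(strict_types=1);

// Hypothetical helper for illustration; not part of PHP.
function escapeStrayMarkup(string $xml): string
{
    // Drop control characters that XML 1.0 forbids (tab, LF and CR are kept).
    $xml = preg_replace('/[\x00-\x08\x0B\x0C\x0E-\x1F]/', '', $xml) ?? $xml;

    // Escape "&" unless it already starts an entity or character reference.
    $xml = preg_replace(
        '/&(?!(?:[A-Za-z_][\w.-]*|#\d+|#x[0-9A-Fa-f]+);)/',
        '&amp;',
        $xml
    ) ?? $xml;

    // Escape "<" unless it looks like the start of a tag, a closing tag,
    // a comment/CDATA/DOCTYPE section, or a processing instruction.
    return preg_replace('/<(?![A-Za-z_\/!?])/', '&lt;', $xml) ?? $xml;
}

$fixed = escapeStrayMarkup('<foo>I <3 Philadelphia & more</foo>');
$el = simplexml_load_string($fixed);
echo $fixed, "\n";
```

The ampersand pass runs before the "<" pass so that the "&lt;" it inserts is not escaped a second time. Text such as "a<b" would still be mistaken for a tag, so this only covers cases like "<3" where the stray "<" is followed by a character that cannot begin a tag name.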

-1

Source: https://habr.com/ru/post/1711295/

