Load DOMDocument using HTML special characters (php)

I have a problem loading an XML file with PHP. I use DOMDocument because I need the getElementsByTagName function.
I am using this code:


$dom = new DomDocument('1.0', 'UTF-8');
$dom->resolveExternals = false;
$dom->load($_FILES["file"]["tmp_name"]);

The uploaded XML file:

<?xml version="1.0" encoding="UTF-8"?>
<Data>
  <value>1796563</value>
  <value>Verliebt! &rsquo;</value>
</Data>

Error message:
Warning: DOMDocument::load() [domdocument.load]: Entity 'rsquo' is not defined in /tmp/php1VRb3N, line: 4 in /www/htdocs/bla/upload.php on line 51

+3
3 answers

Your XML parser is not lying. This document is invalid (it is not even well-formed), and no XML parser will load it.

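rsquo is an HTML entity, not an XML one. XML has only the five built-in entities (amp, lt, gt, quot and apos) unless you define more in a DTD referenced by a <!DOCTYPE>. (XHTML does this, for example.)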
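To illustrate the DTD route, here is a minimal sketch, assuming a hypothetical variant of the sample file that declares rsquo in an internal DTD subset. The sample XML and variable names are made up for illustration.

```php
<?php
// Hypothetical variant of the sample file: the internal DTD subset
// declares rsquo, so the parser knows what &rsquo; stands for.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE Data [
  <!ENTITY rsquo "&#8217;">
]>
<Data>
  <value>1796563</value>
  <value>Verliebt! &rsquo;</value>
</Data>
XML;

$dom = new DOMDocument();
// LIBXML_NOENT substitutes entity references with their declared text.
// Use it only on input you trust; on older libxml builds it can also
// pull in external entities.
$loaded = $dom->loadXML($xml, LIBXML_NOENT);
$text = $dom->getElementsByTagName('value')->item(1)->textContent;
echo $text, "\n";
```

This only helps when you control the file's contents; an uploaded file without the declaration still fails to parse.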

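If at all possible, fix whatever produces the document so that it outputs real XML: either a numeric character reference (&#8217;) or simply the literal ’ character encoded as UTF-8.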
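As a quick check of that claim, the following sketch loads one document that uses the numeric character reference and one that uses the literal character, then compares the resulting text. The two sample strings are invented for illustration, and the `\u{2019}` escape assumes PHP 7 or later.

```php
<?php
// Two hypothetical well-formed versions of the same value.
$withReference = '<Data><value>Verliebt! &#8217;</value></Data>';
$withLiteral   = "<Data><value>Verliebt! \u{2019}</value></Data>";

$texts = [];
foreach ([$withReference, $withLiteral] as $xml) {
    $dom = new DOMDocument();
    $dom->loadXML($xml);
    // Collect the text of the single <value> element.
    $texts[] = $dom->getElementsByTagName('value')->item(0)->textContent;
}

// Both forms should produce the same string in the DOM.
var_dump($texts[0] === $texts[1]);
```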

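If you cannot, and you really have to load the malformed XML, you could hack around it with a replacement: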

$xml= file_get_contents($_FILES['file']['tmp_name']);
$xml= str_replace('&rsquo;', '&#8217;', $xml);
$dom->loadXML($xml);

If you need to load not-quite-XML that may contain any HTML entity, not just rsquo, you could do something like this:

function only_html_entity_decode($match) {
    if (in_array($match[1], array('amp', 'lt', 'gt', 'quot', 'apos')))
        return $match[0];
    else
        return html_entity_decode($match[0], ENT_COMPAT, 'UTF-8');
}
$xml= preg_replace_callback('/&(\w+);/', 'only_html_entity_decode', $xml);
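The effect of the callback can be seen in a small self-contained sketch. It uses a closure instead of the named function, and the sample string is made up for illustration.

```php
<?php
// Same idea as only_html_entity_decode, written as a closure:
// leave the five XML built-ins alone, decode every other named entity.
$keepXmlEntities = function (array $match): string {
    $xmlBuiltIns = ['amp', 'lt', 'gt', 'quot', 'apos'];
    return in_array($match[1], $xmlBuiltIns, true)
        ? $match[0]
        : html_entity_decode($match[0], ENT_COMPAT, 'UTF-8');
};

// Hypothetical input mixing an XML entity and an HTML-only entity.
$raw = '<Data><value>Tom &amp; Jerry&rsquo;s</value></Data>';
$fixed = preg_replace_callback('/&(\w+);/', $keepXmlEntities, $raw);

$dom = new DOMDocument();
$dom->loadXML($fixed);
$text = $dom->getElementsByTagName('value')->item(0)->textContent;
echo $text, "\n";
```

Names that html_entity_decode does not recognise are returned unchanged, so a document containing them would still fail to parse.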

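This is still a bit dodgy, because it replaces every &\w+; sequence, including ones in places where it is not an entity reference at all, such as CDATA sections and processing instructions. For a simple document like this one, that is probably acceptable.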

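Note that, unlike the other answer, this does not run html_entity_decode over the whole document. Doing so would also decode the entities that XML itself needs, such as &amp; and &lt;, and break the markup.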
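A short sketch of that failure mode, using an invented sample string: decoding everything turns &lt; and &amp; into bare characters, and the parser then rejects the result.

```php
<?php
// Collect parse errors internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical input that relies on XML's own entities as well as rsquo.
$raw = '<Data><value>1 &lt; 2 &amp; 3 &rsquo;</value></Data>';

// Decoding the whole document also decodes &lt; and &amp;,
// leaving a bare "<" and "&" inside the element text.
$decoded = html_entity_decode($raw, ENT_COMPAT, 'UTF-8');

$dom = new DOMDocument();
$ok = $dom->loadXML($decoded);
var_dump($ok);
```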

Of course, if the file is actually HTML rather than XML, use loadHTML().
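For the HTML case, a minimal sketch with a made-up fragment: the HTML parser knows the HTML entity set, so &rsquo; needs no declaration.

```php
<?php
// Collect parser notices internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical HTML fragment using the same entity as the question.
$html = '<p>Verliebt! &rsquo;</p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$text = $dom->getElementsByTagName('p')->item(0)->textContent;
echo $text, "\n";
```

The parser treats the input as HTML, wrapping fragments in html and body elements, so this only suits files that really are HTML.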

+1

An entity has to be defined in a DTD. Your XML has no DTD, so decode the entities before loading the XML into the DOM:

// loadXML() parses a string; load() would treat the argument as a file name.
$dom->loadXML(
    html_entity_decode(
        file_get_contents($_FILES["file"]["tmp_name"]),
        ENT_COMPAT, 'UTF-8'));
+2

A variation on bobince's answer that simply removes every named entity reference:

    $xml= file_get_contents($_FILES['file']['tmp_name']);
    $xml= preg_replace('/&(\w+);/', '', $xml);
    $dom = new DomDocument();
    $dom->loadXML($xml);
0

Source: https://habr.com/ru/post/1768705/

