
Why does XPath remove HTML special characters?

Why does this code

    $html = '<a href="/browse/product.do?cid=1&amp;vid=1&amp;pid=1" class="productItemName">what is going on here</a>';

    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);

    $selectors['link'] = '//a/@href';
    $links_nodeList = $xpath->query($selectors['link']);

    // Initialise the result array before appending to it.
    $links = [];
    foreach ($links_nodeList as $link) {
        $links[] = $link->nodeValue;
    }

    echo("<p>links</p>");
    echo("<pre>");
    print_r($links);
    echo("</pre>");

output

    links
    Array ( [0] => /browse/product.do?cid=1&vid=1&pid=1 )

rather than

    links
    Array ( [0] => /browse/product.do?cid=1&amp;vid=1&amp;pid=1 )

?

1 answer

The answer is simple:

&amp; is a special way to represent the "&" character in an XML document.

The escaped form and the literal form denote the same character.

When the escaped form of an ampersand is output as text (rather than as XML markup), it appears as "&".

As @LarsH described in detail in his comment:

When you call loadHTML($html), you parse the string as HTML, which means that character entities (like &amp;) are interpreted into the characters they represent (like &). If you need a string that will be interpreted as &amp;, you need to escape the ampersand, as &amp;amp;.
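To make the comment above concrete, here is a minimal sketch, assuming PHP with the DOM extension enabled. The helper name firstHref and the second sample URL are hypothetical and exist only for illustration. It shows that nodeValue returns the already decoded attribute value, that htmlspecialchars() re-escapes it when the value has to go back into markup, and that a doubly escaped ampersand survives parsing as a literal &amp;.

```php
<?php
// Hypothetical helper: parse an HTML fragment and return the first href value.
function firstHref(string $html): string
{
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    $xpath = new DOMXPath($dom);
    return $xpath->query('//a/@href')->item(0)->nodeValue;
}

// Same href as in the question: the parser decodes &amp; into &.
$href = firstHref('<a href="/browse/product.do?cid=1&amp;vid=1&amp;pid=1">link</a>');
echo $href, "\n";

// Re-escape only when the value is written back into HTML markup.
$escaped = htmlspecialchars($href, ENT_QUOTES, 'UTF-8');
echo $escaped, "\n";

// A doubly escaped ampersand is decoded once, leaving a literal "&amp;".
$literal = firstHref('<a href="/x?a=1&amp;amp;b=2">link</a>');
echo $literal, "\n";
```

The decoded value is the one to use for requests or comparisons; the escaped value is only needed when the URL is embedded in HTML or XML output again.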


Source: https://habr.com/ru/post/1391484/

