How to use loadHTMLFile() when it fails with the error "htmlParseEntityRef: no name"?

Question

I am trying to get the string "hinson lou ann" from:

<div class='owner-name'>hinson lou ann</div>

When I run the following:

    $html = "http://gisapps.co.union.nc.us/ws/rest/v2/cm_iw.ashx?gid=12339";
    $doc = new DOMDocument();
    $doc->loadHTMLFile($html);
    $xpath = new DOMXpath($doc);
    $elements = $xpath->query("*/div[@class='owner-name']");
    if (!is_null($elements)) {
        foreach ($elements as $element) {
            echo "<br/>[" . $element->nodeName . "]";
            $nodes = $element->childNodes;
            foreach ($nodes as $node) {
                echo $node->nodeValue . "\n";
            }
        }
    }

I get an error message:

Warning: DOMDocument::loadHTMLFile() [domdocument.loadhtmlfile]: htmlParseEntityRef: no name in http://gisapps.co.union.nc.us/ws/rest/v2/cm_iw.ashx?gid=12339, line: 1 in /home ... on line ...

The line it points to is the loadHTMLFile() call.

Note: the file is not a valid HTML document; it contains only div tags. Do I have to download the file first and then wrap it in HTML body tags?

+4

dom html php xpath

Josh cox Jun 27 '13 at 20:31

4 answers

Simply building a complete HTML document from the source, by wrapping it in the missing elements, does the trick.

For instance:

    <?php
    $html = file_get_contents('http://gisapps.co.union.nc.us/ws/rest/v2/cm_iw.ashx?gid=12339');
    $html = sprintf('<html><head><title></title></head><body>%s</body></html>', $html);

    $doc = new DOMDocument;
    $doc->loadHTML($html);

    $xpa = new DOMXPath($doc);
    $divs = $xpa->query('//div[@class="owner-name"]');

    foreach ($divs as $div) {
        echo $div->nodeValue, PHP_EOL;
    }

    /*
        hinson lou ann
    */

+3

Anthony sterling Jun 27 '13 at 20:49

You get the error because the HTML you are loading contains a bare & character, and a lone & is not a valid HTML entity. The entity name is missing:

    ... <td>HINSON J MARK & WF LOU ANN G</td> ...
                          ^

When you load such a document, you see this warning (as you wrote):

Warning: DOMDocument::loadHTMLFile(): htmlParseEntityRef: no name

Here, name refers to the name of the HTML entity (reference), which follows the pattern:

    &name;
     ^^^^

However, this error does not prevent the HTML from being loaded. DOMDocument recovers from this (common) error well, although you may see some text cut off at the problematic position.

So, your assumption that you need to wrap this file in a <body> is incorrect. In HTML, the <body> tag is optional.

Your actual problem was that you did not know how to debug the document after it was loaded. Just use the saveHTML() method to output what was parsed. That alone would have shown you that the URL loaded successfully.

That would then have led you to the next point: the XPath expression was incorrect:
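A minimal sketch of that debugging step. The $fragment string is a made-up stand-in for the remote response (the real markup may differ), and libxml_use_internal_errors() is used here only to keep the warning out of the output:

```php
<?php
// Hypothetical fragment: no <html> or <body>, and a bare "&" in the text.
$fragment = "<div><p>HINSON J MARK & WF LOU ANN G</p>"
          . "<div class='owner-name'>hinson lou ann</div></div>";

// Buffer libxml warnings instead of printing them.
libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($fragment);
libxml_clear_errors();

// Dump the tree exactly as DOMDocument built it.
$dump = $doc->saveHTML();
echo $dump;
```

The dump shows the elements the parser added around the fragment, which is what the XPath expression has to account for.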

 */div[@class='owner-name']

Your hunch about <body> was not far off, though: even though this HTML snippet does not contain a <body>, the DOM will have one! The div is, however, two tags deeper inside it:

 body/*/*/div[@class='owner-name']

More commonly, you use the short form //, which lets you match the tag without specifying how deep it is nested:

 //div[@class='owner-name']
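To compare the three expressions side by side, here is a small sketch. The nested $fragment is an assumption that only mimics the depth described above; it is not the real response:

```php
<?php
// Hypothetical fragment: the owner-name div sits two tags below the top level.
$fragment = "<div><div><div class='owner-name'>hinson lou ann</div></div></div>";

$doc = new DOMDocument();
$doc->loadHTML($fragment);
$xpath = new DOMXPath($doc);

// Without a context node, PHP evaluates relative paths from the root element (<html>).
$original = $xpath->query("*/div[@class='owner-name']");        // only divs directly under <body>
$explicit = $xpath->query("body/*/*/div[@class='owner-name']"); // spells out the exact depth
$anywhere = $xpath->query("//div[@class='owner-name']");        // any depth

printf("original: %d, explicit: %d, anywhere: %d\n",
    $original->length, $explicit->length, $anywhere->length);

foreach ($anywhere as $div) {
    echo $div->nodeValue, PHP_EOL;
}
```

The // form keeps working if the remote markup gains or loses a wrapper element, whereas the explicit path does not.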

+3

hakre Jun 29 '13 at 9:56

The remote site may return invalid HTML, which triggers this warning. DOMDocument and DOMXPath are quite tolerant of HTML errors. If DOMDocument::loadHTML() only emits a warning and the rest of the code gives reliable results, I would advise you to suppress the warnings with the silence operator @:

    $doc = new DOMDocument();
    // suppress warnings
    $ret = @$doc->loadHTML($html);
    // but check errors ...
    if ($ret === FALSE) {
        die('Parse error');
    }
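If you would rather see the parser messages than silence them, one alternative (not part of the answer above) is libxml's internal error buffer. This is only a sketch: the $html string is a hypothetical stand-in for the downloaded markup, and which messages are reported depends on the libxml2 version PHP is built against.

```php
<?php
// Hypothetical markup with a bare "&", standing in for the remote response.
$html = "<div><p>HINSON J MARK & WF LOU ANN G</p></div>";

// Buffer libxml messages instead of raising PHP warnings; remember the old setting.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($html);

// Each entry is a LibXMLError object with level, code, line and message.
$errors = libxml_get_errors();
foreach ($errors as $error) {
    printf("line %d: %s\n", $error->line, trim($error->message));
}

// Empty the buffer and restore the previous error handling.
libxml_clear_errors();
libxml_use_internal_errors($previous);

if ($loaded === false) {
    die('Parse error');
}
```

Unlike @, this hides only the libxml messages and leaves other PHP warnings from the same statement visible.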

+1

hek2mgl Jun 27 '13 at 20:38

Half crazed · Accepted Answer · 2013-06-27T20:45:07+0000

If you really need to try to parse it, try the following:

    <?php
    $html = file_get_contents("http://gisapps.co.union.nc.us/ws/rest/v2/cm_iw.ashx?gid=12339");
    $doc = new DOMDocument();
    $doc->strictErrorChecking = false;
    $doc->recover = true;
    @$doc->loadHTML("<html><body>" . $html . "</body></html>");
    $xpath = new DOMXpath($doc);
    $elements = $xpath->query("//*/div[@class='owner-name']");
    if (!is_null($elements)) {
        foreach ($elements as $element) {
            echo "<br/>[" . $element->nodeName . "]";
            $nodes = $element->childNodes;
            foreach ($nodes as $node) {
                echo $node->nodeValue . "\n";
            }
        }
    }
    ?>

PS: Your XPath was wrong; I fixed it. Note that this DIV element (.owner-name) has no child elements, only a text node, so you may want to revise the inner loop over $nodes.
