This is a common problem with the DOM: you need to do a bit more work if you want to get the contents of the tag and the contents of all its children.
Basically, you need to iterate over the child nodes of the node matched by your XPath query and collect their contents.
One solution is suggested in a user note on the manual page for the DOMElement class.
Integrating this solution into the code you already have should give something like the following. First, declare an HTML string with nested tags:
$html = <<<HTML
<div class="main">
    <div class="text">
        <p>
            Capture this <strong>text</strong> <em>1</em>
        </p>
        <p>
            And some other <strong>text</strong>
        </p>
    </div>
</div>
HTML;
And, to extract data from this HTML string, you can use something like this:
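$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$tags = $xpath->query('//div[@class="main"]/div[@class="text"]');
foreach ($tags as $tag) {
    $innerHTML = '';
    // Assumed completion (the original listing is cut off at this point):
    // serialize each child node and append it to the result.
    foreach ($tag->childNodes as $child) {
        $innerHTML .= $dom->saveHTML($child);
    }
    var_dump(trim($innerHTML));
}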
The only thing that has changed is the body of the foreach: instead of using $tag->nodeValue, you need to iterate over the children of $tag.
This gives me the following result:
string '<p> Capture this <strong>text</strong> <em>1</em> </p> <p> And some other <strong>text</strong> </p>' (length=150)
This is the full content of the <div> tag that was matched: all its children, including their tags.
Note: the user notes in the manual often contain interesting ideas and solutions ;-)
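If you need this in more than one place, the child-node iteration described above can be wrapped in a small helper. The following is only a sketch: the function name innerHTML is hypothetical (it is not part of the DOM extension), and it relies on DOMDocument::saveHTML() accepting a node argument, which requires PHP 5.3.6 or later (the return type declaration additionally assumes PHP 7). It may differ from the approach taken in the user note.

```php
<?php
// Hypothetical helper: returns the serialized markup of all children of $node,
// i.e. the equivalent of the browser's innerHTML property.
function innerHTML(DOMNode $node): string
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML($child) serializes one node, including its own tags.
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

// Usage: same query pattern as in the answer, on a shortened sample string.
$dom = new DOMDocument();
$dom->loadHTML('<div class="text"><p>Capture this <strong>text</strong></p></div>');
$xpath = new DOMXPath($dom);
foreach ($xpath->query('//div[@class="text"]') as $tag) {
    echo trim(innerHTML($tag)), "\n";
}
```

Text nodes are children too, so whitespace between tags is preserved in the result; trim() only removes it at the two ends.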