Select nodeValue, but exclude children.
Let's say I have this code:
<p dataname="description"> Hello this is a description. <a href="#">Click here for more.</a> </p>
How do I select the nodeValue of the p element, but exclude the a element and its contents?
My current code is:
$result = $xpath->query("//p[@dataname='description'][not(self::a)]");
I read the value with $result->item(0)->nodeValue;
This XPath does the trick for me in Scrapy (a Python-based framework). It only uses XPath 1.0 features (text() and the following-sibling axis), which PHP's DOMXPath supports as well:
$xpath->query("//p[@dataname='description']/text()[following-sibling::a]");
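For reference, here is a minimal, self-contained sketch of that query in PHP. It assumes the markup from the question is loaded with DOMDocument::loadHTML(); the variable names $html, $dom, $nodes and $description are only illustrative.

```php
<?php
// Sample markup taken from the question.
$html = '<p dataname="description"> Hello this is a description. <a href="#">Click here for more.</a> </p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select only the text nodes that are direct children of the <p>
// and that have an <a> element as a following sibling.
$nodes = $xpath->query("//p[@dataname='description']/text()[following-sibling::a]");

// Join the matched text nodes and strip the surrounding whitespace.
$description = '';
foreach ($nodes as $node) {
    $description .= $node->nodeValue;
}
$description = trim($description);

echo $description, PHP_EOL; // Hello this is a description.
```

Dropping the [following-sibling::a] predicate would select every direct text node of the p, including any text that comes after the link.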
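If this does not work, try Kristoffer's solution, or you can also use a regex. For instance:
$output = preg_replace("~<.*?>.*?<.*?>~msi", '', $innerHtml);
Here $innerHtml stands for the inner markup of the p element, not $result->item(0)->nodeValue: nodeValue already has the tags stripped, so the pattern would find nothing to remove there. Applied to the markup, this removes each pair of HTML tags together with its content and keeps the text that is not enclosed in tags.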
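A rough sketch of the regex route, under the assumption that the pattern is run on the inner markup of the p element (nodeValue is already tag-free, so it gives the pattern nothing to match). The inner markup is rebuilt here by serializing each child node with DOMDocument::saveHTML(); $p, $innerHtml and $output are illustrative names.

```php
<?php
// Sample markup taken from the question.
$html = '<p dataname="description"> Hello this is a description. <a href="#">Click here for more.</a> </p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Grab the <p> element itself.
$p = $xpath->query("//p[@dataname='description']")->item(0);

// Rebuild the inner markup of <p> by serializing each child node.
$innerHtml = '';
foreach ($p->childNodes as $child) {
    $innerHtml .= $dom->saveHTML($child);
}

// Remove every "<tag>content</tag>" run, then trim the leftover whitespace.
$output = trim(preg_replace("~<.*?>.*?<.*?>~msi", '', $innerHtml));

echo $output, PHP_EOL; // Hello this is a description.
```

The pattern is fragile: a tag without a closing partner, such as br, makes it swallow the text up to the next tag, so the XPath approach is usually the safer choice.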