Select nodeValue, but exclude children.
Let's say I have this code:
<p dataname="description"> Hello this is a description. <a href="#">Click here for more.</a> </p>
How do I select the nodeValue of the p but exclude the a element and its contents?
My current code is:
$result = $xpath->query("//p[@dataname='description'][not(self::a)]");
I then read the value with $result->item(0)->nodeValue; but it still includes the link text.
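To see why the current query does not help, here is a minimal, self-contained sketch built on the sample markup from the question (the inline `$html` string and the `libxml_use_internal_errors` call are additions for runnability). The `self::a` test is evaluated against the `p` node itself, so `not(self::a)` is always true, and `nodeValue` on an element returns the concatenated text of all its descendants, link text included.

```php
<?php
// Silence libxml warnings that HTML parsing can emit.
libxml_use_internal_errors(true);

// Sample markup from the question.
$html = '<p dataname="description"> Hello this is a description. <a href="#">Click here for more.</a> </p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// self::a is tested against the <p> itself, so not(self::a) is always true
// and the predicate filters nothing out.
$result = $xpath->query("//p[@dataname='description'][not(self::a)]");

// nodeValue of an element is the text of ALL its descendants,
// so the text inside <a> is still part of it.
$value = $result->item(0)->nodeValue;
echo trim($value), "\n";
```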
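This XPath does the trick for me in Scrapy (a Python-based framework). It only uses XPath 1.0 features (the text() node test and the following-sibling axis), which PHP's DOMXPath supports, so it should work there as well: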
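$xpath->query("//p[@dataname='description']/text()[following-sibling::a]")
This returns the text nodes that are direct children of the p and come before the link, so the text inside the a is never selected. If this does not work, try Kristoffer's solution, or use a regular expression. For instance: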
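A minimal, self-contained PHP sketch of the text() approach, assuming the sample markup from the question (the inline `$html` string and the `libxml_use_internal_errors` call are additions for runnability). The query returns a list of text nodes rather than a single string, so the loop joins them.

```php
<?php
// Silence libxml warnings that HTML parsing can emit.
libxml_use_internal_errors(true);

// Sample markup from the question.
$html = '<p dataname="description"> Hello this is a description. <a href="#">Click here for more.</a> </p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// /text() selects only text nodes that are direct children of <p>;
// "Click here for more." is a child of <a>, so it is never selected.
// [following-sibling::a] keeps only the text that precedes a link.
$nodes = $xpath->query("//p[@dataname='description']/text()[following-sibling::a]");

// Join the selected text nodes into one string.
$text = '';
foreach ($nodes as $node) {
    $text .= $node->nodeValue;
}
$text = trim($text);
echo $text, "\n";
```

If the p can also contain text after the link and you want that as well, dropping the `[following-sibling::a]` predicate selects every direct text child of the p.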
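$inner = ''; foreach ($result->item(0)->childNodes as $child) { $inner .= $xpath->document->saveHTML($child); } // nodeValue has no tags left, so rebuild the inner HTML of the p
$output = preg_replace("~<.*?>.*?<.*?>~msi", '', $inner);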
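This removes every element (opening tag, content and closing tag) from the markup and keeps only the text that is not wrapped in a tag. The pattern is naive, though: nested elements such as <a><b>x</b></a> are not removed cleanly.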
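A runnable sketch of the regex route, again assuming the sample markup from the question. Because `nodeValue` contains no tags, the sketch serializes the children of the p with `DOMDocument::saveHTML()` and applies the same pattern to that inner HTML. Using the inner rather than the outer HTML matters: on the outer HTML the pattern would match from `<p ...>` through `<a ...>` and delete the description text.

```php
<?php
// Silence libxml warnings that HTML parsing can emit.
libxml_use_internal_errors(true);

// Sample markup from the question.
$html = '<p dataname="description"> Hello this is a description. <a href="#">Click here for more.</a> </p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);
$p = $xpath->query("//p[@dataname='description']")->item(0);

// Build the inner HTML of <p> by serializing each child node.
$inner = '';
foreach ($p->childNodes as $child) {
    $inner .= $dom->saveHTML($child);
}

// Remove each "<tag>content</tag>" run, keeping text outside any element.
$output = trim(preg_replace("~<.*?>.*?<.*?>~msi", '', $inner));
echo $output, "\n";
```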