Select nodeValue, but exclude children.

Let's say I have this code:

<p dataname="description"> Hello this is a description. <a href="#">Click here for more.</a> </p> 

How can I select the nodeValue of the p element but exclude the a element and its contents?

My current code is:

 $result = $xpath->query("//p[@dataname='description'][not(self::a)]"); 

I then read the value with $result->item(0)->nodeValue;

+6
2 answers

Just appending /text() to your query should do the trick:

 $result = $xpath->query("//p[@dataname='description'][not(self::a)]/text()"); 
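
For reference, here is a minimal self-contained sketch of this approach. It assumes the markup from the question is loaded with DOMDocument::loadHTML; the variable names ($html, $dom, $nodes, $text) are only illustrative. Since /text() returns every direct text node of the p, which may include whitespace after the link, the sketch concatenates the nodes and trims the result instead of reading only item(0).

```php
<?php
// Markup taken from the question.
$html = '<p dataname="description"> Hello this is a description. <a href="#">Click here for more.</a> </p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Select only the text nodes that are direct children of the <p>.
$nodes = $xpath->query("//p[@dataname='description']/text()");

// Concatenate the text nodes and trim the surrounding whitespace.
$text = '';
foreach ($nodes as $node) {
    $text .= $node->nodeValue;
}
$text = trim($text);

echo $text, PHP_EOL;
```

The link text is left out because it lives in a text node under the a element, not directly under the p.
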
+10

Not sure if PHP XPath supports this, but this XPath does the trick for me in Scrapy (a Python-based framework):

 $xpath->query("//p[@dataname='description']/text()[following-sibling::a]");
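
PHP's DOMXPath implements XPath 1.0, which includes the following-sibling axis, so the expression should carry over. A small sketch to check it against the markup from the question (the variable names $html, $dom, $matches and $description are illustrative):

```php
<?php
// Markup taken from the question.
$html = '<p dataname="description"> Hello this is a description. <a href="#">Click here for more.</a> </p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Keep only those text nodes of the <p> that have an <a> sibling after them.
$matches = $xpath->query("//p[@dataname='description']/text()[following-sibling::a]");

// Only the text in front of the link qualifies, so item(0) is the description.
$description = trim($matches->item(0)->nodeValue);

echo $description, PHP_EOL;
```

Unlike a plain /text(), the predicate skips any text that comes after the link.
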

If this does not work, try Kristoffer's solution, or you can also use a regex. For instance:

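// $innerHtml is assumed to hold the markup inside the <p>; nodeValue will not work here because its tags are already stripped
$output = preg_replace("~<.*?>.*?<.*?>~msi", '', $innerHtml);

This removes every HTML tag together with its content, leaving only the text that is not wrapped in tags.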
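
The pattern can only match while the string still contains tags, and nodeValue returns plain text with the tags already removed but the link text kept. A rough sketch of one way to feed it markup instead, assuming the inner HTML of the p is rebuilt by serializing its child nodes with DOMDocument::saveHTML (the names $p, $innerHtml and $output are illustrative):

```php
<?php
// Markup taken from the question.
$html = '<p dataname="description"> Hello this is a description. <a href="#">Click here for more.</a> </p>';

$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

$p = $xpath->query("//p[@dataname='description']")->item(0);

// Rebuild the inner markup of the <p> by serializing each child node.
$innerHtml = '';
foreach ($p->childNodes as $child) {
    $innerHtml .= $dom->saveHTML($child);
}

// Remove each opening tag, its content and the following tag, then trim.
$output = trim(preg_replace("~<.*?>.*?<.*?>~msi", '', $innerHtml));

echo $output, PHP_EOL;
```

The pattern simply pairs one tag with the next one it finds, so it is fragile with nested or self-closing elements; the XPath variants are usually the safer choice.
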

+2

Source: https://habr.com/ru/post/907925/