DomCrawler Symfony: how to get content from node excluding children?

Suppose I have an HTML page:

<html> <head></head> <body> Hello World! <div> my other content </div> </body> </html> 

How do I get "Hello World" from the Crawler DOM?

I thought this would work:

 $crawler = $crawler
     ->filter('body > div')
     ->reduce(function (Crawler $node, $i) {
         return false;
     });

But this will obviously give an error:

 InvalidArgumentException: "The current node list is empty" 
1 answer

I don't know whether there is a simpler way, but you can select the text node with XPath:

 $crawler->filterXPath('//body/text()')->text(); 

The result is a string containing Hello World! together with the whitespace that surrounds it before the first tag. If you only need the text itself, trim the value:

 $helloWorld = trim($crawler->filterXPath('//body/text()')->text()); 
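Note that text() throws the same "The current node list is empty" exception when the XPath matches nothing, for example when the body has no direct text at all. A minimal sketch of a guard, assuming symfony/dom-crawler is installed through Composer and autoloaded from vendor/autoload.php; firstBodyText() is a hypothetical helper name, not part of DomCrawler, and it only looks at the first matching text node:

```php
<?php
// Assumption: symfony/dom-crawler was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

/**
 * Hypothetical helper: returns the trimmed text of the first text node
 * that is a direct child of <body>, or $default when there is none.
 */
function firstBodyText(Crawler $crawler, string $default = ''): string
{
    $textNodes = $crawler->filterXPath('//body/text()');

    // text() throws InvalidArgumentException on an empty node list,
    // so check the count before calling it.
    if ($textNodes->count() === 0) {
        return $default;
    }

    return trim($textNodes->text());
}

$html = '<html> <head></head> <body> Hello World! <div> my other content </div> </body> </html>';
$noDirectText = '<html><head></head><body><div>my other content</div></body></html>';

echo firstBodyText(new Crawler($html)), PHP_EOL;
echo firstBodyText(new Crawler($noDirectText), '(no direct text)'), PHP_EOL;
```

The count() check keeps the helper independent of the DomCrawler version in use.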

This works in your case. If the body contains several text nodes, for example:

 <html> <head></head> <body> Hello World! <div> my other content </div> Some other text </body> </html> 

You can do:

 $crawler->filterXPath('//body/text()')->extract(['_text']);

This will return an array:

 Array ( [0] => Hello World! [1] => Some other text ) 
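The extracted values keep the whitespace around each text node, and depending on the markup some entries may be whitespace only. A small sketch of cleaning the array up, assuming symfony/dom-crawler is installed through Composer and PHP 7.4 or later (for the arrow function); the variable names are illustrative:

```php
<?php
// Assumption: symfony/dom-crawler was installed with Composer.
require __DIR__ . '/vendor/autoload.php';

use Symfony\Component\DomCrawler\Crawler;

$html = '<html> <head></head> <body> Hello World! <div> my other content </div> Some other text </body> </html>';
$crawler = new Crawler($html);

// One string per text node that is a direct child of <body>.
$rawTexts = $crawler->filterXPath('//body/text()')->extract(['_text']);

// Trim every value, drop entries that were whitespace only,
// and reindex the result from zero.
$texts = array_values(array_filter(
    array_map('trim', $rawTexts),
    static fn (string $text): bool => $text !== ''
));

print_r($texts);
```

Unlike text(), extract() returns an empty array when nothing matches, so no guard against an empty node list is needed here.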

Source: https://habr.com/ru/post/974303/

