DomCrawler Symfony: how to get content from node excluding children?

Question

Suppose I have an HTML page:

<html> <head></head> <body> Hello World! <div> my other content </div> </body> </html>

How do I get "Hello World!" out of the body with the DomCrawler, without the content of the child div?

I thought this would work:

 $crawler = $crawler->filter('body > div')->reduce(function (Crawler $node, $i) { return false; });

But this will obviously give an error:

 InvalidArgumentException: "The current node list is empty"

symfony web-crawler

apfz Aug 25 '14 at 11:28

1 answer

Igor Pantović · Accepted Answer · 2014-08-25T17:11:41+0000

I don’t know whether there is an easier way, but you can extract the text node using XPath:

 $crawler->filterXPath('//body/text()')->text();

The result is a string containing Hello World! plus the whitespace that surrounds it, because the text node spans everything up to the first tag. If you only need the text itself, trim the value:

 $helloWorld = trim($crawler->filterXPath('//body/text()')->text());

This works in your case. However, text() only returns the content of the first matched node, so if the body contains several text nodes, for example:

 <html> <head></head> <body> Hello World! <div> my other content </div> Some other text </body> </html>

You can do:

 $crawler->filterXPath('//body/text()')->extract(['_text']);

This will return an array:

 Array ( [0] => Hello World! [1] => Some other text )
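
The extracted values may still carry the surrounding whitespace, and whitespace-only text nodes can show up as well. Below is a minimal sketch of cleaning that up. It is written against PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than DomCrawler, purely so that it runs without Composer; the XPath expression is the same one passed to filterXPath above. The $html sample and the $texts variable are illustrative names, not part of any API.

```php
<?php
// Sample markup from the answer above.
$html = '<html> <head></head> <body> Hello World! <div> my other content </div> Some other text </body> </html>';

// Parse the markup with PHP's built-in DOM extension.
$doc = new DOMDocument();
$doc->loadHTML($html);

$xpath = new DOMXPath($doc);

// '//body/text()' selects only the text nodes that are direct
// children of <body>, so the text inside <div> is not included.
$texts = [];
foreach ($xpath->query('//body/text()') as $node) {
    $value = trim($node->nodeValue);
    // Skip text nodes that contain nothing but whitespace.
    if ($value !== '') {
        $texts[] = $value;
    }
}

print_r($texts);
```

With DomCrawler, the same trimming and filtering could be applied to the array returned by extract(['_text']), for example with array_map('trim', ...) followed by array_filter.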
