Get the contents of a list of span elements using HTMLUnit and XPath

I want to get a list of values from an HTML document. I am using HTMLUnit.

There are many span elements with the same class. I want to extract the content of these span tags:

<span class="topic"> <a href="http://website.com/page/2342" class="id-24223 topic-link J_onClick topic-info-hover">Lean Startup</a> </span> 

My code is as follows:

  List<?> topics = (List)page.getByXPath("//span[@class='topic']/text()"); 

However, whenever I try to iterate through the list, I get a NoSuchElementException. Can anyone see an obvious mistake? Links to good tutorials would also be appreciated.

2 answers

If you know that there will always be an <a> inside the span, just add it to the XPath expression and take text() from the a, for example //span[@class='topic']/a/text().
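A minimal sketch of this first suggestion. HtmlUnit is not part of the JDK, so the example swaps in the JDK's own javax.xml.xpath API and treats the question's markup as well-formed XML wrapped in an assumed <div> root; the class and method names are hypothetical. It shows that stepping into the <a> before applying text() yields the link text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TopicLinkText {
    // The markup from the question, wrapped in a <div> root so it parses as XML.
    static final String HTML = "<div><span class=\"topic\"> "
        + "<a href=\"http://website.com/page/2342\" "
        + "class=\"id-24223 topic-link J_onClick topic-info-hover\">Lean Startup</a> "
        + "</span></div>";

    // Evaluates an XPath expression and returns the text content of each matched node.
    static List<String> select(String xml, String expr) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new InputSource(new StringReader(xml)));
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath()
                .evaluate(expr, doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getTextContent());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        // Step into the <a> first, then take its text node.
        System.out.println(select(HTML, "//span[@class='topic']/a/text()"));
    }
}
```

With HtmlUnit itself, the same expression would simply be passed to page.getByXPath as in the question.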

If you don't know for sure that there will always be an a, then I would recommend using the .asText() method, which returns the text of an HtmlElement together with all of its descendants.

So, first get each of the spans:

 List<?> topics = (List)page.getByXPath("//span[@class='topic']"); 

Then, in a loop, get the text inside each of the spans:

 for (Object topic : topics) System.out.println(((HtmlElement) topic).asText());
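A rough JDK-only analogue of the per-span loop described above, again using javax.xml.xpath instead of HtmlUnit. It selects the span elements themselves and reads each one's getTextContent(), which includes the text of all descendants, so it works whether or not an <a> is present. The second span (without a link) is a made-up example, the names are hypothetical, and trim() only roughly approximates how HtmlUnit's asText() renders whitespace.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class SpanTexts {
    // One span from the question plus a hypothetical span that has no <a> child.
    static final String HTML = "<div>"
        + "<span class=\"topic\"> <a href=\"http://website.com/page/2342\">Lean Startup</a> </span>"
        + "<span class=\"topic\"> Plain text, no link </span>"
        + "</div>";

    // Selects the spans, then collects each span's text including its descendants.
    static List<String> spanTexts(String xml) {
        try {
            NodeList spans = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "//span[@class='topic']",
                DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))),
                XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < spans.getLength(); i++) {
                // getTextContent() concatenates all descendant text; trim() drops the padding.
                out.add(spans.item(i).getTextContent().trim());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        for (String text : spanTexts(HTML)) {
            System.out.println(text);
        }
    }
}
```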

text() selects only the text nodes that are direct children of the element, and the span in your example has no text of its own (only whitespace), just a child element.
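To illustrate what the original expression selects, this sketch (JDK javax.xml.xpath again, not HtmlUnit; names are hypothetical) evaluates //span[@class='topic']/text() against the question's markup. In a plain XML DOM the only direct text children of the span are the spaces around the <a>, so none of the matched nodes contains "Lean Startup". HtmlUnit's DOM may handle that whitespace differently, so treat this as an illustration of XPath semantics rather than of HtmlUnit's exact result.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class DirectTextNodes {
    // The question's span (attributes shortened), wrapped in a <div> root.
    static final String HTML = "<div><span class=\"topic\"> "
        + "<a href=\"http://website.com/page/2342\">Lean Startup</a> </span></div>";

    // Returns the data of every text node that is a direct child of a matching span.
    static List<String> directText(String xml) {
        try {
            NodeList nodes = (NodeList) XPathFactory.newInstance().newXPath().evaluate(
                "//span[@class='topic']/text()",
                DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml))),
                XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < nodes.getLength(); i++) {
                out.add(nodes.item(i).getNodeValue());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        List<String> nodes = directText(HTML);
        System.out.println(nodes.size() + " text node(s)");
        for (String s : nodes) {
            // Brackets make the whitespace-only content visible.
            System.out.println("[" + s + "]");
        }
    }
}
```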

Try this instead:

 List<?> topics = (List)page.getByXPath("//span[@class='topic']"); 

Source: https://habr.com/ru/post/1486047/

