How to get text between a specific range using HtmlUnit

I am new to HtmlUnit and I'm not even sure if this is the right tool for my project. I am trying to parse a website and extract the values I need from it. I need to get the value "07:05" from this:

<span class="tim tim-dep">07:05</span> 

I know that I can use getTextContent() to retrieve the value, but I do not know how to select just this element. I used getElementById to find the <div> that contains this span, but when I get the text content of that div, I get a whole line of text with a lot of unnecessary data. Can someone tell me how I can select this span, possibly using the class name?

2 answers

You need to load the page first and then query it, for example:

 // WebClient is HtmlUnit's browser object; getPage loads and parses the URL
 final WebClient web = new WebClient();
 final HtmlPage page = web.getPage("http://www.whateveryouwant.com.br");

Get the elements by tag name and iterate over them:

 final List<DomElement> spans = page.getElementsByTagName("span");
 for (DomElement element : spans) {
     if (element.getAttribute("class").equals("tim tim-dep")) {
         // getNodeValue() is null for element nodes, so read the text content
         return element.getTextContent();
     }
 }

Or just use XPath:

 // getFirstByXPath is generic and returns null when nothing matches
 DomElement element = page.getFirstByXPath("//span[@class='tim tim-dep']");
 final String text = element.getTextContent();
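One caveat with @class='tim tim-dep': it is an exact string comparison, so it stops matching if the site reorders the classes or adds another one. A common XPath 1.0 workaround is to match a single class token instead. The sketch below is only an illustration: it uses the JDK's built-in javax.xml.xpath rather than HtmlUnit so it runs without extra dependencies, and the class name ClassTokenXPath, the helper firstText, and the sample markup (the "tim-arr" span and "09:40") are made up for the example. The expression string itself is plain XPath 1.0, so it should be usable with getFirstByXPath in the same way as the one above.

```java
import java.io.StringReader;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathExpressionException;
import javax.xml.xpath.XPathFactory;
import org.xml.sax.InputSource;

public class ClassTokenXPath {

    // Matches any span whose class list contains the token "tim-dep",
    // regardless of token order or extra whitespace in the attribute.
    static final String BY_CLASS_TOKEN =
        "//span[contains(concat(' ', normalize-space(@class), ' '), ' tim-dep ')]";

    // Hypothetical helper: evaluates the expression against a well-formed
    // markup string and returns the text of the first match in document
    // order, or "" if nothing matches.
    static String firstText(String markup, String expression) {
        XPath xpath = XPathFactory.newInstance().newXPath();
        try {
            return xpath.evaluate(expression, new InputSource(new StringReader(markup)));
        } catch (XPathExpressionException e) {
            throw new IllegalStateException("Could not evaluate: " + expression, e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample markup; note the reordered classes and double space.
        String markup = "<div id=\"row\">"
            + "<span class=\"tim tim-arr\">09:40</span>"
            + "<span class=\"tim-dep  tim\">07:05</span>"
            + "</div>";
        System.out.println(firstText(markup, BY_CLASS_TOKEN));
    }
}
```

Padding the attribute with spaces on both sides keeps "tim-dep" from matching inside a longer class name such as "tim-departed".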

Here you go:

 DomElement element = page.getFirstByXPath("//span[@class='tim tim-dep']");
 String text = element.getTextContent();

Source: https://habr.com/ru/post/1479143/

