Jsoup: finding an element with specific text

I want to select an element with specific text from HTML using Jsoup. The HTML:

<td style="vertical-align:bottom;text-align:center;width:15%"> <div style="background-color:#FFDD93;font-size:10px;margin:5px auto 0px auto;text-align:left;" class="genbg"><span class="corners-top-subtab"><span></span></span> <div><b>Pantry/Catering</b> <div> <div style="color:#00700B;">&#10003;&nbsp;Pantry Car Avbl <br />&#10003;&nbsp;Catering Avbl</div> </div> <div> <div><span>Dinner is served after departure from NZM on 1st day.;</span>... <br /><a style="font-size:10px;color:Red;" onClick="expandPost($(this).parent());" href="javascript:void(0);">Read more...</a> </div> <div style="display:none;">Dinner :2 chapati, rice, dal and chicken curry (NV) and paneer curry in veg &amp;Ice cream.; Breakfast:2 bread slices with jam and butter. ; Omlet of 2 eggs (Non veg),vada and sambar(veg)..; coffee &amp; lime juice</div> </div> </div><span class="corners-bottom-subtab"><span></span></span> </div> 

I want to find the div element containing the text "Pantry/Catering". I tried:

 doc.select("div:contains(Pantry/Catering)").first(); 

But this does not seem to work. How can I get this element using Jsoup?

3 answers

When I run your code, it selects the outer div, although I assume you are looking for the inner div. The documentation says that :contains selects "elements containing the specified text." In this simple HTML:

 <div><div><b>Pantry/Catering</b></div></div> 

The selector div:contains(Pantry/Catering) matches twice, because both div elements contain the text "Pantry/Catering":

 <!-- First Match -->
 <div><div><b>Pantry/Catering</b></div></div>
 <!-- Second Match -->
 <div><b>Pantry/Catering</b></div>

The matches always come in that order, because jsoup matches from the outside in. Therefore .first() always returns the outer div. To extract the inner div, you can use .get(1).

The full expression to extract the inner div:

 doc.select("div:contains(Pantry/Catering)").get(1) 

OK, figured it out. I will have to do something like:

doc.select("b:contains(Pantry/Catering)").first().parent().children().get(1).text();

Thanks for the help!


This should also do the job for you:

 doc.selectFirst("div:containsOwn(Pantry/Catering)").text(); 

Explanation:

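selectFirst(selector) saves you from writing select(selector).first().

containsOwn(text) is a pseudo-selector that returns elements that directly contain the specified text. The text must appear in the matched element itself, not in any of its descendants, unlike contains(text). Note that this only matches when the text is a direct child of the div; in the question's HTML the text sits inside a b element, so there the element that directly contains it is the b.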

Source: https://jsoup.org/apidocs/org/jsoup/select/Selector.html#selectFirst-java.lang.String-org.jsoup.nodes.Element-
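
To make the difference between the two pseudo-selectors from the answers above concrete, here is a minimal sketch in plain Java. It is a toy model, not jsoup's implementation: the class ContainsDemo, the Node type and the select helper are all hypothetical names, and it ignores details such as jsoup's case-insensitive matching. It only illustrates two points: a "contains" style match looks at an element's text including descendants and yields parents before children, while a "containsOwn" style match looks only at the text held directly by the element.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: NOT jsoup's implementation. All names here are hypothetical.
class ContainsDemo {

    /** A minimal element: a label for identification, a tag, its direct text, and children. */
    static final class Node {
        final String label;
        final String tag;
        final String ownText;
        final List<Node> children = new ArrayList<>();

        Node(String label, String tag, String ownText, Node... kids) {
            this.label = label;
            this.tag = tag;
            this.ownText = ownText;
            for (Node kid : kids) {
                children.add(kid);
            }
        }

        /** Direct text followed by the text of all descendants. */
        String text() {
            StringBuilder sb = new StringBuilder(ownText);
            for (Node child : children) {
                sb.append(child.text());
            }
            return sb.toString();
        }
    }

    /** Returns matching nodes in document order (a parent before its children). */
    static List<Node> select(Node root, String tag, String needle, boolean ownOnly) {
        List<Node> out = new ArrayList<>();
        collect(root, tag, needle, ownOnly, out);
        return out;
    }

    private static void collect(Node n, String tag, String needle, boolean ownOnly, List<Node> out) {
        // ownOnly = true models "containsOwn"; false models "contains".
        String haystack = ownOnly ? n.ownText : n.text();
        if (n.tag.equals(tag) && haystack.contains(needle)) {
            out.add(n);
        }
        for (Node child : n.children) {
            collect(child, tag, needle, ownOnly, out);
        }
    }

    public static void main(String[] args) {
        // Models <div><div><b>Pantry/Catering</b></div></div>
        Node b = new Node("b", "b", "Pantry/Catering");
        Node inner = new Node("inner div", "div", "", b);
        Node outer = new Node("outer div", "div", "", inner);

        // Like div:contains(...): both divs match, the outer one first.
        for (Node hit : select(outer, "div", "Pantry/Catering", false)) {
            System.out.println("contains: " + hit.label);
        }
        // Like div:containsOwn(...): no div holds the text directly.
        System.out.println("own-text div matches: "
                + select(outer, "div", "Pantry/Catering", true).size());
        // Like b:containsOwn(...): the b element holds the text directly.
        System.out.println("own-text b matches: "
                + select(outer, "b", "Pantry/Catering", true).size());
    }
}
```

In jsoup itself, the corresponding accessors on an Element are text() (own text plus descendants) and ownText() (direct text only), which can be handy for checking what a selector is actually matching against.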


Source: https://habr.com/ru/post/974401/

