Problem extracting jsoup tag

Question

<div style="height:240px;"><br>test: example<br>test1:example1</div>

 Elements size = doc.select("div:contains(test:)");

How can I extract the values example and example1 from this HTML using jsoup?

java jsoup

suraa Aug 6 '10 at 5:41

1 answer

BalusC · Answer 1 · 2010-08-11T21:49:15+0000

This HTML is not semantic enough for what you want to do (a <br> cannot have children, and the : is plain text, not HTML), so an HTML parser such as Jsoup cannot do much here. An HTML parser is not meant to do specific text extraction or tokenizing.

The best you can do is get the HTML content of the <div> using Jsoup and then extract the values with the usual java.lang.String methods, or maybe java.util.Scanner.

Here is an example run:

String html = "<div style=\"height:240px;\"><br>test: example<br>test1:example1</div>";
Document document = Jsoup.parse(html);
Element div = document.select("div[style=height:240px;]").first();

// Older Jsoup versions emit <br />, newer ones emit <br>, so split on either form.
String[] parts = div.html().split("<br\\s*/?>");
for (String part : parts) {
    int colon = part.indexOf(':');
    if (colon > -1) {
        System.out.println(part.substring(colon + 1).trim());
    }
}

example
example1
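
The java.util.Scanner route mentioned above could look roughly like the sketch below. It only uses the standard library and assumes you already have the inner HTML of the <div> as a string (for instance from div.html()). The class name BrValueExtractor and the helper extractValues are made up for illustration; they are not part of Jsoup.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Scanner;

public class BrValueExtractor {

    // Hypothetical helper: takes the inner HTML of the <div> and collects
    // the text after the first colon of each <br>-separated segment.
    static List<String> extractValues(String innerHtml) {
        List<String> values = new ArrayList<>();
        // Treat every <br>, <br/> or <br /> as a token delimiter (lowercase only).
        try (Scanner scanner = new Scanner(innerHtml).useDelimiter("<br\\s*/?>")) {
            while (scanner.hasNext()) {
                String part = scanner.next();
                int colon = part.indexOf(':');
                if (colon > -1) {
                    // Keep only the value to the right of the label.
                    values.add(part.substring(colon + 1).trim());
                }
            }
        }
        return values;
    }

    public static void main(String[] args) {
        // Assumed input: the inner HTML of the <div> from the question.
        String innerHtml = "<br>test: example<br>test1:example1";
        for (String value : extractValues(innerHtml)) {
            System.out.println(value);
        }
    }
}
```

Segments without a colon are skipped, so stray whitespace or empty pieces between the <br> tags do not produce output. This is still plain string tokenizing, with the same fragility as the split-based version.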

If you control the HTML, a definition list would be a better fit. For example:

<dl id="mydl">
     <dt>test:</dt><dd>example</dd>
     <dt>test1:</dt><dd>example1</dd>
</dl>

This is more semantic and therefore easier to parse:

String html = "<dl id=\"mydl\"><dt>test:</dt><dd>example</dd><dt>test1:</dt><dd>example1</dd></dl>";
Document document = Jsoup.parse(html);
Elements dds = document.select("#mydl dd"); // The values live in the <dd> elements.
for (Element dd : dds) {
    System.out.println(dd.text());
}
