
Jsoup: extract text from td cells that do not contain any HTML elements

I have an HTML string, for example:

String html = "<table><tbody>"
        + "<tr>"
        + "<td><p>ABC</p></td>"
        + "<td>DEF</td>"
        + "</tr>"
        + "<tr>"
        + "<td><p>GHI</p></td>"
        + "<td>MNO</td>"
        + "</tr>"
        + "</tbody>"
        + "</table>";

I only need to extract the text of td cells that have no child elements. My current code returns both the plain text and the HTML of the nested elements.

Elements elements = doc.select("tbody > tr");
for (Element e : elements) {
    System.out.println(e.select("td").html());
}

But what I need is:

DEF
MNO

Thanks in advance.

3 answers

Try this CSS selector:

tbody > tr > td:not(:has(*))

Demo

http://try.jsoup.org/~K4qiK0SxQDeuhE9FvvmUDa3vKKI

DESCRIPTION

tbody  /* Select any tbody */
> tr   /* Select any tr directly under it */
> td   /* Select any td directly under it ... */
:not(:has(*)) /* ... not having any element */

The * selector matches only elements. A text node is not an element; it is just another kind of Node.
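The element versus text node distinction can be illustrated without jsoup. The sketch below is not the answer's method: it uses the W3C DOM parser bundled with the JDK, which works here only because the sample markup happens to be well-formed XML. The class and method names (`TextOnlyCells`, `textOnlyCells`, `hasElementChild`) are made up for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class TextOnlyCells {

    // Collects the text of every <td> that has no element children.
    // Assumes the input is well-formed XML (hypothetical helper, not a jsoup API).
    static List<String> textOnlyCells(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            List<String> result = new ArrayList<>();
            NodeList cells = doc.getElementsByTagName("td");
            for (int i = 0; i < cells.getLength(); i++) {
                Node cell = cells.item(i);
                if (!hasElementChild(cell)) {
                    result.add(cell.getTextContent());
                }
            }
            return result;
        } catch (Exception ex) {
            throw new IllegalStateException("Could not parse markup", ex);
        }
    }

    // A text node is a Node but not an element, so it is skipped here.
    static boolean hasElementChild(Node node) {
        NodeList children = node.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            if (children.item(i).getNodeType() == Node.ELEMENT_NODE) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String html = "<table><tbody>"
                + "<tr><td><p>ABC</p></td><td>DEF</td></tr>"
                + "<tr><td><p>GHI</p></td><td>MNO</td></tr>"
                + "</tbody></table>";
        System.out.println(textOnlyCells(html)); // prints [DEF, MNO]
    }
}
```

For real-world HTML, which is rarely well-formed XML, the jsoup selector above remains the practical choice.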

SAMPLE CODE

Elements elements = doc.select("tbody > tr > td:not(:has(*))");
for (Element e : elements) {
    // outerHtml() prints the whole cell; use e.text() to print only its text
    System.out.println(e.outerHtml());
}

OUTPUT

<td>DEF</td>
<td>MNO</td>

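If you want every td that has text of its own, including tds that also contain child elements, select the tds whose own text is not empty.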

String html="<table><tbody>"
        +"<tr>"
        +"<td><p>ABC</p></td>"
        +"<td>DEF</td>"
        +"<td>DEF2<p>ABC</p></td>"
        +"</tr>"
        +"<tr>"
        +"<td><p>GHI</p></td>"
        +"<td>MNO</td>"
        +"<td>MNO2<p>GHI2</p></td>"
        +"</tr>"
        +"</tbody>"
        +"</table>";

Document doc = Jsoup.parse(html);
Elements elements = doc.select("tbody > tr > td:matchesOwn(.+)");
for (Element e : elements) {
    System.out.println(e.text());
}

This selects each td whose own text matches the regular expression .+ (one or more characters).
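The `.+` inside `:matchesOwn(.+)` is an ordinary regular expression. As a rough sketch of what it accepts, assuming jsoup applies the pattern to an element's own text with find-style matching, the check can be reproduced with `java.util.regex` alone. `OwnTextPattern` and `hasOwnText` are hypothetical names, not part of jsoup.

```java
import java.util.regex.Pattern;

public class OwnTextPattern {

    // Same regular expression as in the selector: one or more characters.
    static final Pattern NON_EMPTY = Pattern.compile(".+");

    // Returns true when the given own text contains at least one character.
    // Assumption: this mirrors how the selector treats an element's own text.
    static boolean hasOwnText(String ownText) {
        return NON_EMPTY.matcher(ownText).find();
    }

    public static void main(String[] args) {
        System.out.println(hasOwnText("DEF")); // true: the cell has text of its own
        System.out.println(hasOwnText(""));    // false: e.g. a cell holding only <p>ABC</p>
    }
}
```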

To keep only the tds that have text of their own and no child elements, combine it with :not(:has(*)):

Document doc = Jsoup.parse(html);
Elements elements = doc.select("tbody > tr > td:matchesOwn(.+):not(:has(*))");
for (Element e : elements) {
    System.out.println(e.text());
}

For details on :has(), :not() and the other selectors, see the JSOUP Docs.


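You can use Element.child(int index) with index = 0 to get the first child element of a cell.

Elements elements = doc.select("tbody > tr");
for (Element e : elements) {
    for (Element el : e.select("td")) {
        // el.child(0) returns the first child element of the td;
        // it throws IndexOutOfBoundsException when the td has no child elements
    }
}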

Source: https://habr.com/ru/post/1625095/
