How to extract only the text from an HTML file using jsoup

I used this code:

String innerHtml = Jsoup.parse(htmlCode,"ISO-8859-1").select("body").html(); 

But this only strips the outer <html> wrapper: all the HTML tags inside the body still appear in the result.

2 answers

Use .text() instead of .html() to get the combined text of the element and all its children.


Try using .text() instead of .html():

 Jsoup.parse(htmlCode,"ISO-8859-1").select("body").text(); 
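
For illustration, here is a minimal self-contained sketch of the .text() approach. It assumes htmlCode is a String holding the markup and that jsoup is on the classpath; the class name TextOnly and the helper bodyText are hypothetical names introduced only for this example. Note that when the input is a String, the second argument of Jsoup.parse(String, String) is a base URI rather than a charset, so the sketch uses the one-argument overload.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

public class TextOnly {
    // Hypothetical helper: returns the text of <body> with all tags removed.
    static String bodyText(String html) {
        Document doc = Jsoup.parse(html);   // parse the markup into a DOM
        return doc.body().text();           // combined text of body and all its children
    }

    public static void main(String[] args) {
        String htmlCode = "<html><body><p>Hello <b>world</b></p></body></html>";
        System.out.println(bodyText(htmlCode)); // prints: Hello world
    }
}
```

If the input is a java.io.File instead, Jsoup.parse(File, String) does take a charset name such as "ISO-8859-1" as its second argument.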

+5
source

Source: https://habr.com/ru/post/1469557/

