How to parse a new line from HTML using Jsoup

When I parse an HTML file using jsoup, text that spans several lines in the HTML (separated by <br />) comes back as a single line with no newline characters ( \n ). How can I extract multi-line HTML content as a multi-line string?

I use the method: Element.text()

For instance:

The HTML contains C code that displays correctly on several lines in the HTML file, but when I extract the text, everything ends up on one line without any newline characters.

3 answers

Replace each <br /> with a placeholder before extracting the text, then replace the placeholder with a newline afterwards, for example:

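    // Requires: import org.jsoup.Jsoup; import org.jsoup.nodes.Document;
    Document doc = Jsoup.connect("http://www.ejemplo.html").get(); // the document still contains the <br> tags
    // Put the placeholder $$$ where each <br> was. Depending on the jsoup
    // version and output settings, html() may write the tag as <br> or <br />.
    String temp = doc.html().replace("<br />", "$$$").replace("<br>", "$$$");
    doc = Jsoup.parse(temp); // parse again
    // Turn each placeholder back into a newline (\n)
    String text = doc.body().text().replace("$$$", "\n");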

The text() method of Element (and of TextNode) calls appendWhitespaceIfBr(...), which emits a single space for each <br />; other runs of whitespace are collapsed to a single space as well. Unfortunately, I don't see a way to disable this behavior without modifying the jsoup code.

But you could try replacing all the <br /> tags with instances of a new subclass of Node.
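
To make the behavior described above concrete, here is a minimal sketch that only reproduces it. The class and method names (BrCollapseDemo, collapsedText) and the sample input are made up for illustration; only Jsoup.parseBodyFragment, body() and text() are jsoup calls.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;

public class BrCollapseDemo {
    // Hypothetical helper: returns what Element.text() produces for an HTML fragment.
    public static String collapsedText(String html) {
        Element body = Jsoup.parseBodyFragment(html).body();
        // text() normalizes whitespace and writes a space where each <br /> was,
        // so the result contains no '\n'.
        return body.text();
    }

    public static void main(String[] args) {
        // The two source lines come back joined by a single space.
        System.out.println(collapsedText("int a = 1;<br />int b = 2;"));
    }
}
```

The line break is already gone by the time text() returns, so any fix has to act before that call, or walk the nodes itself instead of calling text().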


See my answer to a similar question here: fooobar.com/questions/76182 / ...

It has an example of a static recursive method that does what you ask.
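
The linked answer is not reproduced here, so the following is only a sketch of what such a static recursive method might look like, not the original code. The names BrToNewline and textWithNewlines are hypothetical; the jsoup calls used are Node.childNodes(), TextNode.text() and Element.tagName().

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Element;
import org.jsoup.nodes.Node;
import org.jsoup.nodes.TextNode;

public class BrToNewline {
    // Hypothetical helper (not part of jsoup): collects the text below a node,
    // writing '\n' for every <br> element it encounters.
    public static String textWithNewlines(Node node) {
        StringBuilder sb = new StringBuilder();
        collect(node, sb);
        return sb.toString();
    }

    private static void collect(Node node, StringBuilder sb) {
        for (Node child : node.childNodes()) {
            if (child instanceof TextNode) {
                // text() collapses whitespace inside this one text node
                sb.append(((TextNode) child).text());
            } else if (child instanceof Element) {
                // The HTML parser lower-cases tag names by default
                if ("br".equals(((Element) child).tagName())) {
                    sb.append('\n');
                } else {
                    collect(child, sb); // recurse into nested elements
                }
            }
        }
    }

    public static void main(String[] args) {
        Element body = Jsoup.parseBodyFragment("int a = 1;<br />int b = 2;").body();
        System.out.println(textWithNewlines(body));
    }
}
```

This sketch only treats <br> as a line break. Block elements such as <p> or <div> would need their own rule if they should also start a new line.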


Source: https://habr.com/ru/post/1447306/
