
Avoid removing spaces and newlines when parsing HTML with jsoup

I have the sample code below.

String sample = "<html>\n"
        + "<head>\n"
        + "</head>\n"
        + "<body>\n"
        + "This is a sample on              parsing html body using jsoup\n"
        + "This is a sample on              parsing html body using jsoup\n"
        + "</body>\n"
        + "</html>";

Document doc = Jsoup.parse(sample);
String output = doc.body().text();

The output I get is:

This is a sample on parsing html body using jsoup This is a sample on parsing html body using jsoup

But I want the result to be

This is a sample on              parsing html body using jsoup
This is a sample on              parsing html body using jsoup

How can I parse the HTML to get this result? Or is there another way to do this in Java?

2 answers

You can turn off pretty printing for the document to get the output you want. You must also change .text() to .html().

Document doc = Jsoup.parse(sample);
doc.outputSettings(new Document.OutputSettings().prettyPrint(false));
String output = doc.body().html();
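
For context on why .text() gives the single-line output in the question: jsoup documents text() as returning normalized text, meaning runs of whitespace are collapsed and the result is trimmed. The sketch below only imitates that effect with the standard library, so it runs without jsoup. The class and method names (WhitespaceDemo, collapse) are hypothetical, and the regex is an approximation of the behavior, not jsoup's actual implementation.

```java
public class WhitespaceDemo {

    // Approximates the whitespace normalization that text() applies:
    // every run of spaces, tabs and newlines becomes a single space,
    // and the result is trimmed. Hypothetical helper, not jsoup code.
    public static String collapse(String raw) {
        return raw.replaceAll("\\s+", " ").trim();
    }

    public static void main(String[] args) {
        String line = "This is a sample on              parsing html body using jsoup";
        // Roughly what the body contains before any normalization.
        String bodyText = "\n" + line + "\n" + line + "\n";

        // Collapsed form: one line with single spaces, as in the question's output.
        System.out.println(collapse(bodyText));

        // Untouched form: the spacing and line breaks the asker wants to keep.
        System.out.println(bodyText);
    }
}
```

Keep in mind that .html() returns markup rather than plain text, so characters such as & appear in their escaped form in the result.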


Source: https://habr.com/ru/post/1659698/
