How to remove insignificant whitespace from HTML

I need to compare different versions of HTML pages for formatting and text changes. Unfortunately, the person/company who creates them uses some kind of HTML editor that re-wraps the HTML code each time (and adds a lot of whitespace), which makes the versions hard to diff. So I'm looking for a tool (preferably a Java library) that can reformat my HTML so that all insignificant whitespace and newlines are removed.

This means that in

<h1>First Headline</h1> <h2>Second headline</h2>

the space between </h1> and <h2> should be removed, but in

<b>formatted</b> <i>text</i>

the space must not be removed. I don't need to handle <pre>, <textarea>, or <script> blocks, nor CSS attributes that can change this behavior. I'm just looking for a solution that removes most of the unnecessary whitespace (and it's better to leave too much whitespace than too little).

(I already collapse runs of spaces and put newlines in place of spaces before tags to make the text more readable, but there are still too many cases where, for example, a stray newline between headings or table cells/rows breaks my simple “solution”.)

+3
2 answers

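JTidy. It is a Java port of HTML Tidy, an HTML syntax checker and pretty printer. It can clean up malformed HTML, and it also exposes the parsed document through a DOM interface.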
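If pulling in a library is not an option, the rule from the question (drop whitespace only where it touches a block-level tag, leave it alone between inline tags) can be roughly sketched with plain `java.util.regex`. This is not JTidy's API and not a real HTML parser. The class name `HtmlWhitespace`, the method `normalize`, and the `BLOCK` tag list are all made up for illustration, and the list is deliberately incomplete. It also collapses whitespace inside <pre>, <textarea>, and <script>, which the question says is acceptable.

```java
import java.util.regex.Pattern;

public class HtmlWhitespace {
    // Hypothetical, incomplete list of tags treated as block-level; extend as needed.
    private static final String BLOCK =
        "(?:h[1-6]|p|div|ul|ol|li|table|thead|tbody|tfoot|tr|td|th|html|head|body|title|br|hr)";

    // A block-level opening or closing tag, with optional attributes.
    private static final String BLOCK_TAG = "</?" + BLOCK + "(?:\\s[^>]*)?/?>";

    // Whitespace directly after a block-level tag.
    private static final Pattern AFTER_BLOCK =
        Pattern.compile("(" + BLOCK_TAG + ")\\s+", Pattern.CASE_INSENSITIVE);

    // Whitespace directly before a block-level tag.
    private static final Pattern BEFORE_BLOCK =
        Pattern.compile("\\s+(" + BLOCK_TAG + ")", Pattern.CASE_INSENSITIVE);

    public static String normalize(String html) {
        // Step 1: collapse every run of whitespace (including newlines) to one space.
        String s = html.replaceAll("\\s+", " ");
        // Step 2: remove the remaining space where it touches a block-level tag.
        s = AFTER_BLOCK.matcher(s).replaceAll("$1");
        s = BEFORE_BLOCK.matcher(s).replaceAll("$1");
        return s.trim();
    }

    public static void main(String[] args) {
        // Space between block-level headings is dropped.
        System.out.println(normalize("<h1>First Headline</h1>   \n <h2>Second headline</h2>"));
        // Space between inline tags is kept (collapsed to a single space).
        System.out.println(normalize("<b>formatted</b>  \n <i>text</i>"));
    }
}
```

Anything not in the tag list keeps a single space, so the sketch errs on the side of leaving too much whitespace rather than too little. Running both versions of a page through `normalize` before diffing should hide most differences caused by re-wrapping.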

+6

Source: https://habr.com/ru/post/1726610/

