I am using JTidy r938. I use this code to try to clean up the page:
final Tidy tidy = new Tidy();
tidy.setQuiet(false);
tidy.setShowWarnings(true);
tidy.setShowErrors(0);
tidy.setMakeClean(true);
Document document = tidy.parseDOM(conn.getInputStream(), null);
But when I parse this URL, http://www.chicagoreader.com/chicago/EventSearch?narrowByDate=This+Week&eventCategory=93922&keywords=&page=1 , not everything is cleaned up. For example, META tags on the page such as
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
remain unchanged in the output:
<META http-equiv="Content-Type" content="text/html; charset=UTF-8">
instead of getting a closing "</META>" tag or being written in self-closing form as "<META http-equiv="Content-Type" content="text/html; charset=UTF-8"/>". I confirmed this by printing the resulting JTidy org.w3c.dom.Document as a String.
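For reference, the check looks roughly like the sketch below. It uses only the JDK's javax.xml.transform classes, not any JTidy API, and builds a small stand-in Document by hand so that it runs on its own. The class name DomToString and the method serialize are made up for this illustration, and I am assuming the Document returned by tidy.parseDOM can be passed to the same method.

```java
import java.io.StringWriter;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import org.w3c.dom.Document;
import org.w3c.dom.Element;

// Hypothetical helper (not part of JTidy): dumps a DOM Document as a String.
public class DomToString {

    static String serialize(Document document) throws Exception {
        Transformer transformer = TransformerFactory.newInstance().newTransformer();
        // Force XML output so that empty elements are written in a well-formed way.
        transformer.setOutputProperty(OutputKeys.METHOD, "xml");
        transformer.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
        StringWriter out = new StringWriter();
        transformer.transform(new DOMSource(document), new StreamResult(out));
        return out.toString();
    }

    public static void main(String[] args) throws Exception {
        // Stand-in for the Document that tidy.parseDOM(...) would return (assumption).
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().newDocument();
        Element html = doc.createElement("html");
        Element head = doc.createElement("head");
        Element meta = doc.createElement("META");
        meta.setAttribute("http-equiv", "Content-Type");
        meta.setAttribute("content", "text/html; charset=UTF-8");
        head.appendChild(meta);
        html.appendChild(head);
        doc.appendChild(html);

        System.out.println(serialize(doc));
    }
}
```

With a plain JDK-built Document, the XML output method writes the empty META element in self-closing form, which is the kind of result I expected to get from the JTidy document as well.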
What can I do to make JTidy really clean the page, i.e. make it well-formed? I understand that there are other tools, but this question is specifically about using JTidy.