Get all HTML as a string from HTMLDocument

I'm coding in Java.

Does anyone know how I can get the contents of a javax.swing.text.html.HTMLDocument as a string? This is what I have so far:

    URL url = new URL("http://www.test.com");
    HTMLEditorKit kit = new HTMLEditorKit();
    HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
    doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
    Reader HTMLReader = new InputStreamReader(url.openConnection().getInputStream());
    kit.read(HTMLReader, doc, 0);

I need the contents of an HTMLDocument as a String.

Example:

 <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> <html><head><meta http-equiv="X-UA-Compatible" content="IE=Edge,chrome=1"> 

....... etc.

Any help would be greatly appreciated. I need to use the HTMLDocument class to handle the HTML correctly :)

Thanks, Daniel

+6
2 answers
    StringWriter writer = new StringWriter();
    kit.write(writer, doc, 0, doc.getLength());
    String s = writer.toString();
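
To make the snippet above easier to try, here is a minimal self-contained sketch. It parses an inline HTML string instead of a URL so that it runs without network access; the class name HtmlDocumentToString, the helper names parse and toHtml, and the sample markup are made up for illustration. Expect the output to be re-serialized markup rather than a byte-for-byte copy of the input.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.StringWriter;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class HtmlDocumentToString {

    // Parse an HTML string into an HTMLDocument (hypothetical helper).
    public static HTMLDocument parse(HTMLEditorKit kit, String html)
            throws IOException, BadLocationException {
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        // Same property as in the question: ignore <meta> charset directives.
        doc.putProperty("IgnoreCharsetDirective", Boolean.TRUE);
        try (Reader reader = new StringReader(html)) {
            kit.read(reader, doc, 0);
        }
        return doc;
    }

    // Serialize the whole document back to markup through the editor kit.
    public static String toHtml(HTMLEditorKit kit, HTMLDocument doc)
            throws IOException, BadLocationException {
        StringWriter writer = new StringWriter();
        kit.write(writer, doc, 0, doc.getLength());
        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = parse(kit, "<html><body><p>Hello, world</p></body></html>");
        System.out.println(toHtml(kit, doc));
    }
}
```

Both kit.read and kit.write declare IOException and BadLocationException, so callers have to handle or propagate them.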
+13

You don't need an editor kit or a reader at all if all you want is the raw HTML; just read the input stream into a string. For example, with commons-io: IOUtils.toString(inputStream)
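
If you would rather not add a dependency, the same idea can be sketched with the JDK alone. This assumes Java 9 or later (for InputStream.readAllBytes) and that the page is UTF-8 encoded; the class name StreamToString and the method name readAll are hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class StreamToString {

    // Read the whole stream and decode it with the given charset.
    // Requires Java 9+ because of InputStream.readAllBytes().
    public static String readAll(InputStream in, Charset charset) throws IOException {
        try (InputStream stream = in) {
            return new String(stream.readAllBytes(), charset);
        }
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for url.openConnection().getInputStream(); UTF-8 is an assumption.
        byte[] page = "<html><body>hi</body></html>".getBytes(StandardCharsets.UTF_8);
        System.out.println(readAll(new ByteArrayInputStream(page), StandardCharsets.UTF_8));
    }
}
```

Unlike going through HTMLDocument, this returns the bytes the server sent, decoded as text, with no parsing or normalization.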

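or, if you only need the document's text without the markup, you can use the public getText method (getContent() is protected in AbstractDocument, so it cannot be called from outside the class hierarchy):

    String str = doc.getText(0, doc.getLength());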
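
For comparison, here is a small sketch that reads a document's character content through the public Document.getText(int, int) method. It yields the text only, not the tags, so it does not reproduce the original HTML. The class name DocumentText, the method name textOf, and the sample markup are made up for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.swing.text.BadLocationException;
import javax.swing.text.html.HTMLDocument;
import javax.swing.text.html.HTMLEditorKit;

public class DocumentText {

    // Parse the HTML and return the document's plain character content.
    public static String textOf(String html) throws IOException, BadLocationException {
        HTMLEditorKit kit = new HTMLEditorKit();
        HTMLDocument doc = (HTMLDocument) kit.createDefaultDocument();
        kit.read(new StringReader(html), doc, 0);
        // getText is the public accessor; it returns text without markup.
        return doc.getText(0, doc.getLength());
    }

    public static void main(String[] args) throws Exception {
        System.out.println(textOf("<html><body><p>Hello</p></body></html>").trim());
    }
}
```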
+1

Source: https://habr.com/ru/post/915024/