How to load a local HTML file in Jsoup?

I cannot load a local HTML file using the Jsoup library, or at least Jsoup doesn't seem to recognize its contents. I hard-coded the exact HTML from the local file into a string (the variable 'html'), and when I pass that string instead of the file name, the code works fine. The file is readable in both cases.

    import java.io.File;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class FileHtmlParser {
        public String input;

        // constructor
        public FileHtmlParser(String inputFile) { input = inputFile; }

        // methods
        public FileHtmlParser execute() {
            File file = new File(input);
            System.out.println("The file can be read: " + file.canRead());
            String html = "<html><head><title>First parse</title><meta>106</meta> <meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" /></head>"
                    + "<body><p>Parsed HTML into a doc.</p>"
                    + ""
                    + "<div id=\"navbar\">this is the div</div></body></html>";
            Document doc = Jsoup.parseBodyFragment(input);
            Elements content = doc.getElementsByTag("div");
            if (content.hasText()) {
                System.out.println("result is " + content.outerHtml());
            } else {
                System.out.println("nothing!");
            }
            return this;
        }
    } /*endOfClass*/

The result with:
Document doc = Jsoup.parseBodyFragment(html)

    The file can be read: true
    result is <div id="navbar"> this is the div </div>

The result with:
Document doc = Jsoup.parseBodyFragment(input)

    The file can be read: true
    nothing!
1 answer

Your mistake is assuming that Jsoup.parseBodyFragment() can tell whether you are passing a file name or a string containing HTML markup. It cannot: it treats whatever string it receives as markup.

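Jsoup.parseBodyFragment(input) expects input to be a String that contains HTML markup, not a file name. Given a file name, it parses the name itself as plain text, finds no div element, and your code prints "nothing!".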

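To parse HTML from a file, use the Jsoup.parse(File in, String charsetName) method instead. Passing null as the charset lets Jsoup detect it from the http-equiv meta tag, falling back to UTF-8. The call throws IOException if the file cannot be read:

    File in = new File(input);
    Document doc = Jsoup.parse(in, null);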
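
The difference between the file name and the file's contents can be seen without Jsoup at all. The following is only a minimal sketch using the standard library: the class PathVersusMarkup and the helpers containsDivTag and readMarkup are hypothetical names introduced for illustration, the sample file is a temporary one created by the sketch itself, and UTF-8 is assumed as the file encoding.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical demo class, not part of Jsoup or of the question's code.
class PathVersusMarkup {

    // Crude illustrative check: does the text contain the start of a div tag?
    public static boolean containsDivTag(String text) {
        return text.contains("<div");
    }

    // Reads the whole file into a String, decoding it as UTF-8 (an assumption).
    public static String readMarkup(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Create a throwaway HTML file so the sketch is self-contained.
        Path file = Files.createTempFile("sample", ".html");
        String markup = "<html><body><div id=\"navbar\">this is the div</div></body></html>";
        Files.write(file, markup.getBytes(StandardCharsets.UTF_8));

        String input = file.toString();   // the file NAME, as in the question
        String html = readMarkup(file);   // the file CONTENTS

        // The name is just a path, so there is no markup in it to parse.
        System.out.println("path contains a div tag: " + containsDivTag(input));
        // The contents are what a string-based parser expects to receive.
        System.out.println("contents contain a div tag: " + containsDivTag(html));

        Files.delete(file);
    }
}
```

Running it prints false for the path and true for the contents. A string obtained the way readMarkup obtains it is the kind of argument Jsoup.parseBodyFragment(String) is meant for, although letting Jsoup.parse(File, String) read the file directly is usually simpler because it also handles charset detection.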

Source: https://habr.com/ru/post/910218/

