How to extract paragraph text from HTML using Jsoup?

    import java.io.IOException;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class JavaApplication14 {

        public static void main(String[] args) {
            try {
                Document doc = Jsoup.connect("tanmoy_mahathir.makes.org/thimble/146").get();
                String html = "<html><head></head>"
                        + "<body><p>Parsed HTML into a doc."
                        + "</p></body></html>";
                Elements paragraphs = doc.select("p");
                for (Element p : paragraphs) {
                    System.out.println(p.text());
                }
            } catch (IOException ex) {
                Logger.getLogger(JavaApplication14.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }

Can someone help me with this Jsoup code? How can I parse only the part that contains the paragraph, so that it prints just:

 Hello ,World! Nothing is impossible 
+4
3 answers

For this small HTML snippet, you only need to do this:

    String html = "<html><head></head>"
            + "<body><p>Parsed HTML into a doc."
            + "</p></body></html>";
    Document doc = Jsoup.parse(html);
    Elements paragraphs = doc.select("p");
    for (Element p : paragraphs) {
        System.out.println(p.text());
    }

As far as I can see, your link contains pretty much the same HTML, so you can also replace the doc definition with

 Document doc = Jsoup.connect("https://tanmoy_mahathir.makes.org/thimble/146").get(); 
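The difference between the two entry points can be sketched as follows. `Jsoup.parse(String)` treats its argument as HTML markup and never touches the network, while `Jsoup.connect(url).get()` fetches the page first. The class name `ParagraphTexts`, the method `fromHtml`, and the `example.com` URL are made up for illustration; `Elements.eachText()` is assumed to be available (it is in recent jsoup releases).

```java
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

// Hypothetical helper class, not part of the original question or answers.
class ParagraphTexts {

    // Returns the text of every <p> element that has text, in document order.
    static List<String> fromHtml(String html) {
        Document doc = Jsoup.parse(html); // parses the markup string, no network access
        return doc.select("p").eachText();
    }

    public static void main(String[] args) {
        String html = "<html><head></head>"
                + "<body><p>Parsed HTML into a doc.</p></body></html>";
        fromHtml(html).forEach(System.out::println);

        // A URL passed to Jsoup.parse(String) is parsed as plain text, not fetched,
        // so the resulting document contains no <p> elements.
        System.out.println(fromHtml("https://example.com/page").isEmpty());
    }
}
```

To read paragraphs from a live page, the same `select("p")` call works on the `Document` returned by `Jsoup.connect(url).get()`.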

UPDATE

Here is the complete code that compiles and works just fine for me.

    import java.io.IOException;
    import java.util.logging.*;
    import org.jsoup.*;
    import org.jsoup.nodes.*;
    import org.jsoup.select.*;

    public class JavaApplication14 {

        public static void main(String[] args) {
            try {
                String url = "https://tanmoy_mahathir.makes.org/thimble/146";
                Document doc = Jsoup.connect(url).get();
                Elements paragraphs = doc.select("p");
                for (Element p : paragraphs) {
                    System.out.println(p.text());
                }
            } catch (IOException ex) {
                Logger.getLogger(JavaApplication14.class.getName())
                        .log(Level.SEVERE, null, ex);
            }
        }
    }
+2

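You can start with this:

    String url = "url of the html page";
    // Jsoup.parse(String) expects HTML markup, so fetch the page with connect(...).get()
    Document page = Jsoup.connect(url).get();
    // Paragraphs inside a div whose class attribute is exactly "class_name"
    Elements elements = page.select("div[class=class_name] p");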
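As a rough illustration of how this selector scopes the match, the sketch below compares the attribute form `div[class=class_name] p` with the class form `div.class_name p`. The attribute form compares the whole `class` attribute value, so a div carrying an additional class is skipped, while the class form matches any div that has the class. The sample markup and the class name `SelectorScope` are invented for this example.

```java
import org.jsoup.Jsoup;
import org.jsoup.select.Elements;

// Hypothetical demo class; the markup below is made up for illustration.
class SelectorScope {

    static final String HTML =
            "<div class=\"class_name\"><p>one</p></div>"
          + "<div class=\"class_name extra\"><p>two</p></div>"
          + "<div class=\"other\"><p>three</p></div>";

    // Parses the sample markup and runs the given CSS query against it.
    static Elements select(String cssQuery) {
        return Jsoup.parse(HTML).select(cssQuery);
    }

    public static void main(String[] args) {
        // Attribute form: the class attribute must equal "class_name" as a whole.
        System.out.println(select("div[class=class_name] p").eachText());
        // Class form: the div only needs to have "class_name" among its classes.
        System.out.println(select("div.class_name p").eachText());
    }
}
```

If the target div may carry more than one class, the class form is usually the safer choice.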
0
    // d is the parsed Document
    Element firstPara = d.select("div.post-content p").first();
    // text() returns only the paragraph text, without the surrounding <p> tags
    System.out.println(firstPara.text());

You can select an element by its class and then take the first paragraph inside it.
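One thing to watch for with this approach is that `first()` returns `null` when nothing matches, as does `selectFirst(...)`. A minimal sketch of a null-safe variant, assuming a jsoup version that provides `Element.selectFirst` and using a made-up `post-content` snippet and a hypothetical helper name `firstParagraphText`:

```java
import java.util.Optional;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

// Hypothetical helper class; not taken from the original answers.
class FirstParagraph {

    // Returns the text of the first element matching cssQuery, or an empty Optional.
    static Optional<String> firstParagraphText(String html, String cssQuery) {
        Document doc = Jsoup.parse(html);
        Element first = doc.selectFirst(cssQuery); // null when nothing matches
        return Optional.ofNullable(first).map(Element::text);
    }

    public static void main(String[] args) {
        // Sample markup invented for this example.
        String html = "<div class=\"post-content\">"
                + "<p>Hello, <b>World!</b></p><p>Second</p></div>";

        // Prints the text of the first paragraph inside div.post-content.
        System.out.println(firstParagraphText(html, "div.post-content p").orElse("(no match)"));
        // No div with class "missing" exists, so the fallback is printed.
        System.out.println(firstParagraphText(html, "div.missing p").orElse("(no match)"));
    }
}
```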

0

Source: https://habr.com/ru/post/1486749/

