How to extract paragraph text from html using Jsoup?

Question

    import java.io.IOException;
    import java.util.logging.Level;
    import java.util.logging.Logger;
    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class JavaApplication14 {

        public static void main(String[] args) {
            try {
                Document doc = Jsoup.connect("tanmoy_mahathir.makes.org/thimble/146").get();
                String html = "<html><head></head>"
                        + "<body><p>Parsed HTML into a doc."
                        + "</p></body></html>";
                Elements paragraphs = doc.select("p");
                for (Element p : paragraphs)
                    System.out.println(p.text());
            } catch (IOException ex) {
                Logger.getLogger(JavaApplication14.class.getName()).log(Level.SEVERE, null, ex);
            }
        }
    }

Can someone help me with this Jsoup code? How can I parse only the part of the page that contains the paragraph, so that it just prints:

 Hello ,World! Nothing is impossible

+4

jsoup

Tanmoy Mahathir Jun 18 '13 at 5:35

3 answers

You can start with this:

    String url = "url of the html page";
    // Jsoup.parse(url) would treat the URL string itself as HTML, so fetch the page instead.
    Document page = Jsoup.connect(url).get();
    Elements elements = page.select("div[class=class_name] p");
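To illustrate how a scoped selector narrows the result to one part of the page, here is a minimal sketch that runs on an in-memory string instead of a live URL. The markup, the class name `content`, and the helper `textOf` are made up for the example; the real page may be structured differently.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class ScopedParagraphs {

    // Returns the text of every element matched by cssQuery, in document order.
    static List<String> textOf(String html, String cssQuery) {
        Document page = Jsoup.parse(html); // parses an HTML string; no network access
        List<String> texts = new ArrayList<>();
        for (Element p : page.select(cssQuery)) {
            texts.add(p.text());
        }
        return texts;
    }

    public static void main(String[] args) {
        // Hypothetical markup: only the paragraph inside div.content is wanted.
        String html = "<html><body>"
                + "<p>Navigation text</p>"
                + "<div class=\"content\"><p>Hello, World!</p></div>"
                + "</body></html>";
        System.out.println(textOf(html, "div.content p")); // prints [Hello, World!]
        System.out.println(textOf(html, "p"));             // both paragraphs
    }
}
```

The attribute form `div[class=class_name]` compares the whole class attribute, while `div.class_name` also matches elements that carry additional classes, so the second form is usually the more forgiving choice.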

0

obsolete Jun 18 '13 at 6:20

    Element firstPara = d.select("div.post-content p").first();
    System.out.println(firstPara);

You can select an element by its class and then call first() to get the first paragraph inside it.
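A small sketch of the same idea, with two details worth knowing: `first()` returns null when nothing matches, and printing the element itself outputs its HTML, whereas `text()` gives only the text. The markup and the helper name `firstText` are assumptions made for the example.

```java
import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;

final class FirstParagraph {

    // Returns the text of the first match, or null when nothing matches.
    static String firstText(String html, String cssQuery) {
        Document d = Jsoup.parse(html);
        Element firstPara = d.select(cssQuery).first(); // null if the selection is empty
        return firstPara == null ? null : firstPara.text();
    }

    public static void main(String[] args) {
        // Hypothetical markup with two paragraphs inside div.post-content.
        String html = "<div class=\"post-content\">"
                + "<p>First <b>paragraph</b></p><p>Second paragraph</p></div>";
        System.out.println(firstText(html, "div.post-content p")); // prints First paragraph
        System.out.println(firstText(html, "div.missing p"));      // prints null
    }
}
```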

0

Nomanjaved Sep 03 '14 at 3:15

selig · Accepted Answer · 2013-06-18T06:51:25+0000

For this small HTML snippet, you just need to do:

    String html = "<html><head></head>"
            + "<body><p>Parsed HTML into a doc."
            + "</p></body></html>";
    Document doc = Jsoup.parse(html);
    Elements paragraphs = doc.select("p");
    for (Element p : paragraphs)
        System.out.println(p.text());

As far as I can see, your link contains pretty much the same HTML, so you can also replace the doc definition with:

 Document doc = Jsoup.connect("https://tanmoy_mahathir.makes.org/thimble/146").get();
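One visible difference from the question's code is the `https://` prefix: `Jsoup.connect` expects an absolute URL whose protocol is http or https. The sketch below adds the prefix when it is missing. The helper names `ensureScheme` and `fetch` are hypothetical, and defaulting to HTTPS is an assumption about the site.

```java
import java.io.IOException;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;

final class SchemeCheck {

    // Hypothetical helper: prepends https:// when the URL has no http(s) scheme.
    static String ensureScheme(String url) {
        String lower = url.toLowerCase();
        if (lower.startsWith("http://") || lower.startsWith("https://")) {
            return url;
        }
        return "https://" + url; // assumption: the site is served over HTTPS
    }

    // Fetches and parses the page; this performs a real HTTP request.
    static Document fetch(String url) throws IOException {
        return Jsoup.connect(ensureScheme(url)).get();
    }

    public static void main(String[] args) {
        // Only the string handling runs here; no request is sent.
        System.out.println(ensureScheme("tanmoy_mahathir.makes.org/thimble/146"));
        // prints https://tanmoy_mahathir.makes.org/thimble/146
    }
}
```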

UPDATE

Here is the complete code that compiles and works just fine for me.

    import java.io.IOException;
    import java.util.logging.*;
    import org.jsoup.*;
    import org.jsoup.nodes.*;
    import org.jsoup.select.*;

    public class JavaApplication14 {

        public static void main(String[] args) {
            try {
                String url = "https://tanmoy_mahathir.makes.org/thimble/146";
                Document doc = Jsoup.connect(url).get();
                Elements paragraphs = doc.select("p");
                for (Element p : paragraphs)
                    System.out.println(p.text());
            } catch (IOException ex) {
                Logger.getLogger(JavaApplication14.class.getName())
                        .log(Level.SEVERE, null, ex);
            }
        }
    }
