How to get a table from an HTML page using Java

I am working on a project that fetches financial reports from the Internet and uses them in a Java application to automatically compute ratios and build charts.

The site I use requires a username and password to access the tables.
The table I need is inside a TBODY tag, but there are two more TBODYs in the HTML.

How can I use Java to write my table to a .txt file that I can then use in my application? What would be the best way to do this, and what should I read?

1 answer

If this were my project, I would consider using an HTML parser, something like jsoup (although others are available). There is a tutorial on the jsoup website, and after playing with it for a while, you will most likely find it quite easy to use.
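Since the question mentions several TBODY elements on one page, here is a minimal sketch of picking one of them by position with jsoup. The class name `TbodyPicker`, the method `rowsOfTbody`, and the inline HTML are made up for illustration. On a real page, a more specific selector (an id or class on the table, if there is one) would probably be more robust than an index.

```java
import java.util.ArrayList;
import java.util.List;

import org.jsoup.Jsoup;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Element;
import org.jsoup.select.Elements;

public class TbodyPicker {

    // Returns the rows of the tbody at the given zero-based index,
    // each row as a list of its td cell texts.
    // Returns an empty list if there is no tbody at that index.
    public static List<List<String>> rowsOfTbody(Document doc, int tbodyIndex) {
        Elements bodies = doc.select("tbody");
        List<List<String>> rows = new ArrayList<>();
        if (tbodyIndex < 0 || tbodyIndex >= bodies.size()) {
            return rows;
        }
        Element body = bodies.get(tbodyIndex);
        for (Element row : body.select("tr")) {
            List<String> cells = new ArrayList<>();
            for (Element cell : row.select("td")) {
                cells.add(cell.text());
            }
            rows.add(cells);
        }
        return rows;
    }

    public static void main(String[] args) {
        // Hypothetical page: the first tbody is unrelated, the second holds the data.
        String html = "<table><tbody><tr><td>menu</td></tr></tbody></table>"
                + "<table><tbody>"
                + "<tr><td>0000001</td><td>Customer1</td><td>100.00</td></tr>"
                + "<tr><td>0000002</td><td>Customer2</td><td>200.00</td></tr>"
                + "</tbody></table>";
        Document doc = Jsoup.parse(html);
        System.out.println(rowsOfTbody(doc, 1));
    }
}
```

Parsing a string with `Jsoup.parse` keeps the sketch independent of any network access. The same method works on a `Document` obtained from `Jsoup.connect(...).get()`.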

For a complete page, such as one holding an HTML table with ACCOUNT, NAME, and BALANCE columns, jsoup can fetch and parse it like this:

    import java.io.IOException;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;
    import org.jsoup.select.Elements;

    public class TableEg {
        public static void main(String[] args) {
            // URL of the page that contains the table
            String html = "http://publib.boulder.ibm.com/infocenter/iadthelp/v7r1/topic/"
                    + "com.ibm.etools.iseries.toolbox.doc/htmtblex.htm";
            try {
                // Download the page and parse it into a DOM
                Document doc = Jsoup.connect(html).get();
                Elements tableElements = doc.select("table");

                // Print the header cells
                Elements tableHeaderEles = tableElements.select("thead tr th");
                System.out.println("headers");
                for (int i = 0; i < tableHeaderEles.size(); i++) {
                    System.out.println(tableHeaderEles.get(i).text());
                }
                System.out.println();

                // Print the data cells, row by row
                Elements tableRowElements = tableElements.select(":not(thead) tr");
                for (int i = 0; i < tableRowElements.size(); i++) {
                    Element row = tableRowElements.get(i);
                    System.out.println("row");
                    Elements rowItems = row.select("td");
                    for (int j = 0; j < rowItems.size(); j++) {
                        System.out.println(rowItems.get(j).text());
                    }
                    System.out.println();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }

This produces the following output:

    headers
    ACCOUNT
    NAME
    BALANCE

    row
    0000001
    Customer1
    100.00

    row
    0000002
    Customer2
    200.00

    row
    0000003
    Customer3
    550.00
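The question also asks about getting the table into a text file. Here is a minimal sketch using only the standard library, assuming the rows have already been extracted into lists of strings. The class name `TableToFile`, the file name `table.txt`, and the tab-separated layout are my own choices for illustration, not something the question prescribes.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class TableToFile {

    // Joins each row's cells with tabs and writes one row per line.
    public static void writeRows(List<List<String>> rows, Path target) throws IOException {
        List<String> lines = new ArrayList<>();
        for (List<String> row : rows) {
            lines.add(String.join("\t", row));
        }
        Files.write(target, lines, StandardCharsets.UTF_8);
    }

    // Reads the file back into rows by splitting each line on tabs.
    public static List<List<String>> readRows(Path source) throws IOException {
        List<List<String>> rows = new ArrayList<>();
        for (String line : Files.readAllLines(source, StandardCharsets.UTF_8)) {
            rows.add(Arrays.asList(line.split("\t", -1)));
        }
        return rows;
    }

    public static void main(String[] args) throws IOException {
        // Sample rows matching the output shown above
        List<List<String>> rows = Arrays.asList(
                Arrays.asList("0000001", "Customer1", "100.00"),
                Arrays.asList("0000002", "Customer2", "200.00"),
                Arrays.asList("0000003", "Customer3", "550.00"));
        Path file = Paths.get("table.txt");
        writeRows(rows, file);
        System.out.println(readRows(file));
    }
}
```

A tab separator is a reasonable default as long as the cell text itself contains no tabs. If the values are only needed inside the same application, it may be simpler to skip the file and pass the lists along directly.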

Source: https://habr.com/ru/post/916770/

