Extract CSS styles from HTML using Jsoup in Java

Can anyone help with extracting CSS styles from HTML using Jsoup in Java? For example, from the HTML below I want to extract the .ft00 and .ft01 rules.

    <HTML>
    <HEAD>
    <TITLE>Page 1</TITLE>
    <META http-equiv="Content-Type" content="text/html; charset=UTF-8">
    <DIV style="position:relative;width:931;height:1243;">
    <STYLE type="text/css">
    <!--
     .ft00{font-size:11px;font-family:Times;color:#ffffff;}
     .ft01{font-size:11px;font-family:Times;color:#ffffff;}
    -->
    </STYLE>
    </HEAD>
    </HTML>
2 answers

If the style is inline on the element itself, you only need to call .attr("style") on that element.
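The value that .attr("style") returns is a plain declaration string, so it still has to be split into properties. A minimal sketch of that step, using only the standard library: the class InlineStyle and the method parseDeclarations are hypothetical names, not part of Jsoup, and the splitting is naive (a ';' or ':' inside a quoted value or a url(...) would confuse it).

```java
import java.util.LinkedHashMap;
import java.util.Map;

class InlineStyle {
    // Hypothetical helper (not a Jsoup API): splits "name:value;name:value"
    // into a map that keeps the declarations in their original order.
    static Map<String, String> parseDeclarations(String style) {
        Map<String, String> props = new LinkedHashMap<>();
        for (String declaration : style.split(";")) {
            int colon = declaration.indexOf(':');
            if (colon < 0) {
                continue; // empty or malformed piece, skip it
            }
            String name = declaration.substring(0, colon).trim();
            String value = declaration.substring(colon + 1).trim();
            if (!name.isEmpty()) {
                props.put(name, value);
            }
        }
        return props;
    }

    public static void main(String[] args) {
        // With Jsoup, this string would come from something like
        // doc.select("div").first().attr("style") on the question's document.
        String style = "position:relative;width:931;height:1243;";
        System.out.println(parseDeclarations(style));
        // prints {position=relative, width=931, height=1243}
    }
}
```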

Jsoup is an HTML parser, not an HTML renderer, so it gives you the contents of the <style>...</style> element as raw text and you have to parse that text yourself. You can use a simple regular expression for this, but it will not work in all cases. For anything beyond simple rules, use a CSS parser.

    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    import org.jsoup.Jsoup;
    import org.jsoup.nodes.Document;
    import org.jsoup.nodes.Element;

    public class Test {
        public static void main(String[] args) throws Exception {
            String html = "<HTML>\n" +
                    "<HEAD>\n" +
                    "<TITLE>Page 1</TITLE>\n" +
                    "<META http-equiv=\"Content-Type\" content=\"text/html; charset=UTF-8\">\n" +
                    "<DIV style=\"position:relative;width:931;height:1243;\">\n" +
                    "<STYLE type=\"text/css\">\n" +
                    "<!--\n" +
                    " .ft00{font-size:11px;font-family:Times;color:#ffffff;}\n" +
                    " .ft01{font-size:11px;font-family:Times;color:#ffffff;}\n" +
                    "-->\n" +
                    "</STYLE>\n" +
                    "</HEAD>\n" +
                    "</HTML>";

            Document doc = Jsoup.parse(html);
            // First <style> element in the document
            Element style = doc.select("style").first();

            // group(1): class name without the leading dot, group(2): declarations between the braces
            Matcher cssMatcher = Pattern.compile("[.](\\w+)\\s*[{]([^}]+)[}]").matcher(style.html());
            while (cssMatcher.find()) {
                System.out.println("Style `" + cssMatcher.group(1) + "`: " + cssMatcher.group(2));
            }
        }
    }

It will display:

    Style `ft00`: font-size:11px;font-family:Times;color:#ffffff;
    Style `ft01`: font-size:11px;font-family:Times;color:#ffffff;
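To make "it will not work in all cases" concrete, here is a small sketch that runs the same pattern over a few inputs that are valid CSS but fall outside what the expression expects. It needs only the standard library. The class name RegexLimits and the sample inputs are made up for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexLimits {
    // Same pattern as in the answer above.
    static final Pattern RULE = Pattern.compile("[.](\\w+)\\s*[{]([^}]+)[}]");

    // Returns the class names the pattern captures, in order of appearance.
    static List<String> classNames(String css) {
        List<String> names = new ArrayList<>();
        Matcher m = RULE.matcher(css);
        while (m.find()) {
            names.add(m.group(1));
        }
        return names;
    }

    public static void main(String[] args) {
        // Simple rules, as in the question: both are found.
        System.out.println(classNames(".ft00{color:#ffffff;} .ft01{color:#ffffff;}")); // [ft00, ft01]
        // Grouped selectors: only the class directly before '{' is captured.
        System.out.println(classNames(".ft00, .ft01 { color: red; }"));                // [ft01]
        // Hyphenated class name: \w does not match '-', so nothing is found.
        System.out.println(classNames(".ft-00{color:red;}"));                          // []
        // Rule nested in @media: matched, but the media condition is lost.
        System.out.println(classNames("@media print { .ft00 { color: black; } }"));    // [ft00]
    }
}
```

For generated pages like the one in the question, where every rule is a single class with a flat body, the regular expression is fine. The failing cases above are the kind of input where a CSS parser is the safer choice.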

Try the following:

    Document document = Jsoup.parse(html);
    String style = document.select("style").first().data();

You can then pass that string to a CSS parser to get the details you are interested in.
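A dedicated CSS parser library is the robust route. For flat rules like the ones in the question, the shape of the result can be sketched with a small hand-rolled scanner. This is a stand-in, not a real CSS parser: StyleBlock and parseRules are hypothetical names, and at-rules such as @media, quoted strings containing braces, and similar cases are not handled.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class StyleBlock {
    // Hypothetical helper (not part of Jsoup): maps each selector to its declarations.
    // Assumes flat "selector { name: value; ... }" rules only.
    static Map<String, Map<String, String>> parseRules(String css) {
        String text = css.replace("<!--", "").replace("-->", "") // legacy HTML comment wrapper
                         .replaceAll("(?s)/\\*.*?\\*/", "");      // CSS comments
        Map<String, Map<String, String>> rules = new LinkedHashMap<>();
        int pos = 0;
        while (true) {
            int open = text.indexOf('{', pos);
            int close = open < 0 ? -1 : text.indexOf('}', open);
            if (close < 0) {
                break; // no further complete rule
            }
            String body = text.substring(open + 1, close);
            // A rule may list several selectors separated by commas.
            for (String selector : text.substring(pos, open).split(",")) {
                String key = selector.trim();
                if (key.isEmpty()) {
                    continue;
                }
                Map<String, String> props = rules.computeIfAbsent(key, k -> new LinkedHashMap<>());
                for (String declaration : body.split(";")) {
                    int colon = declaration.indexOf(':');
                    if (colon < 0) {
                        continue; // empty or malformed piece
                    }
                    String name = declaration.substring(0, colon).trim();
                    if (!name.isEmpty()) {
                        props.put(name, declaration.substring(colon + 1).trim());
                    }
                }
            }
            pos = close + 1;
        }
        return rules;
    }

    public static void main(String[] args) {
        // Stands in for the string returned by document.select("style").first().data()
        String css = "<!--\n .ft00{font-size:11px;font-family:Times;color:#ffffff;}\n"
                   + " .ft01{font-size:11px;font-family:Times;color:#ffffff;}\n-->";
        parseRules(css).forEach((selector, props) -> System.out.println(selector + " -> " + props));
        // prints:
        // .ft00 -> {font-size=11px, font-family=Times, color=#ffffff}
        // .ft01 -> {font-size=11px, font-family=Times, color=#ffffff}
    }
}
```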


Source: https://habr.com/ru/post/1443227/

