Alternative to the Jsoup.parse() method

I am using Jsoup.parse() to parse this data. Everything works, but it takes a lot of time.

For example, parsing this data takes 20 seconds. Are there other solutions for my needs?

Code:

 rezult = Jsoup.parse(res.parse().outerHtml(), "UTF-8").text(); 

Here res is the response received from the link.

============ UPDATE ==============

I moved the res.parse().outerHtml() call into its own variable and realized that it is the source of the problem. This line, not Jsoup.parse(), takes 20 seconds:

 String tmp = res.parse().outerHtml(); 

And this line only takes 1 second:

 rezult = Jsoup.parse(tmp, "UTF-8").text(); 

I use this code to get data from this link. I use Jsoup.parse() because without it I got something like this:

 <html> <head></head> <body> {&quot;success&quot;:true,&quot;currentUser&quot;:43743,&quot;careTypes&quot;:[{&quot;id&quot;:1,&quot;name&quot;:&quot;\u0421\u0442\u0438\u0440\u043a\u0430&quot;,&quot;description&quot;:&quot;\u041e\u043f\u0438\u0441\u0430\u043d\u0438\u0435 \u0441\u0442\u0438\u0440\u043a\u0438 \u043f\u043e\u044f\u0432\u0438\u0442\u0441\u044f \u0437\u0434\u0435\u0441\u044c, \u043a\u0430\u043a \u0442\u043e\u043b\u044c\u043a\u043e \u0432\u044b \u0432\u044b\u0431\u0435\u0440\u0435\u0442\u0435 \u0440\u0435\u043a\u043e\u043c\u0435\u043d\u0434\u0443\u0435\u043c\u044b\u0439 \u0440\u0435\u0436\u0438\u043c.&quot;},{&quot;id&quot;:2,&quot;name&quot;:&quot;\u041e\u0442\u0431\u0435\u043b\u0438\u0432\u0430\u043d\u0438\u0435&quot;,&quot;description&quot;:&quot;\u041e\u043f\u0438\u0441\u0430\u043d\u0438\u0435 \u043e\u0442\u0431\u0435\u043b\u0438\u0432\u0430\u043d\u0438\u044f \u043f\u043e\u044f\u0432\u0438\u0442\u0441\u044f \u0437\u0434\u0435\u0441\u044c, \u043a\u0430\u043a \u0442\u043e\u043b\u044c\u043a\u043e \u0432\u044b \u0432\u044b\u0431\u0435\u0440\u0435\u0442\u0435 

instead of this:

 {"success":true,"currentUser":43743,"careTypes":[{"id":1,"name":"\u0421\u0442\u0438\u0440\u043a\u0430","description":"\u041e\u043f\u0438\u0441\u0430\u043d\u0438\u0435 \u0441\u0442\u0438\u0440\u043a\u0438 \u043f\u043e\u044f\u0432\u0438\u0442\u0441\u044f \u0437\u0434\u0435\u0441\u044c, \u043a\u0430\u043a \u0442\u043e\u043b\u044c\u043a\u043e \u0432\u044b \u0432\u044b\u0431\u0435\u0440\u0435\u0442\u0435 \u0440\u0435\u043a\u043e\u043c\u0435\u043d\u0434\u0443\u0435\u043c\u044b\u0439 \u0440\u0435\u0436\u0438\u043c."},{"id":2,"name":"\u041e\u0442\u0431\u0435\u043b\u0438\u0432\u0430\u043d\u0438\u0435","description":"\u041e\u043f\u0438\u0441\u0430\u043d\u0438\u0435 \u043e\u0442\u0431\u0435\u043b\u0438\u0432\u0430\u043d\u0438\u044f \u043f\u043e\u044f\u0432\u0438\u0442\u0441\u044f \u0437\u0434\u0435\u0441\u044c, \u043a\u0430\u043a \u0442\u043e\u043b\u044c\u043a\u043e \u0432\u044b \u0432\u044b\u0431\u0435\u0440\u0435\u0442\u0435 

So now the main problem is to replace the res.parse() call with something that takes less time to execute.

============ UPDATE 2 ==============

 long t2 = System.currentTimeMillis();
 try {
     Connection connection = Jsoup.connect(url)
             .method(Connection.Method.POST)
             .cookies(cookies)
             .timeout(30000)
             .ignoreContentType(true);
     if (data != null) {
         connection.data(data);
     }
     res = connection.execute();
     Logger.d(System.currentTimeMillis() - t2 + " = connection.execute");
     long t6 = System.currentTimeMillis();
     String tmp = res.parse().outerHtml();
     Logger.d(System.currentTimeMillis() - t6 + " = res.parse().outerHtml()");
     long t4 = System.currentTimeMillis();
     rezult = Jsoup.parse(tmp, "UTF-8").text();
     Logger.d(System.currentTimeMillis() - t4 + " = Jsoup.parse");

And what I got in Logcat:

 1588 = connection.execute
 16150 = res.parse().outerHtml()
 1466 = Jsoup.parse
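
The per-step measurements above repeat the same start/stop pattern, so they could be wrapped in a small helper. This is only a minimal sketch using the standard library: the class name StepTimer, the Step interface, and the timed method are hypothetical, and System.out.println stands in for the Logger.d call used in the question.

```java
public class StepTimer {
    /** A step that may throw a checked exception, e.g. an I/O call. */
    @FunctionalInterface
    public interface Step<T> {
        T run() throws Exception;
    }

    /** Runs the step, prints "<elapsed ms> = <label>", and returns the step's result. */
    public static <T> T timed(String label, Step<T> step) {
        long start = System.nanoTime(); // monotonic clock, not affected by wall-clock changes
        try {
            return step.run();
        } catch (RuntimeException e) {
            throw e;
        } catch (Exception e) {
            // Wrap checked exceptions so callers do not need a throws clause
            throw new IllegalStateException(label + " failed", e);
        } finally {
            long elapsedMs = (System.nanoTime() - start) / 1_000_000;
            System.out.println(elapsedMs + " = " + label);
        }
    }

    public static void main(String[] args) {
        // Stand-in work; in the question this would be a call such as res.parse().outerHtml()
        String upper = timed("toUpperCase", () -> "jsoup".toUpperCase());
        System.out.println(upper);
    }
}
```

With such a helper, each of the three calls in the snippet could be timed with one line, which makes it harder to mix up the start variables (t2, t6, t4).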
2 answers

I found a solution to this problem.

The Jsoup library has another method that returns the response content as-is, without parsing it into a document.

The solution is to change this line:

 String tmp = res.parse().outerHtml(); 

to this line:

 String tmp = res.body(); 

This is actually 20 times faster. The two calls may do different jobs, but for my needs the result is the same.
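
The same idea, reading the raw response without building an HTML document, can also be sketched with only the standard library. This is not the Jsoup approach from the answer but a swapped-in alternative based on HttpURLConnection. The names RawBodyFetcher, readAll, and postForBody are hypothetical, the cookies and form data from the question are left out, and UTF-8 is assumed as the response encoding.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class RawBodyFetcher {
    /** Reads the whole stream into a String; no HTML parsing or escaping is involved. */
    public static String readAll(InputStream in, Charset charset) {
        try {
            ByteArrayOutputStream buffer = new ByteArrayOutputStream();
            byte[] chunk = new byte[8192];
            int n;
            while ((n = in.read(chunk)) != -1) {
                buffer.write(chunk, 0, n);
            }
            return new String(buffer.toByteArray(), charset);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Sends a POST request and returns the body exactly as the server sent it. */
    public static String postForBody(String url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) new URL(url).openConnection();
        conn.setRequestMethod("POST");
        conn.setConnectTimeout(30000); // same 30 s limit as in the question
        conn.setReadTimeout(30000);
        try (InputStream in = conn.getInputStream()) {
            return readAll(in, StandardCharsets.UTF_8); // assumed encoding
        } finally {
            conn.disconnect();
        }
    }
}
```

Because the body is never wrapped in html and body tags, the JSON arrives with plain quotes and needs no second pass to remove HTML escaping.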


Use eval(). Also, make sure that the source passed to eval() is safe. eval() will try to evaluate any statement and could therefore expose security problems if it is not used properly.


Source: https://habr.com/ru/post/959404/

