Using Jsoup.connect() in a loop: the first request is always much slower than the subsequent ones

I am creating a small application to determine how long it takes to load an HTML document, checking every x seconds.

I use jsoup in a loop:

    import java.io.IOException;
    import java.net.SocketTimeoutException;
    import org.jsoup.Connection;
    import org.jsoup.Jsoup;

    for (int i = 0; i < totalGets; i++) {
        long startTime = System.currentTimeMillis();
        try {
            Connection.Response response = Jsoup.connect(url)
                    .userAgent(USER_AGENT) // just using a Firefox user-agent
                    .timeout(30_000)
                    .execute();
            long elapsed = System.currentTimeMillis() - startTime;
            System.out.println("Response time: " + elapsed + "ms"
                    + "\tResponse code: " + response.statusCode());
        } catch (SocketTimeoutException e) {
            // Only report the timeout; there is no response to print here
            System.out.println("Request timed out after 30 seconds!");
        } catch (IOException e) {
            System.out.println("Request failed: " + e.getMessage());
        }
        sleep(2000);
    }

The problem I am facing is that the very first Jsoup request is always slower than all subsequent ones, no matter which website I use.

Here is my output for https://www.google.com:

    Response time: 934ms    Response code: 200
    Response time: 149ms    Response code: 200
    Response time: 122ms    Response code: 200
    Response time: 136ms    Response code: 200
    Response time: 128ms    Response code: 200

Here is what I get for http://stackoverflow.com:

    Response time: 440ms    Response code: 200
    Response time: 182ms    Response code: 200
    Response time: 187ms    Response code: 200
    Response time: 193ms    Response code: 200
    Response time: 185ms    Response code: 200

Why is it always faster after the first connection? Is there a better way to determine the loading speed of a document?

2 answers

1. Jsoup must run some boilerplate code before the first request can be executed. I would not count the first request in your measurements, since all of this initialization skews the first request's time.

2. As mentioned in the comments, many websites cache their responses for a couple of seconds. Depending on the website you want to measure, you can use some tricks to make the web server generate a fresh page each time. One such trick is to add a timestamp parameter. Usually _ is used for this (for example, http://url/path/?parameter1=val1&_=ts). Or you can send no-cache headers in the HTTP request. However, none of these tricks can force the web server to behave the way you want it to. So you could instead wait longer than 30 seconds between requests.
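Both points can be sketched with a couple of small helpers. This is only an illustration under assumptions: the class name MeasurementHelpers and its method names are made up here, and the sample timings are the google.com numbers from the question. The first helper appends the _ timestamp parameter, the second averages the timings while skipping the first (warm-up) request.

```java
final class MeasurementHelpers {

    /**
     * Appends a "_" query parameter carrying the given timestamp so that
     * each request URL is unique. Assumes the URL has no "#fragment" part.
     */
    static String withCacheBuster(String url, long timestampMillis) {
        String separator;
        if (url.endsWith("?") || url.endsWith("&")) {
            separator = "";          // query string is already open
        } else if (url.contains("?")) {
            separator = "&";         // add to an existing query string
        } else {
            separator = "?";         // start a new query string
        }
        return url + separator + "_=" + timestampMillis;
    }

    /**
     * Averages the measured times while ignoring the first sample,
     * which includes one-time initialization cost.
     */
    static double averageAfterWarmUp(long[] timingsMs) {
        if (timingsMs.length < 2) {
            throw new IllegalArgumentException("need at least two samples");
        }
        long sum = 0;
        for (int i = 1; i < timingsMs.length; i++) {
            sum += timingsMs[i];
        }
        return (double) sum / (timingsMs.length - 1);
    }
}
```

In the loop from the question this would mean passing withCacheBuster(url, System.currentTimeMillis()) to Jsoup.connect(...), optionally together with .header("Cache-Control", "no-cache"). As noted above, the server is still free to ignore both.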


I think that besides @luksch's points there is one more factor: I think Java keeps the connection alive for several seconds, which probably saves the time spent on protocol handshakes for the following requests.

If you use .header("Connection", "close"), you will see more consistent times.

You can verify that connections are kept alive using a sniffer. At least I can see the same port numbers (I mean the source port, of course) being reused.
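For comparison, here is the same idea in plain java.net instead of Jsoup. It is a rough sketch, not the answer's code: the class NoKeepAlive and the method prepare are hypothetical names, and it only configures the request without sending it.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.HttpURLConnection;
import java.net.URI;

final class NoKeepAlive {

    /**
     * Builds (but does not open) an HTTP connection that asks the server
     * to close the socket after the response, so every measured request
     * pays the full connection setup cost.
     */
    static HttpURLConnection prepare(String url) {
        try {
            HttpURLConnection conn =
                    (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestProperty("Connection", "close");
            conn.setConnectTimeout(30_000); // same 30 s limit as in the question
            conn.setReadTimeout(30_000);
            return conn;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Calling getResponseCode() on the returned object sends the request. With every request opening a fresh connection, the later requests should lose most of their advantage over the first one if keep-alive is the main cause.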

EDIT:

Another thing that can add to the time of the first request is the DNS lookup...
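To see how much of the first request goes to DNS, the lookup can be timed on its own. A minimal sketch using only java.net.InetAddress; the class DnsTiming and the method resolveMillis are names invented for this example. Keep in mind that the JVM caches successful lookups, so only the first call for a given host reflects a real lookup.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;

final class DnsTiming {

    /**
     * Returns the wall-clock time in milliseconds spent resolving the host,
     * or -1 if the host cannot be resolved.
     */
    static long resolveMillis(String host) {
        long start = System.nanoTime();
        try {
            InetAddress.getByName(host);
        } catch (UnknownHostException e) {
            return -1;
        }
        return (System.nanoTime() - start) / 1_000_000;
    }
}
```

Calling resolveMillis with the target host name once before the measuring loop gives a rough idea of the DNS share, and as a side effect it warms the JVM's lookup cache.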


Source: https://habr.com/ru/post/1238296/

