How to estimate the remaining download time (accurately)?

Of course, you can divide the remaining file size by the current download speed, but if your download speed fluctuates (and it will), this does not produce a very good result. What is a better algorithm for producing a smoother countdown?

+55
algorithm download estimation
May 6 '10 at 8:30
6 answers

I wrote an algorithm a few years ago to predict the time remaining in an imaging and multicast program. It used a moving average with a reset whenever the current throughput fell outside a predetermined range. It stays smooth as long as nothing drastic happens, then adapts quickly and settles back into a moving average again. See the example chart here:

[Chart: actual throughput (thick blue) over time, compared against the overall average (orange), a plain moving average (gray), and the moving average with reset (green); the reset thresholds are shown as thin light blue and yellow lines]

The thick blue line in this example chart is the actual throughput over time. Notice the low throughput during the first half of the transfer, and then the sharp jump in the second half. The orange line is the overall average. It never adjusts far enough to give an accurate prediction of how long the transfer will take to finish. The gray line is a moving average (the average of the last N data points - in this chart N is 5, but in practice N may need to be larger to smooth enough). It recovers faster than the overall average, but still takes a while to adjust, and the larger N is, the longer it takes. So if your data is fairly noisy, N has to be larger and the recovery time gets longer.

The green line is the algorithm I used. It behaves just like a moving average, but when the data goes outside a predetermined range (marked by the thin light blue and yellow lines), it resets the moving average and jumps immediately. The predetermined range can also be based on the standard deviation, so it adjusts to the noise level of the data automatically. I just threw these values into Excel to chart them for this answer, so it's not perfect, but you get the idea.

Data could be contrived for which this algorithm is a poor predictor of remaining time. The bottom line is that you need a general idea of how you expect the data to behave and pick an algorithm accordingly. Mine worked well for the data sets I was seeing, so we kept using it.
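For concreteness, here is a minimal Python sketch of the same idea (not the original program's code); the window size and the reset band are illustrative values you would tune, and the band could instead be derived from the standard deviation as mentioned above:

    from collections import deque

    def make_reset_average(window=5, band=0.5):
        """Moving average that resets when a sample leaves the allowed band.

        band=0.5 means a sample more than 50% away from the current average
        clears the window, so the average restarts from the new sample.
        """
        samples = deque(maxlen=window)

        def update(throughput):
            if samples:
                avg = sum(samples) / len(samples)
                if abs(throughput - avg) > band * avg:
                    samples.clear()  # throughput shifted sharply: forget the old history
            samples.append(throughput)
            return sum(samples) / len(samples)

        return update

    # usage: feed one throughput sample per tick and base the ETA on the result
    update = make_reset_average()
    for bytes_per_sec in [120e3, 130e3, 125e3, 118e3, 600e3, 610e3, 595e3]:  # made-up samples
        smoothed = update(bytes_per_sec)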

Another important tip: developers usually ignore setup and teardown times in their progress bars and time estimate calculations. The result is either the eternal 99% or 100% progress bar that just sits there for a long time (while caches are flushed or other cleanup work happens), or wildly early estimates while directories are scanned or other setup work racks up time without earning any percentage of progress, which throws everything off. You can run some tests that include the setup and teardown times, estimate how long they take on average or relative to the job size, and fold that time into the progress bar. For example, the first 5% of the work is setup, the last 10% is teardown, and the 85% in the middle is the download or whatever repeating process you are tracking. That can help a lot too.
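A small sketch of folding those allowances into a single progress value (the 5% / 85% / 10% split is just the example figure above, and the phase names are made up for illustration):

    # example split: setup 5%, transfer 85%, teardown 10%
    PHASES = [("setup", 0.05), ("transfer", 0.85), ("teardown", 0.10)]

    def overall_progress(phase, phase_fraction):
        """Map a phase's local 0..1 progress into the overall 0..1 progress."""
        start = 0.0
        for name, weight in PHASES:
            if name == phase:
                return start + weight * min(max(phase_fraction, 0.0), 1.0)
            start += weight
        raise ValueError("unknown phase: %s" % phase)

    # halfway through the transfer -> 0.05 + 0.5 * 0.85 = 0.475 overall
    print(overall_progress("transfer", 0.5))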

+7
Jun 15 '17 at 22:08

An exponential moving average is great for this. It gives you a smoothed average in which, every time you add a new sample, older samples count less and less toward the overall value. They are still considered, but their weight drops off exponentially - hence the name. And since it is a "moving" average, you only need to keep a single number around.

In the context of measuring download speed, the formula will look like this:

averageSpeed = SMOOTHING_FACTOR * lastSpeed + (1-SMOOTHING_FACTOR) * averageSpeed; 

SMOOTHING_FACTOR is a number between 0 and 1. The higher it is, the faster old samples are discarded. As you can see from the formula, when SMOOTHING_FACTOR is 1 you are simply using the value of your latest observation; when it is 0, averageSpeed never changes. So you want something in between, and usually a fairly low value to get decent smoothing. I've found that 0.005 provides a pretty good smoothing value for an average download speed.

lastSpeed is the most recently measured download speed. You can get this value by running a timer every second or so and calculating how many bytes were downloaded since the last time it fired.

averageSpeed is, obviously, the number you want to use to calculate your estimated time remaining. Initialize it to the first lastSpeed measurement you get.
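Putting those pieces together, here is a small sketch of the timer callback in Python (names like bytes_downloaded and total_bytes are placeholders for whatever your download code tracks, not part of the answer):

    SMOOTHING_FACTOR = 0.005  # the value suggested above
    INTERVAL = 1.0            # timer period in seconds

    average_speed = None
    last_bytes = 0

    def on_timer_tick(bytes_downloaded, total_bytes):
        """Called every INTERVAL seconds; returns estimated seconds remaining (or None)."""
        global average_speed, last_bytes
        last_speed = (bytes_downloaded - last_bytes) / INTERVAL
        last_bytes = bytes_downloaded

        if average_speed is None:  # initialize to the first measurement
            average_speed = last_speed
        else:
            average_speed = SMOOTHING_FACTOR * last_speed + (1 - SMOOTHING_FACTOR) * average_speed

        if average_speed <= 0:
            return None  # nothing measured yet, no sensible estimate
        return (total_bytes - bytes_downloaded) / average_speed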

+111
Oct. 01 2018-10-10
 speed=speedNow*0.5+speedLastHalfMinute*0.3+speedLastMinute*0.2 
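One way to read this one-liner: keep a short history of per-second byte counts and blend the recent average, the 30-second average, and the 60-second average. A rough Python sketch under that reading (the window lengths are assumptions, not spelled out in the answer):

    from collections import deque

    class BlendedSpeed:
        def __init__(self):
            self.per_second = deque(maxlen=60)  # bytes transferred in each of the last 60 seconds

        def add_sample(self, bytes_this_second):
            self.per_second.append(bytes_this_second)

        def speed(self):
            def avg(n):
                window = list(self.per_second)[-n:]
                return sum(window) / len(window) if window else 0.0
            speed_now = avg(3)               # "current" speed: the last few seconds
            speed_last_half_minute = avg(30)
            speed_last_minute = avg(60)
            return speed_now * 0.5 + speed_last_half_minute * 0.3 + speed_last_minute * 0.2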
+7
May 6 '10 at 8:38 a.m.

I think the best you can do is divide the remaining file size by the average download speed (the amount downloaded so far divided by how long you have been downloading). This will fluctuate a little at first, but it becomes more and more stable the longer the download runs.
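In code that is just the following sketch (total_bytes, downloaded_bytes, and elapsed_seconds are whatever your transfer loop already tracks):

    def eta_seconds(total_bytes, downloaded_bytes, elapsed_seconds):
        """Estimate remaining time from the overall average speed so far."""
        if downloaded_bytes == 0 or elapsed_seconds == 0:
            return None  # nothing downloaded yet, no estimate
        average_speed = downloaded_bytes / elapsed_seconds
        return (total_bytes - downloaded_bytes) / average_speed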

+5
May 6 '10 at 8:36 a.m.

In addition to Ben Dolman's answer, you can also account for the fluctuation inside the algorithm itself. The result is smoother, but it still reacts to sudden changes in speed.

Something like this:

    prediction = 50;
    depencySpeed = 200;
    stableFactor = .5;

    smoothFactor = median(0, abs(lastSpeed - averageSpeed), depencySpeed);
    smoothFactor /= (depencySpeed - prediction * (smoothFactor / depencySpeed));
    smoothFactor = smoothFactor * (1 - stableFactor) + stableFactor;
    averageSpeed = smoothFactor * lastSpeed + (1 - smoothFactor) * averageSpeed;

With the right values for prediction and depencySpeed it stays stable whether the speed fluctuates or not, while still responding to real changes; you need to play with them a little depending on your internet speed. These settings were ideal for a transfer speed of around 600 kB/s that fluctuated between 0 and 1 MB/s.
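For readers who prefer runnable code, here is a rough Python translation of the pseudocode above, interpreting median(a, b, c) as the middle of the three values (i.e. clamping the deviation between 0 and depencySpeed); this is my reading of the snippet, not the original author's code:

    def clamp_middle(a, b, c):
        """Middle of three values - the median() used in the pseudocode."""
        return sorted([a, b, c])[1]

    def update_average(last_speed, average_speed,
                       prediction=50, depency_speed=200, stable_factor=0.5):
        smooth_factor = clamp_middle(0, abs(last_speed - average_speed), depency_speed)
        smooth_factor /= (depency_speed - prediction * (smooth_factor / depency_speed))
        smooth_factor = smooth_factor * (1 - stable_factor) + stable_factor
        return smooth_factor * last_speed + (1 - smooth_factor) * average_speed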

+2
Mar 10 '13 at 21:00

I found Ben Dolman's answer very helpful, but for someone like me who is not so mathematically inclined, it still took about an hour to fully implement it in my code. Here is a simpler way of saying the same thing in Python. If there are any inaccuracies, let me know, but in my testing it works very well:

    def exponential_moving_average(data, samples=0, smoothing=0.02):
        '''
        data: an array of all values.
        samples: how many previous data samples are averaged. Set to 0 to average all data points.
        smoothing: a value between 0-1, 1 being a linear average (no falloff).
        '''
        if len(data) == 1:
            return data[0]
        if samples == 0 or samples > len(data):
            samples = len(data)
        average = sum(data[-samples:]) / samples
        last_speed = data[-1]

        return (smoothing * last_speed) + ((1 - smoothing) * average)


    # this would be a constant stream of download speeds as you go, pre-defined here for illustration
    input_data = [4.5, 8.21, 8.7, 5.8, 3.8, 2.7, 2.5, 7.1, 9.3, 2.1, 3.1, 9.7, 5.1, 6.1, 9.1, 5.0, 1.6, 6.7, 5.5, 3.2]

    data = []
    ema_data = []

    for sample in input_data:
        data.append(sample)
        average_value = exponential_moving_average(data)
        ema_data.append(average_value)

    # print it out for visualization
    for i in range(len(data)):
        print("REAL: ", data[i])
        print("EMA: ", ema_data[i])
        print("--")
0
Jan 19 '19 at 6:14


