Is there something wrong with my system.time wrapper function?

EDIT: Updated thanks to @daroczig's excellent answer below. However, test 2 still seems to take longer than test 1, which I'm interested in.

UPDATE: On a second reading, @daroczig's answer explains my confusion: the problem was that I had misunderstood what the system.time(expr) line of code actually does.

I wanted to create a version of the system.time function that would be a little more informative for me in terms of understanding runtime timing fluctuations:

    system.time.summary <- function(N, expr) {
      t.mat <- replicate(N, system.time(expr))
      as.data.frame(apply(t.mat[1:3,], 1, summary))
    }

However, the problem is that in the code below test 2 seems to take longer than test 1 (and I ran them several times to check), although the code is pretty much the same (test 1 uses the wrapper function, whereas test 2 is just the raw code):

    # set up number of runs
    N <- 100

    # test 1
    system.time.summary(N, (1:1e8)^2 + 1)
            user.self sys.self elapsed
    Min.        0.000    0.000   0.000
    1st Qu.     0.000    0.000   0.000
    Median      0.000    0.000   0.000
    Mean        0.058    0.031   0.089
    3rd Qu.     0.000    0.000   0.000
    Max.        0.580    0.310   0.890

    # test 2
    t.mat = replicate(N, system.time((1:1e8)^2 + 1))
    as.data.frame(apply(t.mat[1:3,], 1, summary))
            user.self sys.self elapsed
    Min.        0.630    0.120   0.860
    1st Qu.     0.665    0.170   0.880
    Median      0.695    0.195   0.880
    Mean        0.692    0.196   0.882
    3rd Qu.     0.715    0.225   0.890
    Max.        0.760    0.260   0.900

I hope I explained that OK! Maybe it is just because it's Monday, but this confuses me ...

My system:

    # Windows Server 2008 R2
    > sessionInfo()
    R version 2.12.0 (2010-10-15)
    Platform: x86_64-pc-mingw32/x64 (64-bit)
2 answers

As daroczig said, you have an additional system.time. But there is something else:

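If you put browser() in your function, you will see what happens. The expression you pass is evaluated only once and its value is then kept in memory. This is how R handles function arguments internally: an argument is a lazily evaluated promise that is forced on first use, and every later reference reuses the stored value. So if you do this: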

 system.time.summary(N,(1:1e8)^2 +1) 

then t.mat inside the function looks like this:

               [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8] [,9] [,10]
    user.self  0.61    0    0    0    0    0    0    0    0     0
    sys.self   0.36    0    0    0    0    0    0    0    0     0
    elapsed    0.97    0    0    0    0    0    0    0    0     0
    user.child   NA   NA   NA   NA   NA   NA   NA   NA   NA    NA
    sys.child    NA   NA   NA   NA   NA   NA   NA   NA   NA    NA

and expr:

    Browse[2]> str(expr)
     num [1:100000000] 2 5 10 17 26 37 50 65 82 101 ...
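The single evaluation can also be checked without browser(). The following is only a minimal sketch: touch.n.times and counter are hypothetical names introduced for illustration, and the side effect in the argument exists purely to count how often the argument is really evaluated.

```r
# Hypothetical helper: refers to its argument N times, much like the wrapper does.
touch.n.times <- function(N, expr) {
  for (i in seq_len(N)) expr   # the promise is forced on the first pass only
  invisible(NULL)
}

counter <- 0
# The assignment is evaluated in the caller's environment when the promise is forced.
touch.n.times(5, counter <- counter + 1)
counter   # 1, not 5: the argument was evaluated a single time
```

Although the loop body refers to expr five times, the counter is incremented once, which matches the one non-zero column in t.mat.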

This is a bit tricky to change, since R evaluates the argument only once and then retrieves the result from memory the other 99 times. To prevent this, you must explicitly pass an expression object and evaluate it with eval() inside the function.

    system.time.summary <- function(N, expr) {
      t.mat <- replicate(N, system.time(eval(expr)))
      as.data.frame(apply(t.mat[1:3,], 1, summary))
    }

    system.time.summary(N, expression((1:1e8)^2 + 1))

Now expr gets evaluated every time and remains an expression in the function:

    Browse[2]> expr
    expression((1:1e+08)^2 + 1)

This gives the correct timings.

            user.self sys.self elapsed
    Min.       0.6400   0.2000   0.970
    1st Qu.    0.6850   0.2375   0.980
    Median     0.7150   0.2700   0.985
    Mean       0.7130   0.2700   0.985
    3rd Qu.    0.7425   0.2975   0.990
    Max.       0.7800   0.3500   1.000
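If wrapping every call in expression() feels awkward, one possible alternative is to capture the unevaluated argument with substitute() and evaluate it in the caller's environment. This is only a sketch under that assumption: system.time.summary2 is a hypothetical name, and a smaller vector is used so the example runs quickly.

```r
# Hypothetical variant: captures the unevaluated argument instead of its value.
system.time.summary2 <- function(N, expr) {
  e   <- substitute(expr)   # the unevaluated call, e.g. (1:1e6)^2 + 1
  env <- parent.frame()     # evaluate where the caller's variables live
  t.mat <- replicate(N, system.time(eval(e, env)))
  as.data.frame(apply(t.mat[1:3, ], 1, summary))
}

# The caller passes the plain expression; it is re-evaluated on every run.
res <- system.time.summary2(5, (1:1e6)^2 + 1)
res
```

The trade-off is that the function now relies on non-standard evaluation, so passing an expression that is already stored in a variable behaves differently from the eval()/expression() version.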

You are running system.time(system.time()) in the first test, and you are also passing system.time((1:1e8)^2 + 1) as an expression to a function, which is not a good idea. See:

    > expr <- system.time((1:1e8)^2 + 1)
    > system.time(expr)
       user  system elapsed
          0       0       0

But in any case: use the microbenchmark package from CRAN for such purposes, you will not regret it. Set up your functions and you can easily run your simulations with 100, 1000 or any number of runs. You get a neat summary and boxplots at the end of the benchmarking.

For instance:

    > test1 <- function() (1:1e8)^2 + 1
    > (results <- microbenchmark(test1(), times=10))
    Unit: nanoeconds
                   min         lq     median         uq        max
    test1() 3565386356 3703142531 3856450582 3931033077 3986309085
    > boxplot(results)

[Figure: boxplot of the microbenchmark results]


Source: https://habr.com/ru/post/1341647/

