It is often said that it is better to use `lapply` than `for` loops. There are some exceptions, as Hadley Wickham points out in his book Advanced R (http://adv-r.had.co.nz/Functionals.html): modifying in place, recursion, and so on. The following turned out to be one of these cases.
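To make the modifying-in-place exception concrete, here is a minimal sketch (my own illustration of the pattern Wickham describes, not code from the sources linked here):

```r
df1 <- data.frame(a = 1:5, b = 6:10)
df2 <- df1

# for loop: each iteration overwrites one column of the existing data frame
for (nm in names(df1)) {
  df1[[nm]] <- df1[[nm]] / max(df1[[nm]])
}

# lapply: builds a fresh list of columns, then reassigns them all at once
df2[] <- lapply(df2, function(col) col / max(col))

identical(df1, df2)  # TRUE: same result, different mechanics
```

When each step has to update existing state, the loop expresses that directly, while the functional form has to work around it.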
Just for the sake of learning, I tried to rewrite the perceptron algorithm in functional form to compare relative performance (source: https://rpubs.com/FaiHas/197581). Here is the code to prepare the input:
```r
# prepare input
data(iris)
irissubdf <- iris[1:100, c(1, 3, 5)]
names(irissubdf) <- c("sepal", "petal", "species")
head(irissubdf)
irissubdf$y <- 1
irissubdf[irissubdf[, 3] == "setosa", 4] <- -1
x <- irissubdf[, c(1, 2)]
y <- irissubdf[, 4]
```
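For context, the `for`-loop version benchmarked below looks roughly like this. The exact implementation is in the RPubs link above; this body is a reconstruction that matches the call `perceptron(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10)` used in the benchmark:

```r
perceptron <- function(x, y, eta, niter) {
  weight <- rep(0, ncol(x) + 1)   # bias term plus one weight per feature
  errors <- rep(0, niter)         # misclassification count per pass
  for (jj in seq_len(niter)) {
    for (ii in seq_along(y)) {
      z <- sum(weight[-1] * x[ii, ]) + weight[1]   # linear predictor
      ypred <- if (z < 0) -1 else 1
      weight <- weight + eta * (y[ii] - ypred) * c(1, x[ii, ])
      if (y[ii] != ypred) errors[jj] <- errors[jj] + 1
    }
  }
  errors
}
```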
I did not expect any consistent improvement because of the issues above. Nevertheless, I was very surprised when I saw a sharp slowdown using `lapply` and `replicate`.
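The functional rewrite, `f()` in the benchmark below, is along these lines (again a reconstruction; the actual version is in the RPubs link). Note that because each weight update depends on the previous one, the state has to be threaded through `<<-`:

```r
f <- function(eta = 1, niter = 10) {
  weight <- rep(0, 3)     # bias plus two feature weights
  errors <- rep(0, niter)
  jj <- 0
  replicate(niter, {
    jj <<- jj + 1
    lapply(seq_along(y), function(ii) {
      xi <- as.numeric(irissubdf[ii, 1:2])  # data frame subsetting on every step
      z <- sum(weight[-1] * xi) + weight[1]
      ypred <- if (z < 0) -1 else 1
      weight <<- weight + eta * (y[ii] - ypred) * c(1, xi)
      if (y[ii] != ypred) errors[jj] <<- errors[jj] + 1
    })
  })
  errors
}
```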
I got these results using the `microbenchmark` function from the microbenchmark library. What could be the reason? Could this be a memory leak?
```
                                                       expr       min         lq       mean     median         uq        max neval
                                                        f() 48670.878 50600.7200 52767.6871 51746.2530 53541.2440 109715.673   100
 perceptron(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10)  4184.131  4437.2990  4686.7506  4532.6655  4751.4795   6513.684   100
perceptronC(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10)    95.793   104.2045   123.7735   116.6065   140.5545    264.858   100
```
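For reference, the benchmark call was presumably along these lines (reconstructed from the `expr` column; `times = 100` matches `neval`):

```r
library(microbenchmark)
microbenchmark(
  f(),
  perceptron(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10),
  perceptronC(as.matrix(irissubdf[1:2]), irissubdf$y, 1, 10),
  times = 100
)
```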
- The first function is the `lapply`/`replicate` version.
- The second is the function with `for` loops.
- The third is the same function in C++ using Rcpp (see the sketch after this list).
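For completeness, an Rcpp version of the same algorithm is typically wired up like this (a sketch; the author's actual `perceptronC()` is in the RPubs link):

```r
library(Rcpp)

cppFunction('
NumericVector perceptronC(NumericMatrix x, NumericVector y,
                          double eta, int niter) {
  int n = y.size(), p = x.ncol();
  NumericVector weight(p + 1), errors(niter);
  for (int jj = 0; jj < niter; jj++) {
    for (int ii = 0; ii < n; ii++) {
      // linear predictor: bias plus weighted features
      double z = weight[0];
      for (int k = 0; k < p; k++) z += weight[k + 1] * x(ii, k);
      double ypred = (z < 0) ? -1.0 : 1.0;
      // perceptron update
      double d = eta * (y[ii] - ypred);
      weight[0] += d;
      for (int k = 0; k < p; k++) weight[k + 1] += d * x(ii, k);
      if (y[ii] != ypred) errors[jj] += 1.0;
    }
  }
  return errors;
}')
```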
Here is, as Roland requested, the profiling of the function. I'm not sure I can interpret it correctly, but it looks like most of the time is spent in subsetting.
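In base R, that profile can be produced with `Rprof()` (a sketch of the procedure; the output file name is hypothetical and the question's actual profiling output is not reproduced here):

```r
Rprof("perceptron_prof.out")   # start collecting samples into a file
f()
Rprof(NULL)                    # stop profiling
summaryRprof("perceptron_prof.out")$by.self   # time spent in each function itself
```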