Why is this Rcpp code slower than byte-compiled R?

Question


As the title says, I would like to know why byte-compiled R code (using compiler::cmpfun) is faster than the equivalent Rcpp code for the following mathematical function:

    func1 <- function(alpha, tau, rho, phi) {
      abs((alpha + 1)^(tau) * phi - rho * (1- (1 + alpha)^(tau))/(1 - (1 + alpha)))
    }

Since this is a simple numerical operation, I would expect the Rcpp versions (funcCpp and funcCpp2) to be much faster than the byte-compiled R versions (func1c and func2c), especially since R should incur more overhead for either storing (1 + alpha)^tau or recomputing it. In fact, computing this power twice appears to be faster than allocating memory for it in R (func1c vs. func2c), which seems counterintuitive. My other guess is that compiler::cmpfun is pulling off some magic, but I would like to know whether that is actually the case.

So, two things I would like to know:

Why are funcCpp and funcCpp2 slower than func1c and func2c? (Rcpp is slower than byte-compiled R functions)
Why is funcCpp slower than func2? (Rcpp code is slower than pure R)

FWIW, here are my C++ compiler and R versions:

    user% g++ --version
    Configured with: --prefix=/Library/Developer/CommandLineTools/usr --with-gxx-include-dir=/usr/include/c++/4.2.1
    Apple LLVM version 7.0.0 (clang-700.0.72)
    Target: x86_64-apple-darwin14.3.0
    Thread model: posix

    user% R --version
    R version 3.2.2 (2015-08-14) -- "Fire Safety"
    Copyright (C) 2015 The R Foundation for Statistical Computing
    Platform: x86_64-apple-darwin14.5.0 (64-bit)

And here is the R and Rcpp code:

    library(Rcpp)
    library(rbenchmark)

    func1 <- function(alpha, tau, rho, phi) {
      abs((1 + alpha)^(tau) * phi - rho * (1- (1 + alpha)^(tau))/(1 - (1 + alpha)))
    }

    func2 <- function(alpha, tau, rho, phi) {
      pval <- (alpha + 1)^(tau)
      abs( pval * phi - rho * (1- pval)/(1 - (1 + alpha)))
    }

    func1c <- compiler::cmpfun(func1)
    func2c <- compiler::cmpfun(func2)

    func3c <- Rcpp::cppFunction('
      double funcCpp(double alpha, int tau, double rho, double phi) {
        double pow_val = std::exp(tau * std::log(alpha + 1.0));
        double pAg = rho/alpha;
        return std::abs(pow_val * (phi - pAg) + pAg);
      }')

    func4c <- Rcpp::cppFunction('
      double funcCpp2(double alpha, int tau, double rho, double phi) {
        double pow_val = pow(alpha + 1.0, tau);
        double pAg = rho/alpha;
        return std::abs(pow_val * (phi - pAg) + pAg);
      }')

    res <- benchmark(
      func1(0.01, 200, 100, 1000000),
      func1c(0.01, 200, 100, 1000000),
      func2(0.01, 200, 100, 1000000),
      func2c(0.01, 200, 100, 1000000),
      func3c(0.01, 200, 100, 1000000),
      func4c(0.01, 200, 100, 1000000),
      funcCpp(0.01, 200, 100, 1000000),
      funcCpp2(0.01, 200, 100, 1000000),
      replications = 100000,
      order='relative',
      columns=c("test", "replications", "elapsed", "relative"))
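Note that funcCpp and funcCpp2 do not evaluate the expression in func1 literally. They use an algebraically simplified form: since 1 - (1 + alpha) = -alpha, the expression p * phi - rho * (1 - p) / (-alpha) rearranges to p * (phi - rho/alpha) + rho/alpha, where p = (1 + alpha)^tau. The following is a minimal standalone C++ sketch (no Rcpp) that checks the two forms agree numerically. The names original_form, simplified_form and relative_gap are illustrative and are not part of the benchmark code.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// The expression exactly as written in the R function func1.
double original_form(double alpha, int tau, double rho, double phi) {
    assert(alpha != 0.0);  // the denominator 1 - (1 + alpha) equals -alpha
    double p = std::pow(1.0 + alpha, tau);
    return std::abs(p * phi - rho * (1.0 - p) / (1.0 - (1.0 + alpha)));
}

// The simplified form used by funcCpp2: p * (phi - rho/alpha) + rho/alpha.
double simplified_form(double alpha, int tau, double rho, double phi) {
    assert(alpha != 0.0);  // rho/alpha is undefined at alpha == 0
    double p = std::pow(alpha + 1.0, tau);
    double pAg = rho / alpha;
    return std::abs(p * (phi - pAg) + pAg);
}

// Relative difference between the two forms (illustrative helper).
double relative_gap(double alpha, int tau, double rho, double phi) {
    double a = original_form(alpha, tau, rho, phi);
    double b = simplified_form(alpha, tau, rho, phi);
    return std::abs(a - b) / std::max(std::abs(a), std::abs(b));
}
```

Calling relative_gap(0.01, 200, 100.0, 1000000.0) with the benchmark arguments should give a value near floating-point round-off, so the timing comparison is between equivalent computations, although the C++ side does slightly less arithmetic.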

And here is the output of rbenchmark:

                                 test replications elapsed relative
        func1c(0.01, 200, 100, 1e+06)       100000   0.349    1.000
        func2c(0.01, 200, 100, 1e+06)       100000   0.372    1.066
      funcCpp2(0.01, 200, 100, 1e+06)       100000   0.483    1.384
        func4c(0.01, 200, 100, 1e+06)       100000   0.509    1.458
         func2(0.01, 200, 100, 1e+06)       100000   0.510    1.461
       funcCpp(0.01, 200, 100, 1e+06)       100000   0.524    1.501
        func3c(0.01, 200, 100, 1e+06)       100000   0.546    1.564
         func1(0.01, 200, 100, 1e+06)       100000   0.549    1.573


c++ performance numerical r rcpp

lostinarandomforest Oct 15 '15 at 10:09


1 answer

Dirk Eddelbuettel · Accepted Answer · 2015-10-16T14:23:02+0000

This is, in fact, almost an ill-posed question. When you posit

    func1 <- function(alpha, tau, rho, phi) {
      abs((alpha + 1)^(tau) * phi - rho * (1- (1 + alpha)^(tau))/(1 - (1 + alpha)))
    }

without even specifying what the arguments are (scalar? vector? large? small? what memory overhead?), then at best you just get a small set of (basic, efficient) function calls directly from the parsed expression.

And ever since we have had the byte compiler, which Luke Tierney has improved in subsequent R releases, we know that it handles algebraic expressions well.

Now, compiled C/C++ code does that well too, but there is overhead in calling the compiled code, and what you see here is that for "trivial enough" problems, that overhead does not really get amortized.
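To illustrate what amortizing the call overhead could look like, here is a minimal sketch of a vectorized variant of funcCpp2 that handles many alpha values in a single call. This is plain C++ with a hypothetical name, funcCppVec; it is not code from the question and has not been benchmarked. Rcpp generally converts between R numeric vectors and std::vector<double>, so a function of this shape could presumably be passed to cppFunction, but that is an assumption left untested here.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical vectorized variant of funcCpp2: the loop runs inside compiled
// code, so the fixed cost of crossing from R into C++ would be paid once per
// vector rather than once per element.
std::vector<double> funcCppVec(const std::vector<double>& alpha, int tau,
                               double rho, double phi) {
    std::vector<double> out(alpha.size());
    for (std::size_t i = 0; i < alpha.size(); ++i) {
        assert(alpha[i] != 0.0);  // rho/alpha is undefined at alpha == 0
        double pow_val = std::pow(alpha[i] + 1.0, tau);
        double pAg = rho / alpha[i];
        out[i] = std::abs(pow_val * (phi - pAg) + pAg);
    }
    return out;
}
```

With a scalar call repeated 100000 times, as in the benchmark, each call does only a handful of floating-point operations, so the per-call cost dominates. A version like this only helps if the real workload actually involves many inputs at once.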

So in the end you get pretty much a draw. No surprise, as far as I can tell.
