I am trying to understand the difference in performance between a function written with RcppArmadillo and the same computation in a standalone C++ program that uses the Armadillo library. For example, consider the following simple function, which calculates the coefficients of a linear model using the traditional textbook formula.
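    // [[Rcpp::depends(RcppArmadillo)]]
    #include <RcppArmadillo.h>
    using namespace Rcpp;
    using namespace arma;

    // [[Rcpp::export]]
    void simpleLm(NumericMatrix Xr, NumericMatrix yr) {
        int n = Xr.nrow(), k = Xr.ncol();
        // Wrap R's memory directly (copy_aux_mem = false), so no data is copied.
        mat X(Xr.begin(), n, k, false);
        colvec y(yr.begin(), yr.nrow(), false);
        // Textbook normal-equations solution; the result is discarded on purpose.
        colvec coef = inv(X.t()*X)*X.t()*y;
    }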
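For reference, the expression assigned to coef is the textbook normal-equations estimator. Here X and y are the same objects as in the code, and the hat-beta symbol is notation introduced only for this note, standing for coef:

```latex
\hat{\beta} = (X^{\top} X)^{-1} X^{\top} y
```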
simpleLm takes about 6 seconds when X is a 1000000 x 100 matrix. Some code timings (not shown) show that all of that time is spent computing coef.
    X <- matrix(rnorm(1000000*100), ncol=100)
    y <- matrix(rep(1, 1000000))
    system.time(simpleLm(X,y))
       user  system elapsed
      6.028   0.009   6.040
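The per-statement timings mentioned above are not shown in the post. As a rough sketch of how such a measurement could be taken, the helper below wraps a single statement in std::chrono calls. The names time_ms and timed_sum are hypothetical and do not come from the original code, and the summation is only a stand-in for the coef computation.

```cpp
#include <chrono>
#include <cstddef>
#include <numeric>
#include <vector>

// Hypothetical helper (not from the original post): runs `work` once and
// returns the elapsed wall-clock time in milliseconds.
template <typename F>
double time_ms(F&& work) {
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Stand-in workload: sums n ones and reports how long the summation took.
// In simpleLm, the lambda body would instead be the statement computing `coef`.
double timed_sum(std::size_t n, double& elapsed_ms) {
    const std::vector<double> v(n, 1.0);
    double total = 0.0;
    elapsed_ms = time_ms([&] { total = std::accumulate(v.begin(), v.end(), 0.0); });
    return total;
}
```

Wrapping only the statement of interest keeps setup costs, such as allocating and filling the inputs, out of the measurement. The standalone program in this post does the same thing inline.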
Now consider a very similar computation written as a standalone C++ program, which is then compiled with g++.
    #include <iostream>
    #include <armadillo>
    #include <chrono>
    #include <cstdlib>
    using namespace std;
    using namespace arma;

    int main(int argc, char **argv) {
        int n = 1000000;
        mat X = randu<mat>(n,100);
        vec y = ones<vec>(n);
        // Time only the coefficient computation, not the data generation.
        chrono::steady_clock::time_point start = chrono::steady_clock::now();
        colvec coef = inv(X.t()*X)*X.t()*y;
        chrono::steady_clock::time_point end = chrono::steady_clock::now();
        // Report the elapsed time in milliseconds.
        chrono::duration<double, milli> diff = end - start;
        cout << diff.count() << endl;
        return 0;
    }
Here, computing coef takes only about 0.5 seconds, roughly 1/12 of the time it takes with RcppArmadillo.
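I am using Mac OS X 10.9.2 with R 3.1.0, Rcpp 0.11.1, and RcppArmadillo 0.4.200.0. The Rcpp code was compiled with sourceCpp. The standalone C++ program uses Armadillo 4.200.0, and I installed Fortran on the Mac with Homebrew (brew install gfortran).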