Quick fetch from Truncated Normal Distribution using Rcpp and openMP

Question

Quick fetch from Truncated Normal Distribution using Rcpp and openMP

UPDATE:

I tried to implement the Dirk suggestions. Comments? I'm busy right now at JSM, but I would like to get some feedback before knitting Rmd for the gallery. I switched from Armadillo to regular Rcpp, as it added no value. The scalar versions with R :: are pretty nice. I have to add the parameter n for the number of draws if the average / sd is entered as a scalar, and not as vectors of the desired output length.

There are many MCMC applications that require drawing samples from truncated normal distributions. I built on an existing TN implementation and added parallel computing to it.

Questions:

Anyone see further potential speed improvements? ~~In the latter case, the rtruncnorm test is sometimes faster.~~ Rcpp implementation is always faster than existing packages, but can it be improved even further?
I ran it in a complex model that I cannot separate, and my R session crashed. However, I cannot systematically reproduce it, so it could be another part of the code. If someone works with TN, please check it out and let me know. Update: I had no problems with the updated code, but let me know.

How I put things together: As far as I know, the fastest implementation does not apply to CRAN, but the source code can be downloaded stat OSU . Competing implementations in msm and truncorm were slower in my tests. The trick is to effectively regulate the distribution of sentences, where it works exponentially for truncated Normal tails. So I took Chris's code, "Rcpp'ed", and added some openMP grease to it. A dynamic schedule is optimal here, since sampling may take more or less time depending on the boundaries. One thing I found nasty: a lot of statistical distributions are based on the NumericVector type when I wanted to work with doubling. I just encoded this method.

Here is the Rcpp code:

#include <Rcpp.h> #include <omp.h> // norm_rs(a, b) // generates a sample from a N(0,1) RV restricted to be in the interval // (a,b) via rejection sampling. // ====================================================================== // [[Rcpp::export]] double norm_rs(double a, double b) { double x; x = Rf_rnorm(0.0, 1.0); while( (x < a) || (x > b) ) x = norm_rand(); return x; } // half_norm_rs(a, b) // generates a sample from a N(0,1) RV restricted to the interval // (a,b) (with a > 0) using half normal rejection sampling. // ====================================================================== // [[Rcpp::export]] double half_norm_rs(double a, double b) { double x; x = fabs(norm_rand()); while( (x<a) || (x>b) ) x = fabs(norm_rand()); return x; } // unif_rs(a, b) // generates a sample from a N(0,1) RV restricted to the interval // (a,b) using uniform rejection sampling. // ====================================================================== // [[Rcpp::export]] double unif_rs(double a, double b) { double xstar, logphixstar, x, logu; // Find the argmax (b is always >= 0) // This works because we want to sample from N(0,1) if(a <= 0.0) xstar = 0.0; else xstar = a; logphixstar = R::dnorm(xstar, 0.0, 1.0, 1.0); x = R::runif(a, b); logu = log(R::runif(0.0, 1.0)); while( logu > (R::dnorm(x, 0.0, 1.0,1.0) - logphixstar)) { x = R::runif(a, b); logu = log(R::runif(0.0, 1.0)); } return x; } // exp_rs(a, b) // generates a sample from a N(0,1) RV restricted to the interval // (a,b) using exponential rejection sampling. // ====================================================================== // [[Rcpp::export]] double exp_rs(double a, double b) { double z, u, rate; // Rprintf("in exp_rs"); rate = 1/a; //1/a // Generate a proposal on (0, ba) z = R::rexp(rate); while(z > (ba)) z = R::rexp(rate); u = R::runif(0.0, 1.0); while( log(u) > (-0.5*z*z)) { z = R::rexp(rate); while(z > (ba)) z = R::rexp(rate); u = R::runif(0.0,1.0); } return(z+a); } // rnorm_trunc( mu, sigma, lower, upper) // // generates one random normal RVs with mean 'mu' and standard // deviation 'sigma', truncated to the interval (lower,upper), where // lower can be -Inf and upper can be Inf. //====================================================================== // [[Rcpp::export]] double rnorm_trunc (double mu, double sigma, double lower, double upper) { int change; double a, b; double logt1 = log(0.150), logt2 = log(2.18), t3 = 0.725; double z, tmp, lograt; change = 0; a = (lower - mu)/sigma; b = (upper - mu)/sigma; // First scenario if( (a == R_NegInf) || (b == R_PosInf)) { if(a == R_NegInf) { change = 1; a = -b; b = R_PosInf; } // The two possibilities for this scenario if(a <= 0.45) z = norm_rs(a, b); else z = exp_rs(a, b); if(change) z = -z; } // Second scenario else if((a * b) <= 0.0) { // The two possibilities for this scenario if((R::dnorm(a, 0.0, 1.0,1.0) <= logt1) || (R::dnorm(b, 0.0, 1.0, 1.0) <= logt1)) { z = norm_rs(a, b); } else z = unif_rs(a,b); } // Third scenario else { if(b < 0) { tmp = b; b = -a; a = -tmp; change = 1; } lograt = R::dnorm(a, 0.0, 1.0, 1.0) - R::dnorm(b, 0.0, 1.0, 1.0); if(lograt <= logt2) z = unif_rs(a,b); else if((lograt > logt1) && (a < t3)) z = half_norm_rs(a,b); else z = exp_rs(a,b); if(change) z = -z; } double output; output = sigma*z + mu; return (output); } // rtnm( mu, sigma, lower, upper, cores) // // generates one random normal RVs with mean 'mu' and standard // deviation 'sigma', truncated to the interval (lower,upper), where // lower can be -Inf and upper can be Inf. // mu, sigma, lower, upper are vectors, and vectorized calls of this function // speed up computation // cores is an intege, representing the number of cores to be used in parallel //====================================================================== // [[Rcpp::export]] Rcpp::NumericVector rtnm(Rcpp::NumericVector mus, Rcpp::NumericVector sigmas, Rcpp::NumericVector lower, Rcpp::NumericVector upper, int cores){ omp_set_num_threads(cores); int nobs = mus.size(); Rcpp::NumericVector out(nobs); double logt1 = log(0.150), logt2 = log(2.18), t3 = 0.725; double a,b, z, tmp, lograt; int change; #pragma omp parallel for schedule(dynamic) for(int i=0;i<nobs;i++) { a = (lower(i) - mus(i))/sigmas(i); b = (upper(i) - mus(i))/sigmas(i); change=0; // First scenario if( (a == R_NegInf) || (b == R_PosInf)) { if(a == R_NegInf) { change = 1; a = -b; b = R_PosInf; } // The two possibilities for this scenario if(a <= 0.45) z = norm_rs(a, b); else z = exp_rs(a, b); if(change) z = -z; } // Second scenario else if((a * b) <= 0.0) { // The two possibilities for this scenario if((R::dnorm(a, 0.0, 1.0,1.0) <= logt1) || (R::dnorm(b, 0.0, 1.0, 1.0) <= logt1)) { z = norm_rs(a, b); } else z = unif_rs(a,b); } // Third scenario else { if(b < 0) { tmp = b; b = -a; a = -tmp; change = 1; } lograt = R::dnorm(a, 0.0, 1.0, 1.0) - R::dnorm(b, 0.0, 1.0, 1.0); if(lograt <= logt2) z = unif_rs(a,b); else if((lograt > logt1) && (a < t3)) z = half_norm_rs(a,b); else z = exp_rs(a,b); if(change) z = -z; } out(i)=sigmas(i)*z + mus(i); } return(out); }

And this mark:

 libs=c("truncnorm","msm","inline","Rcpp","RcppArmadillo","rbenchmark") if( sum(!(libs %in% .packages(all.available = TRUE)))>0){ install.packages(libs[!(libs %in% .packages(all.available = TRUE))])} for(i in 1:length(libs)) {library(libs[i],character.only = TRUE,quietly=TRUE)} #needed for openMP parallel Sys.setenv("PKG_CXXFLAGS"="-fopenmp") Sys.setenv("PKG_LIBS"="-fopenmp") #no of cores for openMP version cores = 4 #surce code from same dir Rcpp::sourceCpp('truncnorm.cpp') #sample size nn=1000000 bb= 100 aa=-100 benchmark( rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),cores), rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),1),rtnorm(nn,rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn)),rtruncnorm(nn, a=aa, b=100, mean = 0, sd = 1) , order="relative", replications=3 )[,1:4] aa=0 benchmark( rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),cores), rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),1),rtnorm(nn,rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn)),rtruncnorm(nn, a=aa, b=100, mean = 0, sd = 1) , order="relative", replications=3 )[,1:4] aa=2 benchmark( rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),cores), rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),1),rtnorm(nn,rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn)),rtruncnorm(nn, a=aa, b=100, mean = 0, sd = 1) , order="relative", replications=3 )[,1:4] aa=50 benchmark( rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),cores), rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),1),rtnorm(nn,rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn)),rtruncnorm(nn, a=aa, b=100, mean = 0, sd = 1) , order="relative", replications=3 )[,1:4]

Several benchmarks are needed as speed depends on the upper / lower bounds. In different cases, different parts of the algorithm are inserted.

+6

r openmp rcpp

Inferrator Jul 29 '13 at 2:33

source share

1 answer

Dirk eddelbuettel · Answer 1 · 2013-07-29T02:59:30+0000

Really quick comments:

if you enabled RcppArmadillo.h , you do not need to include Rcpp.h - in fact, you should not and we even check that
rep(oneDraw, n) makes n calls. I would write a function that will be called once, which will return n draws to you - it will be faster when you save your office utility calls n-1
Your comment on the set of statistical distributions is based on the NumericVector type, when I wanted to work with doubling, it could reveal some misunderstandings: NumericVector is our convenient proxy class for internal R types: without copies, you can use std::vector<double> or any the form you prefer.
I know little about truncated normals, so I can not comment on the features of your algorithms.
Once you have developed it, consider posting to Rcpp Gallery .

Quick fetch from Truncated Normal Distribution using Rcpp and openMP

More articles: