Quick fetch from Truncated Normal Distribution using Rcpp and openMP

UPDATE:

I tried to implement the Dirk suggestions. Comments? I'm busy right now at JSM, but I would like to get some feedback before knitting Rmd for the gallery. I switched from Armadillo to regular Rcpp, as it added no value. The scalar versions with R :: are pretty nice. I have to add the parameter n for the number of draws if the average / sd is entered as a scalar, and not as vectors of the desired output length.


There are many MCMC applications that require drawing samples from truncated normal distributions. I built on an existing TN implementation and added parallel computing to it.

Questions:

  • Anyone see further potential speed improvements? In the latter case, the rtruncnorm test is sometimes faster. Rcpp implementation is always faster than existing packages, but can it be improved even further?
  • I ran it in a complex model that I cannot separate, and my R session crashed. However, I cannot systematically reproduce it, so it could be another part of the code. If someone works with TN, please check it out and let me know. Update: I had no problems with the updated code, but let me know.

How I put things together: As far as I know, the fastest implementation does not apply to CRAN, but the source code can be downloaded stat OSU . Competing implementations in msm and truncorm were slower in my tests. The trick is to effectively regulate the distribution of sentences, where it works exponentially for truncated Normal tails. So I took Chris's code, "Rcpp'ed", and added some openMP grease to it. A dynamic schedule is optimal here, since sampling may take more or less time depending on the boundaries. One thing I found nasty: a lot of statistical distributions are based on the NumericVector type when I wanted to work with doubling. I just encoded this method.

Here is the Rcpp code:

#include <Rcpp.h> #include <omp.h> // norm_rs(a, b) // generates a sample from a N(0,1) RV restricted to be in the interval // (a,b) via rejection sampling. // ====================================================================== // [[Rcpp::export]] double norm_rs(double a, double b) { double x; x = Rf_rnorm(0.0, 1.0); while( (x < a) || (x > b) ) x = norm_rand(); return x; } // half_norm_rs(a, b) // generates a sample from a N(0,1) RV restricted to the interval // (a,b) (with a > 0) using half normal rejection sampling. // ====================================================================== // [[Rcpp::export]] double half_norm_rs(double a, double b) { double x; x = fabs(norm_rand()); while( (x<a) || (x>b) ) x = fabs(norm_rand()); return x; } // unif_rs(a, b) // generates a sample from a N(0,1) RV restricted to the interval // (a,b) using uniform rejection sampling. // ====================================================================== // [[Rcpp::export]] double unif_rs(double a, double b) { double xstar, logphixstar, x, logu; // Find the argmax (b is always >= 0) // This works because we want to sample from N(0,1) if(a <= 0.0) xstar = 0.0; else xstar = a; logphixstar = R::dnorm(xstar, 0.0, 1.0, 1.0); x = R::runif(a, b); logu = log(R::runif(0.0, 1.0)); while( logu > (R::dnorm(x, 0.0, 1.0,1.0) - logphixstar)) { x = R::runif(a, b); logu = log(R::runif(0.0, 1.0)); } return x; } // exp_rs(a, b) // generates a sample from a N(0,1) RV restricted to the interval // (a,b) using exponential rejection sampling. // ====================================================================== // [[Rcpp::export]] double exp_rs(double a, double b) { double z, u, rate; // Rprintf("in exp_rs"); rate = 1/a; //1/a // Generate a proposal on (0, ba) z = R::rexp(rate); while(z > (ba)) z = R::rexp(rate); u = R::runif(0.0, 1.0); while( log(u) > (-0.5*z*z)) { z = R::rexp(rate); while(z > (ba)) z = R::rexp(rate); u = R::runif(0.0,1.0); } return(z+a); } // rnorm_trunc( mu, sigma, lower, upper) // // generates one random normal RVs with mean 'mu' and standard // deviation 'sigma', truncated to the interval (lower,upper), where // lower can be -Inf and upper can be Inf. //====================================================================== // [[Rcpp::export]] double rnorm_trunc (double mu, double sigma, double lower, double upper) { int change; double a, b; double logt1 = log(0.150), logt2 = log(2.18), t3 = 0.725; double z, tmp, lograt; change = 0; a = (lower - mu)/sigma; b = (upper - mu)/sigma; // First scenario if( (a == R_NegInf) || (b == R_PosInf)) { if(a == R_NegInf) { change = 1; a = -b; b = R_PosInf; } // The two possibilities for this scenario if(a <= 0.45) z = norm_rs(a, b); else z = exp_rs(a, b); if(change) z = -z; } // Second scenario else if((a * b) <= 0.0) { // The two possibilities for this scenario if((R::dnorm(a, 0.0, 1.0,1.0) <= logt1) || (R::dnorm(b, 0.0, 1.0, 1.0) <= logt1)) { z = norm_rs(a, b); } else z = unif_rs(a,b); } // Third scenario else { if(b < 0) { tmp = b; b = -a; a = -tmp; change = 1; } lograt = R::dnorm(a, 0.0, 1.0, 1.0) - R::dnorm(b, 0.0, 1.0, 1.0); if(lograt <= logt2) z = unif_rs(a,b); else if((lograt > logt1) && (a < t3)) z = half_norm_rs(a,b); else z = exp_rs(a,b); if(change) z = -z; } double output; output = sigma*z + mu; return (output); } // rtnm( mu, sigma, lower, upper, cores) // // generates one random normal RVs with mean 'mu' and standard // deviation 'sigma', truncated to the interval (lower,upper), where // lower can be -Inf and upper can be Inf. // mu, sigma, lower, upper are vectors, and vectorized calls of this function // speed up computation // cores is an intege, representing the number of cores to be used in parallel //====================================================================== // [[Rcpp::export]] Rcpp::NumericVector rtnm(Rcpp::NumericVector mus, Rcpp::NumericVector sigmas, Rcpp::NumericVector lower, Rcpp::NumericVector upper, int cores){ omp_set_num_threads(cores); int nobs = mus.size(); Rcpp::NumericVector out(nobs); double logt1 = log(0.150), logt2 = log(2.18), t3 = 0.725; double a,b, z, tmp, lograt; int change; #pragma omp parallel for schedule(dynamic) for(int i=0;i<nobs;i++) { a = (lower(i) - mus(i))/sigmas(i); b = (upper(i) - mus(i))/sigmas(i); change=0; // First scenario if( (a == R_NegInf) || (b == R_PosInf)) { if(a == R_NegInf) { change = 1; a = -b; b = R_PosInf; } // The two possibilities for this scenario if(a <= 0.45) z = norm_rs(a, b); else z = exp_rs(a, b); if(change) z = -z; } // Second scenario else if((a * b) <= 0.0) { // The two possibilities for this scenario if((R::dnorm(a, 0.0, 1.0,1.0) <= logt1) || (R::dnorm(b, 0.0, 1.0, 1.0) <= logt1)) { z = norm_rs(a, b); } else z = unif_rs(a,b); } // Third scenario else { if(b < 0) { tmp = b; b = -a; a = -tmp; change = 1; } lograt = R::dnorm(a, 0.0, 1.0, 1.0) - R::dnorm(b, 0.0, 1.0, 1.0); if(lograt <= logt2) z = unif_rs(a,b); else if((lograt > logt1) && (a < t3)) z = half_norm_rs(a,b); else z = exp_rs(a,b); if(change) z = -z; } out(i)=sigmas(i)*z + mus(i); } return(out); } 

And this mark:

 libs=c("truncnorm","msm","inline","Rcpp","RcppArmadillo","rbenchmark") if( sum(!(libs %in% .packages(all.available = TRUE)))>0){ install.packages(libs[!(libs %in% .packages(all.available = TRUE))])} for(i in 1:length(libs)) {library(libs[i],character.only = TRUE,quietly=TRUE)} #needed for openMP parallel Sys.setenv("PKG_CXXFLAGS"="-fopenmp") Sys.setenv("PKG_LIBS"="-fopenmp") #no of cores for openMP version cores = 4 #surce code from same dir Rcpp::sourceCpp('truncnorm.cpp') #sample size nn=1000000 bb= 100 aa=-100 benchmark( rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),cores), rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),1),rtnorm(nn,rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn)),rtruncnorm(nn, a=aa, b=100, mean = 0, sd = 1) , order="relative", replications=3 )[,1:4] aa=0 benchmark( rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),cores), rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),1),rtnorm(nn,rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn)),rtruncnorm(nn, a=aa, b=100, mean = 0, sd = 1) , order="relative", replications=3 )[,1:4] aa=2 benchmark( rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),cores), rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),1),rtnorm(nn,rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn)),rtruncnorm(nn, a=aa, b=100, mean = 0, sd = 1) , order="relative", replications=3 )[,1:4] aa=50 benchmark( rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),cores), rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),1),rtnorm(nn,rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn)),rtruncnorm(nn, a=aa, b=100, mean = 0, sd = 1) , order="relative", replications=3 )[,1:4] 

Several benchmarks are needed as speed depends on the upper / lower bounds. In different cases, different parts of the algorithm are inserted.

+6
source share
1 answer

Really quick comments:

  • if you enabled RcppArmadillo.h , you do not need to include Rcpp.h - in fact, you should not and we even check that

  • rep(oneDraw, n) makes n calls. I would write a function that will be called once, which will return n draws to you - it will be faster when you save your office utility calls n-1

  • Your comment on the set of statistical distributions is based on the NumericVector type, when I wanted to work with doubling, it could reveal some misunderstandings: NumericVector is our convenient proxy class for internal R types: without copies, you can use std::vector<double> or any the form you prefer.

  • I know little about truncated normals, so I can not comment on the features of your algorithms.

  • Once you have developed it, consider posting to Rcpp Gallery .

+3
source

Source: https://habr.com/ru/post/950488/


All Articles