UPDATE:
I tried to implement the Dirk suggestions. Comments? I'm busy right now at JSM, but I would like to get some feedback before knitting Rmd for the gallery. I switched from Armadillo to regular Rcpp, as it added no value. The scalar versions with R :: are pretty nice. I have to add the parameter n for the number of draws if the average / sd is entered as a scalar, and not as vectors of the desired output length.
There are many MCMC applications that require drawing samples from truncated normal distributions. I built on an existing TN implementation and added parallel computing to it.
Questions:
- Anyone see further potential speed improvements?
In the latter case, the rtruncnorm test is sometimes faster. Rcpp implementation is always faster than existing packages, but can it be improved even further? - I ran it in a complex model that I cannot separate, and my R session crashed. However, I cannot systematically reproduce it, so it could be another part of the code. If someone works with TN, please check it out and let me know. Update: I had no problems with the updated code, but let me know.
How I put things together: As far as I know, the fastest implementation does not apply to CRAN, but the source code can be downloaded stat OSU . Competing implementations in msm and truncorm were slower in my tests. The trick is to effectively regulate the distribution of sentences, where it works exponentially for truncated Normal tails. So I took Chris's code, "Rcpp'ed", and added some openMP grease to it. A dynamic schedule is optimal here, since sampling may take more or less time depending on the boundaries. One thing I found nasty: a lot of statistical distributions are based on the NumericVector type when I wanted to work with doubling. I just encoded this method.
Here is the Rcpp code:
#include <Rcpp.h>
And this mark:
libs=c("truncnorm","msm","inline","Rcpp","RcppArmadillo","rbenchmark") if( sum(!(libs %in% .packages(all.available = TRUE)))>0){ install.packages(libs[!(libs %in% .packages(all.available = TRUE))])} for(i in 1:length(libs)) {library(libs[i],character.only = TRUE,quietly=TRUE)} #needed for openMP parallel Sys.setenv("PKG_CXXFLAGS"="-fopenmp") Sys.setenv("PKG_LIBS"="-fopenmp") #no of cores for openMP version cores = 4 #surce code from same dir Rcpp::sourceCpp('truncnorm.cpp') #sample size nn=1000000 bb= 100 aa=-100 benchmark( rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),cores), rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),1),rtnorm(nn,rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn)),rtruncnorm(nn, a=aa, b=100, mean = 0, sd = 1) , order="relative", replications=3 )[,1:4] aa=0 benchmark( rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),cores), rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),1),rtnorm(nn,rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn)),rtruncnorm(nn, a=aa, b=100, mean = 0, sd = 1) , order="relative", replications=3 )[,1:4] aa=2 benchmark( rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),cores), rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),1),rtnorm(nn,rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn)),rtruncnorm(nn, a=aa, b=100, mean = 0, sd = 1) , order="relative", replications=3 )[,1:4] aa=50 benchmark( rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),cores), rtnm(rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn),1),rtnorm(nn,rep(0,nn),rep(1,nn),rep(aa,nn),rep(100,nn)),rtruncnorm(nn, a=aa, b=100, mean = 0, sd = 1) , order="relative", replications=3 )[,1:4]
Several benchmarks are needed as speed depends on the upper / lower bounds. In different cases, different parts of the algorithm are inserted.