Is there a more efficient approach than a loop for a task that requires a conditional stopping check?

My problem is that I wrap a while loop around code that I believe could be vectorized efficiently; however, at each step my stopping condition depends on the value just generated. Consider this example as a representative model of my problem:

Generate standard normal N(0,1) variates using rnorm() until you draw a value greater than some arbitrary threshold k.

EDIT: The caveat to my problem, discussed in the comments, is that I do not know a priori a good approximation of how many samples will be needed before my stopping condition is met.

One approach:

  • Using a while loop, sample standard normal vectors of an appropriate size (e.g. rnorm(50) to fetch 50 standard normals at a time, or rnorm(1) if k is close to zero). Check this vector for any observations greater than k.

  • If so, stop and return all values drawn so far. Otherwise, concatenate the vector from step 1 with a new vector obtained by repeating step 1.
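The two steps above can be sketched as follows; this is a minimal illustration, not the question's actual code, assuming an illustrative threshold k = 2 and a fixed block size of 50:

```r
# Chunked while-loop approach: draw 50 normals at a time until one exceeds k,
# then truncate the final block at the first hit and return everything kept.
k <- 2
chunk <- 50
values <- numeric(0)
repeat {
  draws <- rnorm(chunk)
  hit <- match(TRUE, draws > k)        # index of first draw above k, NA if none
  if (!is.na(hit)) {
    values <- c(values, draws[1:hit])  # keep up to and including the first hit
    break
  }
  values <- c(values, draws)           # no hit in this block: keep it, draw again
}
```

By construction, the last element of values exceeds k and every earlier element does not.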

Another approach would be to guess a generous total number of random draws for a given k and over-sample. For k = 2, this might mean generating 1000 standard normals at once with rnorm(1000).
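A sketch of this over-sampling approach (the batch size 1000 is an illustrative guess, not a recommendation):

```r
# Over-sampling approach: draw one large batch, then truncate at the first
# value above k. Since P(Z > 2) is roughly 0.023, a batch of 1000 fails to
# contain a hit only with vanishing probability -- but if it does fail,
# a fresh batch would be required.
set.seed(42)
k <- 2
draws <- rnorm(1000)
hit <- match(TRUE, draws > k)  # first index above k, NA if the guess was too small
if (!is.na(hit)) {
  values <- draws[1:hit]       # discard the wasted tail of the batch
} else {
  values <- draws              # guess failed; caller would need to draw again
}
```

The waste is everything after the first hit: on average only about 1/0.023 ≈ 44 draws are actually needed for k = 2, so most of the 1000 are thrown away.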

Using the vectorization that R offers, the second approach beats the loop version whenever the number of excess draws is not much larger than necessary; but in my problem I have no good intuition about how many draws I will need, so I have to be conservative.

The question is: is there a way to get the performance of a fully vectorized procedure like method 2, but with the conditional check of method 1? Or are small vectorized chunks like rnorm(50) the "fastest" middle ground, given that the fully vectorized method is faster per element but more wasteful?

1 answer

Here is an implementation of my earlier suggestion: use your first approach, but increase the number of new samples between iterations. For example, instead of 50 new samples at every iteration, double the count each time: 50, then 100, 200, 400, and so on.

Since your sample size follows a diverging geometric series, you are guaranteed to exit the loop within "few" iterations.

    sample.until.thresh <- function(FUN, exit.thresh,
                                    sample.start = 50, sample.growth = 2) {
      sample.size <- sample.start
      all.values <- list()
      num.iterations <- 0L
      repeat {
        num.iterations <- num.iterations + 1L
        sample.values <- FUN(sample.size)
        all.values[[num.iterations]] <- sample.values
        above.thresh <- sample.values > exit.thresh
        if (any(above.thresh)) {
          first.above <- match(TRUE, above.thresh)
          all.values[[num.iterations]] <- sample.values[1:first.above]
          break
        }
        sample.size <- sample.size * sample.growth
      }
      all.values <- unlist(all.values)
      return(list(num.iterations = num.iterations,
                  sample.size = length(all.values),
                  sample.values = all.values))
    }

    set.seed(123456L)
    res <- sample.until.thresh(rnorm, 5)
    res$num.iterations
    # [1] 16
    res$sample.size
    # [1] 2747703

Source: https://habr.com/ru/post/913744/
