Fast generation of ~10^9 steps of a random process in R

I have the following task:

Generate 10^9 steps of the process described by the formula:

X(0) = 0
X(t+1) = X(t) + Y(t)

where the Y(t) are independent random variables with distribution N(0,1). Calculate for what percentage of indices t the value X(t) was negative.

I tried the following code:

    x <- c(0, 0)
    z <- 0
    loop <- 10^9
    for (i in 2:loop) {
      x[1] <- x[2]
      x[2] <- x[1] + rnorm(1, 0, 1)
      if (x[2] < 0) {z <- z + 1}
    }

However, it is very slow. How can I speed it up?

+5
5 answers

One solution is to go with the vectorized solution proposed by @G5W, but break it into smaller blocks to avoid memory problems. This gives you the speed of the vectorized solution, and by controlling the block size you control how much memory the process uses.

The following breaks the problem into blocks of 1e+07 and, by looping 100 times, reaches the total of 1e+09 steps.

At the end of the first block, you record the percentage of time spent below 0 and the end point of the walk. The end point is then fed into the next block, and you record the percentage of time below 0 and the new end point.

At the end, the 100 per-block percentages are averaged to get the overall fraction of time below zero; since every block has the same length, the simple average equals the overall fraction. The cat call in the while loop only reports progress and can be commented out.

    # simulate one block of `length` steps starting from `start`;
    # return c(fraction of the block spent below zero, end point of the block)
    funky <- function(start, length = 1e+07) {
      Y <- rnorm(length)
      Z <- cumsum(Y)
      c(sum(Z < (-start)) / length, tail(Z, 1) + start)
    }

    starttime <- Sys.time()
    resvect <- vector(mode = "numeric", length = 100)
    result <- funky(0)
    resvect[1] <- result[1]
    i <- 2
    while (i < 101) {
      cat(result, "\n")            # progress report; comment out if not wanted
      result <- funky(result[2])   # feed the previous end point into the next block
      resvect[i] <- result[1]
      i <- i + 1
    }
    mean(resvect)
    # [1] 0.1880392
    endtime <- Sys.time()
    elapsed <- endtime - starttime
    elapsed
    # Time difference of 1.207566 mins
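
If the same idea is needed for other total lengths or block sizes, it can be wrapped in a single function. Here is a minimal sketch (the wrapper run_walk and its argument names are illustrative, not from the answer above), reusing funky() as defined above:

    # Minimal sketch of a reusable wrapper around funky() above.
    # Assumes block_len divides total evenly; run_walk() and its argument names
    # are illustrative, not part of the original answer.
    run_walk <- function(total = 1e9, block_len = 1e7) {
      n_blocks <- total / block_len
      fracs <- numeric(n_blocks)
      endpoint <- 0
      for (i in seq_len(n_blocks)) {
        res <- funky(endpoint, length = block_len)
        fracs[i] <- res[1]      # fraction of this block spent below zero
        endpoint <- res[2]      # carry the end point into the next block
      }
      mean(fracs)               # overall fraction (all blocks have equal length)
    }

    # run_walk(1e8, 1e7)        # e.g. a shorter test run first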
+3

In general, for tasks like this you can translate your function one-to-one into C++ using the Rcpp package. This should give a significant speedup.

First, the R version:

    random_sum <- function(loop = 1000) {
      x <- c(0, 0)
      z <- 0
      for (i in 2:loop) {
        x[1] <- x[2]
        x[2] <- x[1] + rnorm(1, 0, 1)
        if (x[2] < 0) {z <- z + 1}
      }
      z / loop
    }
    set.seed(123)
    random_sum()
    # [1] 0.134

Now the C++ version:

 library("Rcpp") cppFunction(" double random_sum_cpp(unsigned long loop = 1000) { double x1 = 0; double x2 = 0; double z = 0; for (unsigned long i = 2; i < loop; i++) { x1 = x2; x2 = x1 + Rcpp::rnorm(1)[0]; if (x2 < 0) z = z+1; } return z/loop; }") set.seed(123) random_sum_cpp() # [1] 0.134 

For completeness, let's also consider the proposed vectorized version:

    random_sum_vector <- function(loop = 1000) {
      Y = rnorm(loop)
      sum(cumsum(Y) < 0) / loop
    }
    set.seed(123)
    random_sum_vector()
    # [1] 0.134

We see that it gives the same result for the same random seed, so it seems to be a viable contender.

In the benchmark, the C++ version and the vectorized version perform similarly, with the vectorized version having a slight edge over the C++ version:

    > microbenchmark(random_sum(100000), random_sum_vector(100000), random_sum_cpp(100000))
    Unit: milliseconds
                          expr        min         lq       mean     median         uq       max neval
            random_sum(1e+05) 184.205588 199.859266 209.220232 205.137043 211.026740 274.47615   100
     random_sum_vector(1e+05)   6.320690   6.631704   7.273645   6.799093   7.334733  18.48649   100
        random_sum_cpp(1e+05)   8.950091   9.362303  10.663295   9.956996  11.079513  21.30898   100

However, the vectorized version trades memory for speed: it allocates vectors as long as the walk and will blow up your memory for long walks. The C++ version needs practically no memory.
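
To put rough numbers on that (a back-of-the-envelope estimate of mine, not from the answer): a numeric vector in R uses 8 bytes per element, and the vectorized version needs at least Y and cumsum(Y) alive at the same time.

    # Back-of-the-envelope memory estimate (assumptions: 8-byte doubles, and at
    # least Y and cumsum(Y) held in memory simultaneously; no extra copies).
    n <- 1e9
    n * 8 / 1024^3          # ~7.45 GiB for one vector of 1e9 doubles
    2 * n * 8 / 1024^3      # ~14.9 GiB for Y plus cumsum(Y)
    # The C++ loop, by contrast, keeps only a few scalars regardless of n.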

For 10^9 steps, the C++ version runs in about 2 minutes (110 seconds) on my machine. I have not tried the R version; based on the shorter tests, it would probably take about 7 hours.

    > microbenchmark(random_sum_cpp(10^9), times = 1)
    Unit: seconds
                     expr      min       lq     mean   median       uq      max neval
     random_sum_cpp(10^9) 110.2182 110.2182 110.2182 110.2182 110.2182 110.2182     1
+9

This should be much faster, but a billion of anything may take some time. It would be wise to test it first with shorter lengths, for example 10^6.

    length = 10^9
    Y = rnorm(length)
    sum(cumsum(Y) < 0) / length

EDIT

Based on the comments from @user3666197, I tested this and he was right. This solution works well for smaller numbers, but once the number of steps gets too large, it fails.

I tested my "vectorized" version against the OP's code. With a random walk of length 10^8, my code took about 7 seconds and the OP's code took 131 seconds (on my laptop). However, when I increased the length to 10^9 (as in the original question), my version caused a great deal of disk swapping and I had to kill the process. This solution fails at the scale requested by the OP.

+4

Given that the source of randomness is, technically, the ability of otherwise deterministic hardware to satisfy both the requirement of repeatability of the generated stream and all the conditions of the "generated" randomness via some pseudo-random generator (PRG) algorithm, such a source of randomness is not easily converted from a pure-[SERIAL] into any "just"-[CONCURRENT] or true-[PARALLEL] modus operandi.

This means that the PRG step is the central blocking point (lock) of any attempt to re-arrange the execution of the pure-[SERIAL] code.

This does not change the percentage of (non-)negative X(t) values; it simply means that, for a given PRG implementation, there is no shorter path than the pure-[SERIAL] generation of the mutually (sequentially) dependent values.

Unrolling the "slow" loop, or quasi-vectorized processing (quasi, because the values still depend on the sequence), i.e. R-level implementation choices that lean on processor instruction-set level tricks and bypass some of the deliberately slow code-execution constructs without changing the language, is most likely the best that can happen.
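
A small illustration of that serial dependence (a sketch of mine, assuming R's default RNG settings): the same seed reproduces the same stream only when the draws are consumed serially in the same order, which is also why the block-wise solutions above reproduce a single long rnorm() call.

    # Minimal sketch (assumption: R's default RNG settings). Serial block-wise
    # draws reproduce exactly the same numbers as one long draw, which is why
    # the chunked solutions above match a single long rnorm() call; the serial
    # order of consumption is what cannot be given up.
    set.seed(123)
    a <- rnorm(6)

    set.seed(123)
    b <- c(rnorm(3), rnorm(3))   # two blocks, drawn serially in the same order

    identical(a, b)              # TRUE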

0

Using vectors tends to give better performance than for loops. The problem here, with very large numbers (i.e. 10^9), is the memory limit. Since you are only interested in the final percentage of negative values, the following will work (it takes a few minutes for 10^9 steps).

    # advance the walk by up to `step_size` steps and update the running state
    update_state <- function(curr_state, step_size) {
      n <- min(curr_state$counter, step_size)
      r <- rnorm(n)
      total <- curr_state$cum_sum + cumsum(r)
      list('counter'   = curr_state$counter - n,         # steps still to simulate
           'neg_count' = curr_state$neg_count + length(which(total < 0)),
           'cum_sum'   = curr_state$cum_sum + sum(r))    # current position of the walk
    }

    n <- 10^9
    curr_state <- list('counter' = n, 'neg_count' = 0, 'cum_sum' = 0)
    step_size <- 10^8
    while (curr_state$counter > 0) {
      curr_state <- update_state(curr_state = curr_state, step_size = step_size)
    }
    print(curr_state)
    print(curr_state$neg_count / n)
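
As a quick sanity check (my own sketch, not part of the answer), the block-wise state update can be compared against the direct vectorized computation on a walk small enough to fit in memory; with the same seed the two should agree, possibly up to tiny floating-point differences at block boundaries.

    # Sanity-check sketch (not from the answer): compare the block-wise update
    # against the direct vectorized computation on a small walk, same seed.
    set.seed(1)
    y <- rnorm(1e5)
    direct <- sum(cumsum(y) < 0) / 1e5

    set.seed(1)
    st <- list('counter' = 1e5, 'neg_count' = 0, 'cum_sum' = 0)
    while (st$counter > 0) {
      st <- update_state(curr_state = st, step_size = 1e4)
    }
    chunked <- st$neg_count / 1e5

    c(direct, chunked)   # should agree, up to floating-point rounding at block boundaries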
0

Source: https://habr.com/ru/post/1274296/