Erroneous seed behavior with rbinom (prob = 0.5) in R

I found what I consider unstable behavior (but for which I hope there is a simple explanation) in R using seeds in combination with rbinom() when prob=0.5 . General idea. For me, if I set the seed, run rbinom() once (i.e., execute one random process), although prob , the random seed should change by one step. Then, if I set the seed to the same value again and start another random process (for example, rbinom() again, but possibly with a different prob value), the seed should change again to the same value as for the previous one single random process.

I found that R does exactly this while I use rbinom() with any prob!=0.5 . Here is an example:

Compare the semester vector, .Random.seed , for two probabilities other than 0.5:

 set.seed(234908) x <- rbinom(n=1,size=60,prob=0.4) temp1 <- .Random.seed set.seed(234908) x <- rbinom(n=1,size=60,prob=0.3) temp2 <- .Random.seed any(temp1!=temp2) > [1] FALSE 

Compare sample vector, .Random.seed , for prob = 0.5 vs. prob! = 0.5:

 set.seed(234908) x <- rbinom(n=1,size=60,prob=0.5) temp1 <- .Random.seed set.seed(234908) x <- rbinom(n=1,size=60,prob=0.3) temp2 <- .Random.seed any(temp1!=temp2) > [1] TRUE temp1==temp2 > [1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE > [8] TRUE TRUE TRUE TRUE TRUE TRUE TRUE ... 

I found this for all comparisons prob=0.5 against all other probabilities in the set {0,1, 0,2, ..., 0,9}. Similarly, if I compare any prob values ​​with {0.1, 0.2, ..., 0.9} except 0.5, the .Random.seed vector .Random.seed always incremental. These facts are true for odd or even size within rbinom() .

To make it even weirder (I'm sorry this is a bit confusing - this has to do with how my function is written) when I use probabilities stored as elements in a vector, I have the same problem if 0.5 is the first element, but not the second. Here is an example for this case:

First case: 0.5 - the first probability the vector refers to

 set.seed(234908) MNAR <- c(0.5,0.3) x <- rbinom(n=1,size=60,prob=MNAR[1]) y <- rbinom(n=1,size=50,prob=MNAR[2]) temp1 <- .Random.seed set.seed(234908) MNAR <- c(0.1,0.3) x <- rbinom(n=1,size=60,prob=MNAR[1]) y <- rbinom(n=1,size=50,prob=MNAR[2]) temp2 <- .Random.seed any(temp1!=temp2) > [1] TRUE any(temp1!=temp2) > [1] TRUE FALSE TRUE TRUE TRUE TRUE TRUE > [8] TRUE TRUE TRUE TRUE TRUE TRUE TRUE 

The second case: 0.5 - the second probability indicated in the vector

 set.seed(234908) MNAR <- c(0.3,0.5) x <- rbinom(n=1,size=60,prob=MNAR[1]) y <- rbinom(n=1,size=50,prob=MNAR[2]) temp1 <- .Random.seed set.seed(234908) MNAR <- c(0.1,0.3) x <- rbinom(n=1,size=60,prob=MNAR[1]) y <- rbinom(n=1,size=50,prob=MNAR[2]) temp2 <- .Random.seed any(temp1!=temp2) > [1] FALSE 

Again, I believe that despite the values ​​used for prob and size , this template is preserved. Can anyone explain this secret to me? This causes a problem because the results, which should be the same, are different, because for some reason the seed is used / calculated differently when prob=0.5 , but in no other instance.

+42
random r
Sep 20 '13 at 1:35
source share
2 answers

So, let's answer our comments. Thanks to Ben Bolker for putting us on the right path with a link to the code: https://svn.r-project.org/R/trunk/src/nmath/rbinom.c and a suggestion to track where unif_rand() is unif_rand() .

A quick scan, and it seems that the code is divided into two sections, limited by comments:

 /*-------------------------- np = n*p >= 30 : ------------------- */ 

and

 /*---------------------- np = n*p < 30 : ------------------------- */ 

Inside each of them, the number of unif_rand calls unif_rand not match (two versus one).

So, for a given size ( n ), your random seed may be in a different state depending on the value of prob ( p ): size * prob >= 30 or not.

With this in mind, all the results that you got with your examples should now make sense:

 # these end up in the same state rbinom(n=1,size=60,prob=0.4) # => np < 30 rbinom(n=1,size=60,prob=0.3) # => np < 30 # these don't rbinom(n=1,size=60,prob=0.5) # => np >= 30 rbinom(n=1,size=60,prob=0.3) # => np < 30 # these don't {rbinom(n=1,size=60,prob=0.5) # np >= 30 rbinom(n=1,size=50,prob=0.3)} # np < 30 {rbinom(n=1,size=60,prob=0.1) # np < 30 rbinom(n=1,size=50,prob=0.3)} # np < 30 # these do {rbinom(n=1,size=60,prob=0.3) # np < 30 rbinom(n=1,size=50,prob=0.5)} # np < 30 {rbinom(n=1,size=60,prob=0.1) # np < 30 rbinom(n=1,size=50,prob=0.3)} # np < 30 
+38
Sep 20 '13 at 4:08
source share

I am going to take the opposite position on this issue and say that expectations are not suitable and are not supported by the documentation. There is no claim in the documentation as to which side effects (in particular on .Random.seed ) can be expected by calling rbinom or how these side effects may or may not be the same in different cases.

rbinom has three parameters: n , size and prob . Your expectation is that for a random seed set before calling rbinom , .Random.seed will be the same after calling rbinom for given n and any size and prob (or maybe any final size and prob ). You, of course, understand that for different values ​​of n this will be different. rbinom does not guarantee this or imply it.

Without knowing the internal functions, it is impossible to know; as another answer showed, the algorithm is different from the size and prob . And the insides can change so that these specific details can change.

At least in this case, the result of .Random.seed will be the same after every rbinom call that has the same n , size and prob . I can build a pathological function for which it is not:

 seedtweak <- function() { if(floor(as.POSIXlt(Sys.time())$sec * 10) %% 2) { runif(1) } invisible(NULL) } 

Basically, this function looks to see if a tenth of the second time is odd or even decide whether to draw a random number or not. Run this function and .Random.seed may or may not change:

 rs <- replicate(10, { set.seed(123) seedtweak() .Random.seed }) all(apply(rs, 1, function(x) Reduce(`==`, x))) 

We hope that best of all you can (should you?) Hope that this set of codes with all inputs / parameters is the same (including seed) will always give the same results. Waiting for identical results when only most (or only some) of the parameters are the same is unrealistic if all the functions called these guarantees do not provide these guarantees.

+15
Sep 20 '13 at 16:16
source share



All Articles