Generate random integers between two values with a given probability using R

I have the following four sets of numbers:

A=[1,207]; B=[208,386]; C=[387,486]; D=[487,586]. 

I need to generate 20,000 random integers between 1 and 586 such that the probability that a generated number belongs to A is 1/2, and the probability that it belongs to each of B, C and D is 1/6.

How can I do this using R?

2 answers

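You can use sample directly, more specifically its prob argument. Just spread each set's probability evenly over the numbers in that set, so that the weights of all 586 numbers sum to 1: each number in A gets weight 0.5/207, each number in B gets (1/6)/179, and so on.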

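    A <- 1:207
    B <- 208:386
    C <- 387:486
    D <- 487:586
    # Number of elements in each set
    L <- sapply(list(A, B, C, D), length)
    # Per-number weight = set probability / set size, repeated once per element
    x <- sample(c(A, B, C, D),
                size = 20000,
                prob = rep(c(1/2, 1/6, 1/6, 1/6) / L, L),
                replace = TRUE)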
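
To check that the weights do what the answer says, here is a minimal sketch that rebuilds the same weight vector, confirms it sums to 1, and looks at the share of draws landing in each set. The names groups, w and group_share and the seed are only illustrative, they are not part of the answer above.

```r
# Sketch: verify the per-number weights and the resulting group frequencies.
set.seed(42)  # arbitrary seed, only for reproducibility

groups <- list(A = 1:207, B = 208:386, C = 387:486, D = 487:586)
L <- lengths(groups)  # number of elements in each set

# Each number's weight is its set's probability divided by the set size.
w <- rep(c(1/2, 1/6, 1/6, 1/6) / L, L)
stopifnot(length(w) == 586, isTRUE(all.equal(sum(w), 1)))

x <- sample(unlist(groups, use.names = FALSE),
            size = 20000, replace = TRUE, prob = w)

# Empirical share of draws landing in each set.
group_share <- sapply(groups, function(g) mean(x %in% g))
print(round(group_share, 3))
```

The shares should come out close to 1/2 for A and close to 1/6 for each of B, C and D, with some random variation from run to run.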

I would suggest the roulette-wheel selection method. I will try to give a brief explanation here. Take a line segment 1 unit long and split it into pieces in proportion to the probability values. In our case the first piece has length 1/2 and the next three pieces have length 1/6 each. Now draw a number from the uniform distribution on [0, 1]. Since every point is equally likely, the probability that the number lands in a given piece equals that piece's length. So, whichever piece the number falls in, draw a sample from the corresponding vector. (The R code is below; you can run it with a large number of draws to check that this holds, in case my explanation is not very clear.)

It is called roulette-wheel selection because another analogy for the same situation is to take a circle and divide it into sectors, where the angle of each sector is proportional to its probability. Again we draw a number from the uniform distribution, see which sector it falls in, and sample from the corresponding vector, each element with equal probability.

    A <- 1:207
    B <- 208:386
    C <- 387:486
    D <- 487:586
    cumList <- list(A, B, C, D)
    probVec <- c(1/2, 1/6, 1/6, 1/6)
    cumProbVec <- cumsum(probVec)
    ret <- integer(20000)  # preallocate instead of growing the vector
    for (i in 1:20000) {
      rand <- runif(1)
      # First piece whose cumulative probability exceeds the draw
      whichVec <- which(rand < cumProbVec)[1]
      ret[i] <- sample(cumList[[whichVec]], 1)
    }

    # Testing the results
    length(which(ret %in% A)) # Almost 1/2*20000 of the values
    length(which(ret %in% B)) # Almost 1/6*20000 of the values
    length(which(ret %in% C)) # Almost 1/6*20000 of the values
    length(which(ret %in% D)) # Almost 1/6*20000 of the values
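
The same two-stage idea can also be written without a per-draw loop. This is only a sketch, not the answer's own code: it uses findInterval to pick the piece for all draws at once, then fills in each set's draws in one go. The names groups, breaks, u and n are illustrative.

```r
# Sketch: roulette-wheel selection, vectorized over all draws.
set.seed(1)  # arbitrary seed, only for reproducibility

groups  <- list(A = 1:207, B = 208:386, C = 387:486, D = 487:586)
probVec <- c(1/2, 1/6, 1/6, 1/6)
n <- 20000

# Stage 1: spin the wheel n times. The interior breakpoints are the
# cumulative probabilities without the final 1; findInterval() counts how
# many breakpoints each draw has reached, so adding 1 gives the piece index.
breaks <- head(cumsum(probVec), -1)
u <- runif(n)
whichVec <- findInterval(u, breaks) + 1L

# Stage 2: within each piece, pick elements uniformly at random.
ret <- integer(n)
for (k in seq_along(groups)) {
  idx <- which(whichVec == k)
  g <- groups[[k]]
  ret[idx] <- g[sample.int(length(g), length(idx), replace = TRUE)]
}

# Share of draws per set: close to 1/2, 1/6, 1/6, 1/6.
print(round(sapply(groups, function(g) mean(ret %in% g)), 3))
```

Indexing with sample.int rather than calling sample on the set itself avoids the surprise that sample(x, 1) treats a single number x as the range 1:x, which would matter if a set ever had only one element.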

Source: https://habr.com/ru/post/1485192/