The probability of a union of three or more sets

Consider the following sets of probabilities (three events are NOT mutually exclusive):

  • 0.05625 success, failure 0.94375
  • 0.05625 success, failure 0.94375
  • 0.05625 success, failure 0.94375

How to calculate the probability of at least one event (i.e., a union)?

If possible, I would prefer a general, standalone solution that could also handle 4 or more events. In this case, the answer I'm looking for is:

0.05625 + 0.05625 + 0.05625 - 0.05625*0.05625 - 0.05625*0.05625 - 0.05625*0.05625 + 0.05625*0.05625*0.05625 ##[1] 0.1594358 

My question is ultimately a bit wider than the name, since I am looking for functions that can calculate the probability of joining, intersecting ( 0.05625*0.05625*0.05625 = 0.0001779785 ), the event does not occur ( 1 - 0.1594358 = 0.8405642 ) or exactly one event ( 0.150300 ). In other words, the R-solution for this on-line conjunction is three event calculators . I have already examined the prob package, but it seems the interface is too complicated for such a simplified use case.

+5
source share
2 answers

Equal probabilities

You can get the probability that exactly 0, 1, 2 or 3 of them will happen with the binomial density function dbinom , which returns the probability of getting exactly the number of indicated successes (first argument) taking into account the total number of independent attempts (second argument) and the probability of success for of each attempt (third argument):

 dbinom(0:3, 3, 0.05625) # [1] 0.8405642090 0.1502995605 0.0089582520 0.0001779785 

So, if you need the probability of at least one event, that is:

 sum(dbinom(1:3, 3, 0.05625)) # [1] 0.1594358 

or

 1 - dbinom(0, 3, 0.05625) # [1] 0.1594358 

The dbinom function dbinom also solve your other issues. For example, the probability of everything that happens:

 dbinom(3, 3, 0.05625) # [1] 0.0001779785 

Probability of exactly one:

 dbinom(1, 3, 0.05625) # [1] 0.1502996 

Probability none:

 dbinom(0, 3, 0.05625) # [1] 0.8405642 

Unequal Probabilities - Some Simple Cases

If you have unequal probabilities stored in the vector p , and each element is selected independently, you need to do a little more work because the dbinom function dbinom not applied. However, some of the calculations are quite simple.

The probability that none of the selected elements will be simply the result of 1 minus the probabilities (the probability that at least one of them is selected is one minus):

 p <- c(0.1, 0.2, 0.3) prod(1-p) # [1] 0.504 

The probability of all is a product of the probabilities:

 prod(p) # [1] 0.006 

Finally, the probability that just one is selected is the sum over all elements of its probability, multiplied by the probability that all other elements will not be selected:

 sum(p * (prod(1-p) / (1-p))) # [1] 0.398 

Similarly, the probability of accurately choosing n-1 (where n is the number of probabilities):

 sum((1-p) * (prod(p) / p)) # [1] 0.092 

Unequal Probabilities - Complete Case

If you want the probability of every possible success count, one option would be to calculate all combinations of 2^n events (which A. Webb does in his answer). Instead, the following scheme is O (n ^ 2):

 cp.quadratic <- function(p) { P <- matrix(0, nrow=length(p), ncol=length(p)) P[1,] <- rev(cumsum(rev(p * prod(1-p) / (1-p)))) for (i in seq(2, length(p))) { P[i,] <- c(rev(cumsum(rev(head(p, -1) / (1-head(p, -1)) * tail(P[i-1,], -1)))), 0) } c(prod(1-p), P[,1]) } cp.quadratic(c(0.1, 0.2, 0.3)) # [1] 0.504 0.398 0.092 0.006 

Basically, we define P_ij as the probability that we have exactly i successes, all of which are in position j or more. The basic cases for i=0 and i=1 relatively simple to calculate, and then we have the following recurrence:

 P_ij = P_i(j+1) + p_j / (1-p_j) * P_(i-1)(j+1) 

In the cp.quadratic function cp.quadratic we loop with increasing i , filling in the matrix p (which is equal to n x n ). Therefore, the total operation counter is O (n ^ 2).

This allows you, for example, to calculate the distribution for a fairly large number of options per second:

 system.time(cp.quadratic(sample(c(.1, .2, .3), 100, replace=T))) # user system elapsed # 0.005 0.000 0.006 system.time(cp.quadratic(sample(c(.1, .2, .3), 1000, replace=T))) # user system elapsed # 0.165 0.043 0.208 system.time(cp.quadratic(sample(c(.1, .2, .3), 10000, replace=T))) # user system elapsed # 12.721 3.161 16.567 

We can calculate the distribution from 1,000 elements per split second and from 10,000 elements per minute; the calculation of 2 ^ 1000 or 2 ^ 10000 possible results would require an excessively long time (the number of subsets is 301-digit and 3010-digit, respectively).

+7
source

Here, a function that creates all combinations of events, calculates their probabilities and aggregates by the number of occurrences

 cp <- function(p) { ev <- do.call(expand.grid,replicate(length(p),0:1,simplify=FALSE)) pe <- apply(ev,1,function(x) prod(p*(x==1)+(1-p)*(x==0))) tapply(pe,rowSums(ev),sum) } 

An example is the same as josilber , using the probabilities of an event occurring independently of 0.1, 0.2 and 0.3:

 cp(c(0.1,0.2,0.3)) 
  0 1 2 3 
 0.504 0.398 0.092 0.006 

So, for example, the probability that exactly two independent events will be equal to 0.092.

+2
source

Source: https://habr.com/ru/post/1241946/


All Articles