R: How to get the sum of two distributions?

I have a simple question. I would like to sum two nonparametric distributions.

Here is an example. There are two cities, each with 10 houses, and we know the energy consumption of every house. I want to get the probability distribution of the summed consumption of one random house selected from each city.

A1 <- c(1,2,3,3,3,4,4,5,6,7) #10 houses' energy consumption for city A
B1 <- c(11,13,15,17,17,18,18,19,20,22) #10 houses' energy consumption for city B

I have the probability distributions of A1 and B1. How can I get the probability distribution of A1 + B1? If I just use A1+B1 in R, it gives 12 15 18 20 20 22 22 24 26 29. However, I do not think this is correct, because the houses have no particular order.

When I change the order of the houses, it gives different results.

# Original
A1 <- c(1,2,3,3,3,4,4,5,6,7)
B1 <- c(11,13,15,17,17,18,18,19,20,22)
#change order 1
A2 <- c(7,6,5,4,4,3,3,3,2,1) 
B2 <- c(22,20,19,18,18,17,17,15,13,11)
#change order 2
A3 <- c(3,3,3,4,4,5,6,7,1,2) 
B3 <- c(17,17,18,18,19,13,20,11,22,15)
sum1 <- A1+B1; sum1
sum2 <- A1+B2; sum2
sum3 <- A3+B3; sum3

[Figure omitted: plot of sum1, sum2 and sum3]

The red lines are sum1, sum2 and sum3. I am not sure how I can get the distribution of the sum of the two distributions. Please give me any ideas. Thanks!

+4

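In theory, the PDF of the sum of two independent random variables is the convolution of their PDFs:

PDF(Z) = PDF(Y) * PDF(X), where Z = X + Y

Here * denotes convolution, not multiplication, so this case can be computed with a convolution.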
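Written out for discrete values, and assuming the two variables are independent, the convolution above reads as follows, with Z = X + Y and the sum running over every value x that X can take:

```latex
P(Z = z) = \sum_{x} P(X = x)\, P(Y = z - x)
```

For the data in the question, P(Z = 12) = P(X = 1) P(Y = 11) = 0.1 × 0.1 = 0.01, since 1 + 11 is the only way to reach 12.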

# your data
A1 <- c(1,2,3,3,3,4,4,5,6,7) #10 houses' energy consumption for city A
B1 <- c(11,13,15,17,17,18,18,19,20,22) #10 houses' energy consumption for city B

# compute PDF/CDF
PDF_A1 <- table(A1)/length(A1)
CDF_A1 <- cumsum(PDF_A1)

PDF_B1 <- table(B1)/length(B1)
CDF_B1 <- cumsum(PDF_B1)

# compute the sum distribution
# convolve() needs rev() on the second argument to give the usual convolution
PDF_C1 <- convolve(PDF_B1, rev(PDF_A1), type = "open")

# plotting
plot(PDF_C1, type="l", axes=FALSE, main="PDF of A1+B1")
box()
axis(2)
# FIXME: is my understanding of the x-axis correct?
axis(1, at=1:14, labels=(c(names(PDF_A1)[-1],names(PDF_B1))))

## To make the x-values correspond to the actual sums, put both PDFs
## on a common integer grid, padding with zeros, before convolving.
## Indexing by value assumes positive integer data, as here.
r <- range(c(A1, B1))
pdfA <- pdfB <- vector('numeric', diff(r)+1L)
PDF_A1 <- table(A1)/length(A1)                        # same as what you have done
PDF_B1 <- table(B1)/length(B1)
pdfA[as.numeric(names(PDF_A1))] <- as.vector(PDF_A1)  # fill the values
pdfB[as.numeric(names(PDF_B1))] <- as.vector(PDF_B1)

## compute the convolution and plot
## res[k] is P(sum = k + 1), because index 1 plus index 1 is a sum of 2
res <- convolve(pdfA, rev(pdfB), type = "open")
plot(seq_along(res) + 1L, res, type="h", xlab='Sum', ylab='')

## In this simple case (a discrete distribution) you can compare the
## result with the relative frequencies of the sums over all pairs
tst <- rowSums(expand.grid(A1, B1))
plot(prop.table(table(tst)), type='h')
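
The two results can also be compared numerically rather than by eye. The following is a minimal sketch, assuming positive integer data as in the question; tabulate() is used here as a shortcut for building the zero-padded probability vectors, and the names n_max, sums, brute and max_abs_diff are illustrative only.

```r
A1 <- c(1,2,3,3,3,4,4,5,6,7)
B1 <- c(11,13,15,17,17,18,18,19,20,22)

# probability of each integer value 1..n_max (zero where a value never occurs)
n_max <- max(A1, B1)
pdfA <- tabulate(A1, nbins = n_max) / length(A1)
pdfB <- tabulate(B1, nbins = n_max) / length(B1)

# usual convolution; res[k] is the probability that the sum equals k + 1
res <- convolve(pdfA, rev(pdfB), type = "open")
sums <- seq_along(res) + 1L

# brute force: relative frequency of each sum over all 100 pairs
tst <- rowSums(expand.grid(A1, B1))
brute <- tabulate(tst, nbins = max(sums))[sums] / length(tst)

# the FFT inside convolve() leaves rounding noise, so compare with a tolerance
max_abs_diff <- max(abs(res - brute))
all.equal(res, brute)
```

If the two approaches agree, max_abs_diff should be on the order of floating-point rounding error.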

+5

Edit:

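After seeing @jeremycg's answer, here is a different approach based on sampling.

Estimate a density for each of A1 and B1 with density(), then draw values from density()$x with sample(), using prob=density()$y as the weights, and add the two samples element by element.

As a check, mean(A1) is 3.8 and mean(B1) is 17, so the distribution of the sum should be centered at about 20.8.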

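sample_sum <- function(A, B, n, ...){
    # draw n values from a kernel density estimate of X;
    # extra arguments in ... are passed on to density()
    qss <- function(X, n, ...){
        dens_X <- density(X, ...)
        sample(dens_X$x, size=n, prob=dens_X$y, replace=TRUE)
    }

    sample_A <- qss(A, n=n, ...)
    sample_B <- qss(B, n=n, ...)

    # element-wise sum of two independent samples
    sample_A + sample_B
}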

ss <- sample_sum(A1, B1, n=100, from=0)

png("~/Desktop/answer.png", width=5, height=5, units="in", res=150)
plot(density(ss))
dev.off()
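
One way to sanity-check the sampler is to compare the mean of the sampled sums with mean(A1) + mean(B1), which is 20.8 for this data. The sketch below repeats the sample_sum definition so that it runs on its own; the seed, the sample size of 10000 and the names ss_check and target are arbitrary choices for illustration, not part of the original answer.

```r
A1 <- c(1,2,3,3,3,4,4,5,6,7)
B1 <- c(11,13,15,17,17,18,18,19,20,22)

sample_sum <- function(A, B, n, ...){
    # draw n values from a kernel density estimate of X
    qss <- function(X, n, ...){
        dens_X <- density(X, ...)
        sample(dens_X$x, size=n, prob=dens_X$y, replace=TRUE)
    }
    qss(A, n=n, ...) + qss(B, n=n, ...)
}

set.seed(1)                                   # assumption: fixed seed for repeatability
ss_check <- sample_sum(A1, B1, n=10000, from=0)
target <- mean(A1) + mean(B1)                 # 3.8 + 17

# the two numbers should be close, not identical
c(sampled = mean(ss_check), expected = target)
```

A larger n narrows the gap caused by sampling noise, though the kernel smoothing and the cut at 0 can still leave a small offset.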

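Because from=0 is passed through to density(), the estimated densities start at 0. The density of the sum is centered near 20, as expected.

This approach also scales to large inputs: with vectors of 10 million elements, expand.grid fails with Error: cannot allocate vector of size 372529.0 Gb, while sample_sum finishes in about 0.12 seconds.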

+3

A simple way to get every possible sum:

rowSums(expand.grid(A1, B1))

expand.grid builds every combination of A1 and B1, and rowSums adds up each pair.
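
To turn those sums into a probability distribution, each of the 100 pairs can be given equal weight. This is a small sketch of that step, assuming one house is drawn uniformly and independently from each city; prop.table() and outer() are swapped in here for illustration and are not part of the original answer.

```r
A1 <- c(1,2,3,3,3,4,4,5,6,7)
B1 <- c(11,13,15,17,17,18,18,19,20,22)

# one sum for each of the 10 x 10 (house in A, house in B) pairs
all_sums <- rowSums(expand.grid(A1, B1))

# relative frequency of each distinct sum
pmf <- prop.table(table(all_sums))
pmf

# outer() produces the same sums without building a data frame
all_sums2 <- as.vector(outer(A1, B1, "+"))
```

The names of pmf are the possible sums and its values are their probabilities, so plot(pmf, type='h') draws the distribution directly.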

+2

Source: https://habr.com/ru/post/1619120/

