How to plot the difference between two ggplot density distributions?

Question

I would like to use ggplot2 to illustrate the difference between two similar density distributions. Here is a toy example of the type of data I have:

library(ggplot2)

# Make toy data
n_sp  <- 100000
n_dup <- 50000
D <- data.frame( 
    event=c(rep("sp", n_sp), rep("dup", n_dup) ), 
    q=c(rnorm(n_sp, mean=2.0), rnorm(n_dup, mean=2.1)) 
)

# Standard density plot
ggplot( D, aes( x=q, y=..density.., col=event ) ) +
    geom_freqpoly()

Rather than plotting the density for each category (dup and sp) separately as above, how could I plot a single line that shows the difference between these distributions?

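In the toy example above, if the dup density were subtracted from the sp density, the resulting line would be above 0 on the left side of the plot (where sp values are more abundant) and below 0 on the right (where dup values are more abundant). Note that the numbers of dup and sp observations may differ.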

r plot ggplot2

Casey Dunn · Mar 02 '17 at 19:19

alistaire · Answer 1 · 2017-03-02T20:03:30+0000

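There may be a way to do this within ggplot, but it is often easiest to do the calculation beforehand. In this case, call density on each subset of q over the same range, then subtract the y values. Using dplyr (translate to base R or data.table if you prefer):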

library(dplyr)
library(ggplot2)

D %>% group_by(event) %>% 
    # calculate densities for each group over same range; store in list column
    summarise(d = list(density(q, from = min(.$q), to = max(.$q)))) %>% 
    # make a new data.frame from two density objects
    do(data.frame(x = .$d[[1]]$x,    # grab one set of x values (which are the same)
                  y = .$d[[1]]$y - .$d[[2]]$y)) %>%    # subtract the y values: dup minus sp (groups sort alphabetically)
    ggplot(aes(x, y)) +    # now plot
    geom_line()
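
For anyone who would rather avoid dplyr, here is a rough base R sketch of the same idea. It is not the code from the answer above: the seed, the names rng, d_sp, d_dup and diff_df, and the dashed zero line are additions made for illustration. It subtracts dup from sp, as described in the question, which is the opposite sign of the dplyr version.

```r
library(ggplot2)

# Toy data as in the question (the seed is an addition, for reproducibility)
set.seed(1)
n_sp  <- 100000
n_dup <- 50000
D <- data.frame(
    event = c(rep("sp", n_sp), rep("dup", n_dup)),
    q     = c(rnorm(n_sp, mean = 2.0), rnorm(n_dup, mean = 2.1))
)

# Evaluate both kernel density estimates over one shared range
rng   <- range(D$q)
d_sp  <- density(D$q[D$event == "sp"],  from = rng[1], to = rng[2])
d_dup <- density(D$q[D$event == "dup"], from = rng[1], to = rng[2])

# With the same from/to and the default n = 512, both x grids coincide
stopifnot(isTRUE(all.equal(d_sp$x, d_dup$x)))

# sp minus dup: positive where sp is relatively more abundant
diff_df <- data.frame(x = d_sp$x, y = d_sp$y - d_dup$y)

ggplot(diff_df, aes(x, y)) +
    geom_hline(yintercept = 0, linetype = "dashed") +    # reference line at zero
    geom_line()
```

Because each density estimate integrates to roughly 1 regardless of sample size, the different numbers of sp and dup observations should not bias the difference. Each group still gets its own default bandwidth, so if that matters, passing an explicit bw to both density calls may be worth trying.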
