Adding a density line to a histogram with counting data in ggplot2

Question

I want to add a density line (specifically, a normal density) to a histogram.

Suppose I have the following data. I can build a ggplot2 histogram:

    library(ggplot2)

    set.seed(123)
    df <- data.frame(x = rbeta(10000, shape1 = 2, shape2 = 4))
    ggplot(df, aes(x = x)) +
      geom_histogram(colour = "black", fill = "white", binwidth = 0.01)

I can add a density line using:

    ggplot(df, aes(x = x)) +
      geom_histogram(aes(y = ..density..), colour = "black", fill = "white",
                     binwidth = 0.01) +
      stat_function(fun = dnorm, args = list(mean = mean(df$x), sd = sd(df$x)))

But this is not what I really want: I want the density line to be scaled to the count data.

I found a similar post (HERE) that suggested a solution to this problem, but it did not work in my case. I need an arbitrary expansion factor to get what I want, and this does not generalize at all:

    ef <- 100 # Expansion factor
    ggplot(df, aes(x = x)) +
      geom_histogram(colour = "black", fill = "white", binwidth = 0.01) +
      stat_function(
        fun = function(x, mean, sd, n) {
          n * dnorm(x = x, mean = mean, sd = sd)
        },
        args = list(mean = mean(df$x), sd = sd(df$x), n = ef)
      )

Any suggestions on how to generalize this

first to the normal distribution,
then to any other bin size,
and finally to any other distribution would be very helpful.

r ggplot2 histogram density-plot

Hbat Dec 26 '14 at 20:51

1 answer

jlhoward · Accepted Answer · 2014-12-26T21:56:45+0000

Fitting a distribution function does not happen by magic; you have to do it explicitly. One way is to use fitdistr(...) from the MASS package.

    library(MASS) # for fitdistr(...)

    # excellent fit (of course...)
    ggplot(df, aes(x = x)) +
      geom_histogram(aes(y = ..density..), colour = "black", fill = "white",
                     binwidth = 0.01) +
      stat_function(
        fun = dbeta,
        args = as.list(fitdistr(df$x, "beta",
                                start = list(shape1 = 1, shape2 = 1))$estimate)
      )

    # horrible fit - no surprise here
    ggplot(df, aes(x = x)) +
      geom_histogram(aes(y = ..density..), colour = "black", fill = "white",
                     binwidth = 0.01) +
      stat_function(fun = dnorm,
                    args = as.list(fitdistr(df$x, "normal")$estimate))

    # mediocre fit - also not surprising...
    ggplot(df, aes(x = x)) +
      geom_histogram(aes(y = ..density..), colour = "black", fill = "white",
                     binwidth = 0.01) +
      stat_function(fun = dgamma,
                    args = as.list(fitdistr(df$x, "gamma")$estimate))
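The "excellent", "horrible" and "mediocre" labels above are visual judgements. As a rough numerical cross-check (not part of the original answer), the three fits can be compared by AIC, where lower is better. This is a minimal sketch that assumes the MASS package is installed and regenerates the question's data as a plain vector `x`; the `fits` and `aic` names are purely illustrative.

```r
library(MASS)  # for fitdistr()

# Same simulated data as in the question, kept as a plain vector
set.seed(123)
x <- rbeta(10000, shape1 = 2, shape2 = 4)

# Fit the three candidate distributions by maximum likelihood.
# The beta fit is wrapped in suppressWarnings() because the optimizer
# can emit NaN warnings while it explores invalid parameter values.
fits <- list(
  beta   = suppressWarnings(
    fitdistr(x, "beta", start = list(shape1 = 1, shape2 = 1))
  ),
  normal = fitdistr(x, "normal"),
  gamma  = fitdistr(x, "gamma")
)

# AIC works on fitdistr objects through their logLik() method
aic <- sapply(fits, AIC)
print(sort(aic))
```

Since the data were simulated from a beta distribution, the beta fit should come out with the lowest AIC.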

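EDIT: Response to the OP's comment.

The scale factor is bin width × sample size: a bin of width w around x is expected to hold about n × w × f(x) of the n observations, where f is the density.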
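The scale factor can be checked numerically without plotting anything. The sketch below uses only base R and, as an assumption for illustration, the true Beta(2, 4) parameters from the question's simulation rather than fitted ones. It bins the data with `hist()` and compares the observed counts with n × bin width × density evaluated at the bin midpoints.

```r
# Same simulated data as in the question
set.seed(123)
x  <- rbeta(10000, shape1 = 2, shape2 = 4)
bw <- 0.01  # bin width, matching the histogram in the question

# Observed counts per bin (no plot is drawn)
h <- hist(x, breaks = seq(0, 1, by = bw), plot = FALSE)

# Expected counts: sample size * bin width * density at each bin midpoint
expected <- length(x) * bw * dbeta(h$mids, shape1 = 2, shape2 = 4)

# Both totals should be close to the sample size,
# and the two vectors should track each other closely
print(c(observed = sum(h$counts), expected = sum(expected)))
print(cor(h$counts, expected))
```

Changing `bw` rescales both the observed and the expected counts together, which is why the factor has to include the bin width and not just the sample size.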

    ggplot(df, aes(x = x)) +
      geom_histogram(colour = "black", fill = "white", binwidth = 0.01) +
      stat_function(
        fun = function(x, shape1, shape2) 0.01 * nrow(df) * dbeta(x, shape1, shape2),
        args = as.list(fitdistr(df$x, "beta",
                                start = list(shape1 = 1, shape2 = 1))$estimate)
      )
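To avoid hard-coding 0.01 and the distribution, the scaling can be wrapped in a small helper. This is only a sketch: `scaled_density()` is a hypothetical name, not a ggplot2 or MASS function, and the example uses a normal curve with a bin width of 0.05 simply to show that both can be swapped freely. The plotting part runs only if ggplot2 is installed.

```r
# Hypothetical helper (not from the original answer): wrap any density
# function so that it returns expected counts per bin, not densities.
scaled_density <- function(dfun, n, binwidth) {
  force(dfun); force(n); force(binwidth)
  function(x, ...) n * binwidth * dfun(x, ...)
}

# Same simulated data as in the question
set.seed(123)
df <- data.frame(x = rbeta(10000, shape1 = 2, shape2 = 4))
bw <- 0.05  # any bin width; the histogram must use the same value

# Normal density rescaled to counts for this sample size and bin width
count_norm <- scaled_density(dnorm, n = nrow(df), binwidth = bw)

if (requireNamespace("ggplot2", quietly = TRUE)) {
  library(ggplot2)
  p <- ggplot(df, aes(x = x)) +
    geom_histogram(colour = "black", fill = "white", binwidth = bw) +
    stat_function(fun = count_norm,
                  args = list(mean = mean(df$x), sd = sd(df$x)))
  print(p)
}
```

For another distribution, pass its density function (for example `dbeta` or `dgamma`) as `dfun` and supply the matching parameters through `args`.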
