Adding a density line to a histogram with count data in ggplot2

I want to add a density line (a normal density, actually) to a histogram.

Suppose I have the following data. I can build a ggplot2 histogram:

    library(ggplot2)

    set.seed(123)
    df <- data.frame(x = rbeta(10000, shape1 = 2, shape2 = 4))

    ggplot(df, aes(x = x)) +
      geom_histogram(colour = "black", fill = "white", binwidth = 0.01)

I can add a density line using:

    # after_stat(density) replaces the deprecated ..density.. (ggplot2 >= 3.3.0)
    ggplot(df, aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)),
                     colour = "black", fill = "white", binwidth = 0.01) +
      stat_function(fun = dnorm, args = list(mean = mean(df$x), sd = sd(df$x)))

But this is not what I really want: I want the density line to be scaled to the count data.

I found a similar post (HERE) that suggested a solution to this problem, but it did not work in my case: I need an arbitrary expansion factor to get what I want, and that does not generalize at all:

    ef <- 100 # Expansion factor

    ggplot(df, aes(x = x)) +
      geom_histogram(colour = "black", fill = "white", binwidth = 0.01) +
      stat_function(fun = function(x, mean, sd, n) {
                      n * dnorm(x = x, mean = mean, sd = sd)
                    },
                    args = list(mean = mean(df$x), sd = sd(df$x), n = ef))

Any tips that help me generalize this

  • first to the normal distribution,
  • then to any other bin size,
  • and finally to any other distribution

would be very helpful.
1 answer

Fitting a distribution function does not happen by magic; you have to do it explicitly. One way is to use fitdistr(...) from the MASS package.

    library(MASS) # for fitdistr(...)

    # excellent fit (of course...)
    # args expects a list, so the named estimate vector is converted with as.list()
    ggplot(df, aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)),
                     colour = "black", fill = "white", binwidth = 0.01) +
      stat_function(fun = dbeta,
                    args = as.list(fitdistr(df$x, "beta",
                                            start = list(shape1 = 1, shape2 = 1))$estimate))

    # horrible fit - no surprise here
    ggplot(df, aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)),
                     colour = "black", fill = "white", binwidth = 0.01) +
      stat_function(fun = dnorm,
                    args = as.list(fitdistr(df$x, "normal")$estimate))

    # mediocre fit - also not surprising...
    ggplot(df, aes(x = x)) +
      geom_histogram(aes(y = after_stat(density)),
                     colour = "black", fill = "white", binwidth = 0.01) +
      stat_function(fun = dgamma,
                    args = as.list(fitdistr(df$x, "gamma")$estimate))
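The "excellent / horrible / mediocre" verdicts above are visual. As a rough sketch of how they could be checked numerically, the snippet below compares the three fits by AIC. It assumes that AIC() can be applied to fitdistr objects through their logLik method, and it regenerates the question's data in a plain vector x so that it runs on its own.

```r
library(MASS)  # for fitdistr()

# Same simulated data as in the question
set.seed(123)
x <- rbeta(10000, shape1 = 2, shape2 = 4)

# Fit the three candidate distributions by maximum likelihood
fits <- list(
  beta   = fitdistr(x, "beta", start = list(shape1 = 1, shape2 = 1)),
  normal = fitdistr(x, "normal"),
  gamma  = fitdistr(x, "gamma")
)

# Lower AIC indicates a better trade-off between fit and parameter count
aic <- sapply(fits, AIC)
print(sort(aic))
```

Since the data were drawn from a beta distribution, the beta fit should come out with the lowest AIC.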

EDIT: Response to the OP's comment.

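The scaling factor is binwidth × sample size: a density integrates to 1, while the bar heights of a count histogram sum to the sample size, so the expected count in each bin is roughly density × binwidth × sample size.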

    # scale the fitted density by binwidth (0.01) times sample size (nrow(df))
    ggplot(df, aes(x = x)) +
      geom_histogram(colour = "black", fill = "white", binwidth = 0.01) +
      stat_function(fun = function(x, shape1, shape2)
                      0.01 * nrow(df) * dbeta(x, shape1, shape2),
                    args = as.list(fitdistr(df$x, "beta",
                                            start = list(shape1 = 1, shape2 = 1))$estimate))
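To cover the "any bin size" and "any distribution" parts of the question, the same scaling can be wrapped in a small helper. This is only a sketch: scaled_density is a hypothetical name, not part of ggplot2 or MASS, and the bin width of 0.02 is an arbitrary choice for illustration.

```r
library(ggplot2)
library(MASS)  # for fitdistr()

# Same simulated data as in the question
set.seed(123)
df <- data.frame(x = rbeta(10000, shape1 = 2, shape2 = 4))

# Hypothetical helper (not part of ggplot2): wraps a density function so that
# it returns expected counts per bin, i.e. n * binwidth * density.
scaled_density <- function(dfun, n, binwidth) {
  function(x, ...) n * binwidth * dfun(x, ...)
}

binwidth <- 0.02  # any bin width; the curve rescales with it
fit <- fitdistr(df$x, "beta", start = list(shape1 = 1, shape2 = 1))

p <- ggplot(df, aes(x = x)) +
  geom_histogram(colour = "black", fill = "white", binwidth = binwidth) +
  stat_function(fun = scaled_density(dbeta, nrow(df), binwidth),
                args = as.list(fit$estimate))
print(p)
```

Swapping dbeta and the fitdistr call for dnorm or dgamma (with the matching fit) should give the count-scaled curve for those distributions, as long as the same binwidth is passed to both geom_histogram and the helper.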


Source: https://habr.com/ru/post/980186/

