A histogram showing the proportion of a factor variable taking a specific value

I have a dataset that looks like this:

    df <- data.frame(cbind(c(rep.int(x = 0, times = 7), 1:3),
                           c(1, 1, 1, 0, 1, 0, 1, 1, 0, 0),
                           c(1:3, 1:3, 1:3, NA)))
    names(df) <- c("cars", "sex", "status")
    df$sex <- factor(df$sex, labels = c("male", "female"))
    df$status <- factor(df$status, labels = c("bad", "ok", "good"))
    df$car <- (df$cars > 0)  # Person has at least 1 car

I would like to use ggplot2 to create faceted histograms with the following characteristics:

  • Faceted by categorical variables (sex and status in this example)
  • Each panel contains one bar per level of that factor (for example, men and women for "sex").
  • Each bar shows the percentage of observations at that level of the factor that have at least one car (for example, the percentage of men with at least one car).
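For reference, the bar height described above is a within-level proportion. Writing $x_i$ for the value of the grouping factor (sex or status) in row $i$ and $\ell$ for one of its levels, a worked check on the example data (where `factor(..., labels = c("male", "female"))` turns 0 into male and 1 into female; row 10 has a missing status and so falls in no status level) gives:

$$
\hat{p}_{\ell} = \frac{\#\{\, i : x_i = \ell,\ \mathrm{cars}_i > 0 \,\}}{\#\{\, i : x_i = \ell \,\}},
\qquad
\hat{p}_{\text{male}} = \tfrac{2}{4} = 50\%,\quad
\hat{p}_{\text{female}} = \tfrac{1}{6} \approx 16.7\%,\quad
\hat{p}_{\text{bad}} = \tfrac{0}{3} = 0\%,\quad
\hat{p}_{\text{ok}} = \hat{p}_{\text{good}} = \tfrac{1}{3} \approx 33.3\%
$$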

How can I do this cleanly in ggplot2? (Or do you have a better suggestion for presenting these proportions in a plot?)

1 answer
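    library(ggplot2)
    library(reshape2)  # melt()
    library(plyr)      # ddply(), summarize()

    # One row per (observation, grouping factor) pair
    df.long <- melt(df, measure.vars = c("sex", "status"))
    # Share of observations with at least one car in each factor level
    df.long.summary <- ddply(df.long, .(variable, value), summarize,
                             cars = sum(cars > 0) / length(cars))
    ggplot(data = df.long.summary, aes(x = value, y = cars)) +
      geom_bar(stat = "identity") +
      facet_wrap(~variable, scales = "free_x") +
      scale_y_continuous(labels = scales::percent)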

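The summary step in the answer computes, for each level of each grouping factor, the fraction of rows with `cars > 0`. Since plyr and reshape2 are retired, here is a minimal base-R sketch of the same computation, assuming the question's example data. `share_with_car`, `props` and `summary_df` are illustrative names, not part of the answer. Note that `tapply()` drops the row with a missing status, so no NA level appears in the result.

```r
# Question's example data, built directly instead of via cbind()
df <- data.frame(
  cars   = c(rep.int(0, 7), 1:3),
  sex    = factor(c(1, 1, 1, 0, 1, 0, 1, 1, 0, 0), labels = c("male", "female")),
  status = factor(c(1:3, 1:3, 1:3, NA), labels = c("bad", "ok", "good"))
)

# Hypothetical helper: share of rows with at least one car in each level of f.
# The mean of a logical vector is the proportion of TRUE values;
# tapply() silently drops rows whose level is NA.
share_with_car <- function(f) tapply(df$cars > 0, f, mean)

# One named vector of proportions per grouping factor
# (sex: male 0.5, female 1/6; status: bad 0, ok 1/3, good 1/3)
props <- lapply(df[c("sex", "status")], share_with_car)

# Long summary table with the variable/value/cars columns used by the plot
summary_df <- data.frame(
  variable = rep(names(props), lengths(props)),
  value    = unname(unlist(lapply(props, names))),
  cars     = unname(unlist(props)),
  stringsAsFactors = FALSE
)
print(summary_df)
```

Because `summary_df` has the same `variable`, `value` and `cars` columns as `df.long.summary`, it should drop into the answer's `ggplot()` call unchanged.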

(By the way, this is a little easier in the next version of ggplot2: you won't need to compute the summary manually, because the plot can summarize the raw data for you instead of requiring a pre-summarized table.)
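The answer does not name the feature it has in mind. One way to skip the manual summary in current ggplot2 (3.3.0 or later, where `stat_summary()` takes a `fun` argument; older releases used `fun.y`) is to plot a 0/1 "has a car" indicator and let `stat_summary()` average it per bar, since the mean of a 0/1 variable is the proportion of ones. This is a sketch under that assumption, not necessarily what the answer refers to; the long table is built with base R instead of `melt()`, and its `car` column is a hypothetical name.

```r
library(ggplot2)

# Question's example data
df <- data.frame(
  cars   = c(rep.int(0, 7), 1:3),
  sex    = factor(c(1, 1, 1, 0, 1, 0, 1, 1, 0, 0), labels = c("male", "female")),
  status = factor(c(1:3, 1:3, 1:3, NA), labels = c("bad", "ok", "good"))
)

# Raw long data (no summarizing): one row per (person, grouping factor) pair,
# with car = 1 if the person has at least one car, else 0
has_car <- as.numeric(df$cars > 0)
df.long <- rbind(
  data.frame(variable = "sex",    value = as.character(df$sex),    car = has_car),
  data.frame(variable = "status", value = as.character(df$status), car = has_car)
)

# stat_summary() averages the 0/1 indicator within each bar, which is the
# share of people at that factor level who own at least one car
p <- ggplot(df.long, aes(x = value, y = car)) +
  stat_summary(fun = mean, geom = "bar") +
  facet_wrap(~variable, scales = "free_x") +
  scale_y_continuous(labels = scales::percent)

print(p)  # draws the plot (in a script, to the default graphics device)
```

Because `value` is a character column here, the status bars are ordered alphabetically; converting it to a factor with explicit levels would restore the bad/ok/good order.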


Source: https://habr.com/ru/post/1396290/

