A histogram showing the proportion of a factor variable taking a specific value

Question


I have a dataset that looks like

    df <- data.frame(cbind(
      c(rep.int(x = 0, times = 7), 1:3),
      c(1, 1, 1, 0, 1, 0, 1, 1, 0, 0),
      c(1:3, 1:3, 1:3, NA)))
    names(df) <- c("cars", "sex", "status")
    df$sex <- factor(df$sex, labels = c("male", "female"))
    df$status <- factor(df$status, labels = c("bad", "ok", "good"))
    df$car <- (df$cars > 0)  # Person has at least 1 car

I would like to use ggplot2 to create faceted histograms with the following characteristics:

Faceted by the categorical variables (sex and status in this example).
Each panel contains one bar per level of that factor (for example, male and female for "sex").
Each bar shows what percentage of the observations at that level have at least 1 car (for example, the percentage of men with at least 1 car).

How can I do this seamlessly in ggplot2? (Or, alternatively, do you have a better suggestion on how to present these proportions on a graph?)


r ggplot2

Benjamin Allévius Feb 13 '12 at 21:33


1 answer

John Colby · Accepted Answer · 2012-02-13T22:41:27+0000

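    library(ggplot2)
    library(reshape2)  # melt()
    library(plyr)      # ddply(), summarize

    # Stack sex and status into one long column so both can be faceted
    df.long = melt(df, measure.vars = c('sex', 'status'))

    # Share of people with at least one car, per level of each variable
    df.long.summary = ddply(df.long, .(variable, value), summarize,
                            cars = sum(cars > 0) / length(cars))

    # labels = scales::percent replaces the pre-0.9.0 formatter = 'percent'
    ggplot(data = df.long.summary, aes(x = value, y = cars)) +
      geom_bar(stat = 'identity') +
      facet_wrap(~variable, scales = 'free_x') +
      scale_y_continuous(labels = scales::percent)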
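As a sanity check on the bar heights, the same proportions can be computed in base R without any packages. This is only a sketch: it rebuilds the question's data directly (the names `prop_by_sex` and `prop_by_status` are made up for illustration) and relies on the fact that the mean of a logical vector is the share of TRUE values. Note that `tapply` drops the one row whose status is missing.

```r
# Rebuild the example data from the question (same values, built directly)
df <- data.frame(
  cars   = c(rep.int(0, 7), 1:3),
  sex    = factor(c(1, 1, 1, 0, 1, 0, 1, 1, 0, 0), labels = c("male", "female")),
  status = factor(c(1:3, 1:3, 1:3, NA), labels = c("bad", "ok", "good"))
)

# mean() of a logical vector is the proportion of TRUE values,
# so this gives the share of people with at least one car per level
prop_by_sex    <- tapply(df$cars > 0, df$sex, mean)
prop_by_status <- tapply(df$cars > 0, df$status, mean)  # NA status is dropped

print(round(100 * prop_by_sex, 1))
print(round(100 * prop_by_status, 1))
```

For this data, 2 of the 4 men (50%) and 1 of the 6 women (about 16.7%) have at least one car.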


(BTW, this gets a little easier in the next version of ggplot2: you no longer need to compute the summary by hand, because the axis range is then determined from the summarized values rather than from the raw data.)
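One possible reading of that remark, offered as an assumption rather than the answerer's confirmed approach, is to let `stat_summary` compute the per-level mean inside the plot. The sketch below assumes ggplot2 3.3.0 or later (where the argument is `fun`; older releases call it `fun.y`), builds the long format by hand with `rbind` instead of `melt`, and drops the row with a missing status. `df_long` and `p` are illustrative names.

```r
library(ggplot2)

# Rebuild the example data from the question
df <- data.frame(
  cars   = c(rep.int(0, 7), 1:3),
  sex    = factor(c(1, 1, 1, 0, 1, 0, 1, 1, 0, 0), labels = c("male", "female")),
  status = factor(c(1:3, 1:3, 1:3, NA), labels = c("bad", "ok", "good"))
)

# Long format: one row per (person, categorical variable) pair
df_long <- rbind(
  data.frame(variable = "sex",    value = as.character(df$sex),    car = df$cars > 0),
  data.frame(variable = "status", value = as.character(df$status), car = df$cars > 0)
)
df_long <- df_long[!is.na(df_long$value), ]  # drop the missing status

# stat_summary averages the 0/1 indicator within each x value,
# so each bar height is the share of people with at least one car
p <- ggplot(df_long, aes(x = value, y = as.numeric(car))) +
  stat_summary(fun = mean, geom = "bar") +
  facet_wrap(~variable, scales = "free_x") +
  scale_y_continuous(labels = scales::percent)
print(p)
```

The trade-off is that the summary values never exist as a data frame, which is convenient for a quick plot but less so if the numbers are also needed in a table.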
