My dataset:
I have data in the following format (here, imported from a CSV file). You can find an example dataset as a CSV file here.

    PAIR PREFERENCE
    1    5
    1    3
    1    2
    2    4
    2    1
    2    3

...and so on. There are 19 pairs in total, and PREFERENCE takes discrete values from 1 to 5.
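For reproducibility, here is a minimal sketch that reads the six example rows shown above into a data frame `d`. It assumes the CSV is comma-separated with a header row `PAIR,PREFERENCE`; the inlined text stands in for the real file, which has more rows.

```r
# Assumption: comma-separated CSV with a header row PAIR,PREFERENCE.
# Only the six example rows are inlined here; the real file has more.
csv_text <- "PAIR,PREFERENCE
1,5
1,3
1,2
2,4
2,1
2,3"
d <- read.csv(text = csv_text)

# Both columns come in as integers; PREFERENCE is a discrete 1-5 score.
str(d)
```

With the real file, `read.csv()` would be given the file path instead of `text =`.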
What I'm trying to achieve:
I need a stacked histogram, i.e. one 100% column for each pair, showing the distribution of PREFERENCE values.
Something similar to Excel's "100% stacked column" chart, or (although not quite the same thing) a so-called "mosaic plot":
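In numbers, each 100% column is the within-pair share of each PREFERENCE value. A minimal base-R sketch, using only the six example rows (column names `PAIR` and `PREFERENCE` as in the CSV), computes those shares with `table()` and `prop.table()`:

```r
d <- data.frame(
  PAIR       = c(1, 1, 1, 2, 2, 2),
  PREFERENCE = c(5, 3, 2, 4, 1, 3)
)

# Count each PREFERENCE level within each PAIR. factor() with levels 1:5
# keeps levels that are absent from a pair as zero counts.
counts <- table(PAIR = d$PAIR, PREFERENCE = factor(d$PREFERENCE, levels = 1:5))

# Divide each row by its row total, so every pair's row sums to 1 (100%).
shares <- prop.table(counts, margin = 1)

# Print as percentages, one row per pair.
round(100 * shares, 1)
```

Each row of `shares` is what one 100% column should display.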

What I tried:
I figured this would be easiest with ggplot2, but I don't really know where to start. I know that I can create a simple histogram with something like:

    library(ggplot2)
    ggplot(d, aes(x=factor(PAIR), y=factor(PREFERENCE))) + geom_bar(position="fill")

...but that doesn't get me very far. So I tried the following, which brought me closer to what I'm trying to achieve, but I suppose it still plots the PREFERENCE counts? Note that the y-axis label is "count" here, and the values go up to 19.

    qplot(factor(PAIR), data=d, geom="bar", fill=factor(PREFERENCE_FIXED))

This results in:

- So what do I need to do to stack the bars so that each one fills 100%?
- Or does the plot already do that?
- If so, what do I need to change to get the correct labels (e.g. percentages instead of "count")?
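For reference, one commonly used ggplot2 pattern for this kind of chart is sketched below, under the assumption that `d` has numeric columns `PAIR` and `PREFERENCE` (here rebuilt from the six example rows). PREFERENCE is mapped to `fill` rather than `y`, `geom_bar()` counts rows per pair, `position = "fill"` rescales each bar to a total of 1, and `scales::percent` relabels that 0 to 1 axis as percentages.

```r
library(ggplot2)

d <- data.frame(
  PAIR       = c(1, 1, 1, 2, 2, 2),
  PREFERENCE = c(5, 3, 2, 4, 1, 3)
)

# Map PREFERENCE to fill (not y): geom_bar() counts the rows for each
# PAIR/PREFERENCE combination and stacks them; position = "fill"
# rescales every stacked bar so that it reaches 1.
p <- ggplot(d, aes(x = factor(PAIR), fill = factor(PREFERENCE))) +
  geom_bar(position = "fill") +
  # Label the 0-1 axis as percentages (scales is a ggplot2 dependency).
  scale_y_continuous(labels = scales::percent) +
  labs(x = "PAIR", y = "Share of responses", fill = "PREFERENCE")

print(p)
```

The axis label text here is illustrative; the key changes compared with the attempts are the `fill` mapping and the percent labels.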
By the way, this is not really related to this question and only loosely related to this one (probably the same idea, except that my values are not continuous but grouped into bars).