How to specify the color of lines and points in a ggplot2 ECDF plot

I have a dataset that is hard to visualize, but I think an ECDF with a few points and lines added to it will do the trick. I can build the plot the way I want; my problem is coloring things correctly.

I have the following code that puts all the correct lines and points on a chart, but now I would like to color and mark everything correctly. I looked through several articles and tried a hundred things, but I can’t get it right. Do I need to format my data differently?

My vision of a legend looks something like this:

  • dashed line = b
  • solid line = a
  • red = s
  • blue = d
  • dot = s.mean

code to generate an example:

    require(ggplot2)
    require(reshape2)

    sa = rnorm(100)*100
    sb = rnorm(100)*100+50
    da = -35
    db = 20

    sdata = data.frame(cbind(sa,sb))
    ddata = data.frame(cbind(da,db))
    sdata.m = melt(sdata)
    ddata.m = melt(ddata)

    ggplot(sdata.m, aes(x=value, color=variable)) +
      geom_vline(data=ddata.m, aes(xintercept = value, color=variable),
                 linetype = 2, size=2) +
      stat_ecdf(size=1) +
      labs(title = 'plotTitle', color='colorLegendTitle') +
      xlab('xLabel') +
      ylab('yLabel') +
      theme_bw(30) +
      theme(
        legend.position=c(.8, .2),
        legend.box="horizontal",
        text=element_text(family="Times"),
        legend.key.size = unit(1,"cm")) +
      geom_point(x=mean(sdata.m$value[sdata.m$variable=="sa"]), y=.5, size = 5) +
      geom_point(x=mean(sdata.m$value[sdata.m$variable=="sb"]), y=.5, size = 5)

Some context for the data I am plotting: I have stochastic data sets (s) and deterministic data sets (d). Each stochastic set has hundreds of values, while each deterministic set has only one value. In my plot I therefore compare the distribution of the stochastic data (solid lines) and the mean of the stochastic data (points) with the deterministic values (dashed lines). For both the stochastic and the deterministic data sets there are two "cases", (a) and (b). I would like all (a) and (b) data to have the same color.

It seems like this should be easy with the aes and color / linetype / geom mappings, but I can't figure it out.

Thanks in advance.

2 answers

To get everything into a single legend, put color=variable and linetype=variable inside aes() for both ggplot() and geom_vline(). Then, for geom_point(), put x and y inside aes() as well, along with color="s.mean" and linetype="s.mean"; this adds a new level to the legend. Now, with scale_color_manual() and scale_linetype_manual(), you can set the desired colors and line types. With guides() and override.aes=, you can remove the points from the first four legend entries.

    ggplot(sdata.m, aes(x=value, color=variable,linetype=variable)) +
      stat_ecdf(size=1) +
      geom_vline(data=ddata.m,
                 aes(xintercept = value,color=variable,linetype=variable),
                 size=2) +
      geom_point(aes(x=mean(sdata.m$value[sdata.m$variable=="sa"]),
                     color="s.mean",linetype="s.mean",y=.5),size = 5) +
      geom_point(aes(x=mean(sdata.m$value[sdata.m$variable=="sb"]),
                     color="s.mean",linetype="s.mean",y=.5),size = 5) +
      scale_color_manual(breaks=c("da","db","sa","sb","s.mean"),
                         values=c("blue","blue","red","red","green")) +
      scale_linetype_manual(breaks=c("da","db","sa","sb","s.mean"),
                            values=c(1,2,1,2,0)) +
      guides(color=guide_legend(override.aes=list(shape=c(NA,NA,NA,NA,16))))

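As a possible variant (not the code above), the two geom_point() calls could be replaced by one layer fed from a small data frame of means. This is only a sketch: the name smeans, the fixed seed, and building the long-format data directly rather than via melt() are assumptions made for illustration, and here the points take the color of their own set instead of a separate "s.mean" level.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Long-format stochastic data, same shape as melt(sdata) would give
sdata.m <- data.frame(
  variable = rep(c("sa", "sb"), each = 100),
  value    = c(rnorm(100) * 100, rnorm(100) * 100 + 50)
)

# One row per stochastic set holding its mean (smeans is a hypothetical name)
smeans <- aggregate(value ~ variable, data = sdata.m, FUN = mean)
smeans$y <- 0.5  # draw each mean at y = 0.5, as in the code above

p <- ggplot(sdata.m, aes(x = value, color = variable)) +
  stat_ecdf() +
  # x = value and color = variable are inherited from ggplot()
  geom_point(data = smeans, aes(y = y), size = 5)
print(p)
```

Keeping the means in a data frame means adding a third stochastic set would not require another hand-written geom_point() call.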


Didzis gets the credit for the answer; I was able to adapt my code and get the final product I was looking for:

    ggplot(sdata.m, aes(x=value, color=variable,linetype=variable,shape=variable)) +
      stat_ecdf(size=1) +
      geom_vline(data=ddata.m,
                 aes(xintercept = value,color=variable,linetype=variable,shape=variable),
                 size=2) +
      geom_point(aes(x=mean(sdata.m$value[sdata.m$variable=="sa"]),
                     color="samean",linetype="samean",shape="samean", y=.5),size = 5) +
      geom_point(aes(x=mean(sdata.m$value[sdata.m$variable=="sb"]),
                     color="sbmean",linetype="sbmean",shape="sbmean", y=.5),size = 5) +
      scale_shape_manual(breaks=c("da","db","sa","samean","sb","sbmean"),
                         values=c(16,16,16,16,16,16)) +
      scale_color_manual(breaks=c("da","db","sa","samean","sb","sbmean"),
                         values=c("blue","red","blue","blue","red","red")) +
      scale_linetype_manual(breaks=c("da","db","sa","samean","sb","sbmean"),
                            values=c(2,2,1,0,1,0)) +
      guides(color=guide_legend(override.aes=list(shape=c(NA,NA,NA,16,NA,16))))

A few things I learned:

  • When adding breaks/values to the manual scales, it is important to list them in alphabetical order.
  • When all aesthetics (linetype/shape/color) are mapped to the same variable ("variable"), you get everything in one legend.
  • When overriding things with manual scales, you need one manual scale per aesthetic, and then you override the "guides" if necessary.
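One way to avoid depending on alphabetical order is to pass named vectors as values, since the manual scales then match values to levels by name. The following is a minimal sketch of that idea only: the names cols and ltys, the fixed seed, and the directly built data frames are assumptions for illustration, and the mean points are left out to keep it short.

```r
library(ggplot2)

set.seed(1)  # assumption: fixed seed so the sketch is reproducible

# Long-format stochastic data, same shape as melt(sdata) would give
sdata.m <- data.frame(
  variable = rep(c("sa", "sb"), each = 100),
  value    = c(rnorm(100) * 100, rnorm(100) * 100 + 50)
)
# Deterministic values, one per case
ddata.m <- data.frame(variable = c("da", "db"), value = c(-35, 20))

# Named vectors (hypothetical names): each name is a level of "variable",
# so the order in which the entries are written does not matter
cols <- c(sa = "blue",  sb = "red",   da = "blue",   db = "red")
ltys <- c(sa = "solid", sb = "solid", da = "dashed", db = "dashed")

p <- ggplot(sdata.m, aes(x = value, color = variable, linetype = variable)) +
  stat_ecdf() +
  geom_vline(data = ddata.m,
             aes(xintercept = value, color = variable, linetype = variable)) +
  scale_color_manual(values = cols) +
  scale_linetype_manual(values = ltys)
print(p)
```

Because both scales are mapped to the same variable and carry the same labels, they should still merge into a single legend.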

Thanks again Didzis. Another life saved.


Source: https://habr.com/ru/post/1485525/
