Using stat_summary_hex to show the most frequent value with a discrete color scale

I have a data frame with 10k rows and 3 columns: xpos, ypos and cluster (cluster is a number from 0 to 9). The data is here: http://pastebin.com/NyQw29tb

I would like to show a hexagonal plot where each hexagon is colored by the most frequent cluster in that hexagon.

So far I have:

    library(ggplot2)
    library(hexbin)
    ggplot(clusters, aes(x=xpos, y=ypos, z=cluster)) + stat_summary_hex(fun.x=mode)

I think this gives me what I want (i.e. it fills each hexagon with a value from 0 to 9), but the color scale is continuous, and I cannot figure out how to make it discrete.

[Image: output of the code above]

For added context, here's the basic, messy kind of data I'm trying to smooth out using hexagons:

  qplot(data=clusters, xpos, ypos, color=factor(cluster)) 

[Image: output2, scatter plot colored by cluster]
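The pastebin data is not reproduced in this thread, so to try the code in the answers without it, here is a sketch that builds a hypothetical stand-in with the same shape (10k rows; columns xpos, ypos and an integer cluster from 0 to 9). The ten clusters match the question, but the Gaussian centres, the spread and the seed are arbitrary assumptions, not the asker's data.

```r
# Hypothetical stand-in data: 10 spatial clusters labelled 0-9,
# 1,000 points each, stored in the column order xpos, ypos, cluster.
set.seed(42)                      # arbitrary seed so the sketch is repeatable
n_per_cluster <- 1000
centre_x <- runif(10, 0, 10)      # assumed cluster centres
centre_y <- runif(10, 0, 10)

clusters <- data.frame(
  xpos    = rnorm(10 * n_per_cluster, mean = rep(centre_x, each = n_per_cluster), sd = 1),
  ypos    = rnorm(10 * n_per_cluster, mean = rep(centre_y, each = n_per_cluster), sd = 1),
  cluster = rep(0:9, each = n_per_cluster)   # cluster k uses centre k + 1
)

str(clusters)
```

Because the points are grouped in space and the clusters overlap, the per-hexagon mode is meaningful here while the raw scatter still looks messy.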

2 answers

I don't know what your stat_summary_hex(fun.x=mode) does, but I'm sure it is not what you think (mode gives the object's storage mode, not the statistical mode, and fun.x doesn't match any formal argument of stat_summary_hex). Try this instead. It tabulates the observations in each bin and pulls out the label with the highest count:

    ggplot(clusters, aes(x=xpos, y=ypos, z=cluster)) +
      stat_summary_hex(fun = function(x) {
        tab <- table(x)
        names(tab)[which.max(tab)]
      })

[Image: Hexbinned clusters]
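To see what the anonymous function passed to fun returns, here it is pulled out under a hypothetical name, most_frequent_label. It returns a character string rather than a number, which, as I understand it, is why the fill ends up on a discrete scale. On ties it returns the label that sorts first, because table() orders its names.

```r
# Hypothetical helper name; same logic as the anonymous function above.
most_frequent_label <- function(x) {
  tab <- table(x)              # counts per distinct value; names are the sorted labels
  names(tab)[which.max(tab)]   # first label with the highest count, as a string
}

most_frequent_label(c(3, 7, 7, 1))   # "7"
most_frequent_label(c(5, 2, 2, 5))   # tie: "2", because table() sorts the labels
```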


I believe there are two problems. First, mode is not the function you want (check the help: it will "Get or set the type or storage mode of an object"). Second, the stat_summary_hex parameter is fun=, not fun.x=.
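A quick base R check of the first point: mode() describes how an object is stored, so it returns the same string for every numeric bin. If, as I believe, the unmatched fun.x is simply ignored, stat_summary_hex falls back to its default fun = "mean", and an average of cluster IDs such as 3.75 is not a cluster label at all, which would explain the continuous scale in the question.

```r
# base::mode() reports the storage mode, not the most frequent value
x <- c(0, 3, 3, 9)
mode(x)                 # "numeric"
mode(as.character(x))   # "character"

# The default summary (mean) produces a value that is not a cluster label
mean(x)                 # 3.75
```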

There is a good discussion of mode functions here. The recommended function is:

    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }
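A short check of how this Mode behaves. Unlike the table()-based version in the other answer, it breaks ties in favour of the value that appears first, and it returns a value of the same type as its input. For a numeric cluster column that means a number, not a string, which is why the fill still needs converting to a character.

```r
Mode <- function(x) {
  ux <- unique(x)                         # distinct values in order of first appearance
  ux[which.max(tabulate(match(x, ux)))]   # value with the highest count
}

Mode(c(3, 7, 7, 1))          # 7
Mode(c(5, 2, 2, 5))          # tie: 5, the value that appears first
class(Mode(c(5, 2, 2, 5)))   # "numeric": the type of the input is kept
```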

Finally, you want the hexagon fill to be treated as a discrete value. You can change fun so that it returns a character (as in the code below).

Here is a reproducible example:

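    library(ggplot2)
    library(hexbin)
    Mode <- function(x) {
      ux <- unique(x)
      ux[which.max(tabulate(match(x, ux)))]
    }
    set.seed(1)  # make the random data reproducible
    # 1000 random points; cluster labels 1-9 repeated to fill all 1000 rows
    clusters <- data.frame(xpos = rnorm(1000), ypos = rnorm(1000),
                           cluster = rep(1:9, length.out = 1000))
    # as.character() makes the summarised fill a discrete value
    ggplot(clusters, aes(x = xpos, y = ypos, z = cluster)) +
      stat_summary_hex(fun = function(x) as.character(Mode(x)))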
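My understanding is that stat_summary_hex calls fun on the z values of each bin and that ggplot2 then picks the fill scale from the type of the result: numbers get a continuous scale, characters a discrete one. The sketch below imitates that summarising step in base R, with an assumed 4 x 4 grid of square bins standing in for hexagons and made-up random points; it illustrates the idea and is not ggplot2's internal code.

```r
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Made-up points with integer cluster labels 0-9
set.seed(1)
pts <- data.frame(xpos = runif(200), ypos = runif(200),
                  cluster = sample(0:9, 200, replace = TRUE))

# Assumed 4 x 4 grid of square bins over the unit square (stand-in for hexagons)
breaks <- seq(0, 1, by = 0.25)
bin_id <- interaction(cut(pts$xpos, breaks, include.lowest = TRUE),
                      cut(pts$ypos, breaks, include.lowest = TRUE),
                      drop = TRUE)   # keep only non-empty bins

# Summarise each bin with the mode, once as a number and once as a string
numeric_summary   <- tapply(pts$cluster, bin_id, Mode)
character_summary <- tapply(pts$cluster, bin_id, function(x) as.character(Mode(x)))

class(as.vector(numeric_summary))    # "integer": would map to a continuous scale
class(as.vector(character_summary))  # "character": would map to a discrete scale
```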

Hope this helps.


Source: https://habr.com/ru/post/1488746/

