How to build parallel coordinates with several categorical variables in R

I ran into difficulty plotting parallel coordinates with ggparcoord from the GGally package. My data have two categorical variables, and I want the plot to look like the image below. In ggparcoord, groupColumn accepts only one variable for grouping (color). I can use showPoints to mark the values along the axes, but I also need the shape of these markers to vary with the second categorical variable. Is there another package that can help me do this?

Any answer would be appreciated! Thanks!

University and country are the two categorical variables.

1 answer

It is not difficult to roll your own parallel coordinates plot in ggplot2, which gives you the flexibility to customize the aesthetics. Below is an illustration using the built-in diamonds data frame.

To get parallel coordinates, you need to add an ID column that identifies each row of the data frame; we will use it as the group aesthetic in ggplot. You also need to scale the numeric values so that they all share the same vertical scale when plotted. Then you need to take all the columns that you want along the x axis and reshape them to "long" format. We do it all on the fly below using the tidyverse/dplyr pipe operator.
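The three preparation steps can also be written out separately, which makes the intermediate result easier to inspect. This is only a sketch under some assumptions: it uses the current tidyverse verbs across() and pivot_longer() in place of the superseded mutate_if() and gather(), and the names axis_cols and ds_long are illustrative rather than taken from the answer.

```r
library(dplyr)
library(tidyr)
library(ggplot2)  # loaded here only for the built-in diamonds data frame

# Same sample as in the answer: 20 rows, two levels each of cut and color
set.seed(2)
ds <- diamonds %>%
  filter(color %in% c("D", "J"), cut %in% c("Good", "Premium")) %>%
  sample_n(20)

# Columns to place along the x axis (the numeric columns of diamonds)
axis_cols <- c("carat", "depth", "table", "price", "x", "y", "z")

ds_long <- ds %>%
  # Step 1: one ID per original row, used later as the group aesthetic
  mutate(ID = row_number()) %>%
  # Step 2: centre each axis column and scale it to unit standard deviation
  mutate(across(all_of(axis_cols), ~ as.numeric(scale(.x)))) %>%
  # Step 3: reshape to long format, one row per (ID, axis) pair
  pivot_longer(all_of(axis_cols), names_to = "key", values_to = "value")

head(ds_long)
```

Wrapping scale() in as.numeric() keeps each column a plain numeric vector, since scale() returns a one-column matrix. The ID column is left unscaled here because it is only a label.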

Even after limiting the number of category combinations, the lines are probably too intertwined for this plot to be easily interpreted, so consider it simply a "proof of concept". Hopefully you can create something more useful with your data. I used the colour (for lines) and fill (for points) aesthetics below. You could use shape or linetype instead, depending on your needs.

    library(tidyverse)
    theme_set(theme_classic())

    # Get 20 random rows from the diamonds data frame after limiting
    # to two levels each of cut and color
    set.seed(2)
    ds = diamonds %>%
      filter(color %in% c("D","J"), cut %in% c("Good", "Premium")) %>%
      sample_n(20)

    ggplot(ds %>%
             mutate(ID = 1:n()) %>%             # Add ID for each row
             mutate_if(is.numeric, scale) %>%   # Scale numeric columns
             gather(key, value, c(1,5:10)),     # Reshape to "long" format
           aes(key, value, group=ID, colour=color, fill=cut)) +
      geom_line() +
      geom_point(size=2, shape=21, colour="grey50") +
      scale_fill_manual(values=c("black","white"))

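Since the question asks specifically for marker shapes that follow the second categorical variable, here is a hedged variant that maps cut to shape instead of fill. It is a sketch, not the answer's original code: the shape codes 16 and 17, the point size, and the names axis_cols, ds_long and p are my own choices, and the data preparation uses across() and pivot_longer() rather than mutate_if() and gather().

```r
library(dplyr)
library(tidyr)
library(ggplot2)

set.seed(2)
ds <- diamonds %>%
  filter(color %in% c("D", "J"), cut %in% c("Good", "Premium")) %>%
  sample_n(20)

axis_cols <- c("carat", "depth", "table", "price", "x", "y", "z")

# Add a row ID, scale the axis columns, reshape to long format
ds_long <- ds %>%
  mutate(ID = row_number()) %>%
  mutate(across(all_of(axis_cols), ~ as.numeric(scale(.x)))) %>%
  pivot_longer(all_of(axis_cols), names_to = "key", values_to = "value")

# colour follows the first categorical variable (color),
# point shape follows the second one (cut)
p <- ggplot(ds_long, aes(key, value, group = ID, colour = color)) +
  geom_line() +
  geom_point(aes(shape = cut), size = 2.5) +
  scale_shape_manual(values = c(Good = 16, Premium = 17)) +  # circle, triangle
  theme_classic()

print(p)
```

Naming the entries in scale_shape_manual() ties each shape to a specific level of cut, so the mapping does not depend on the order of the factor levels. For the data in the question, color and cut would be replaced by the two categorical columns (for example country and university).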

I have not used ggparcoord before, but the only option that seemed simple (at least from my first attempt with the function) was to paste the two categorical columns together into a single grouping column. The following is an example. Even with only four combinations of categories the plot is confusing, but it may be interpretable if there are strong patterns in your data:

    library(GGally)

    ds$group = with(ds, paste(cut, color, sep="-"))

    ggparcoord(ds, columns=c(1, 5:10), groupColumn=11) +
      theme(panel.grid.major.x=element_line(colour="grey70"))



Source: https://habr.com/ru/post/1268517/

