Plot one numeric variable against n numeric variables in n plots

I have a huge data frame, and I would like to make some plots to get an idea of the associations between the various variables. I cannot use

pairs(data) 

because it would give me more than 400 plots. However, there is one response variable I am particularly interested in, so I would like to plot y against every other variable, which would reduce the number of plots from n^2 to n. Can you show me how to do this? Thanks

EDIT: I am adding an example for clarity. Say I have a dataframe

 foo=data.frame(x1=1:10,x2=seq(0.1,1,0.1),x3=-7:2,x4=runif(10,0,1)) 

and my response variable is x3. Then I would like to create four plots arranged in a row: x1 vs x3, x2 vs x3, a histogram of x3, and finally x4 vs x3. I know how to make each plot individually:

 plot(foo$x1, foo$x3)
 plot(foo$x2, foo$x3)
 hist(foo$x3)
 plot(foo$x4, foo$x3)

However, I have no idea how to arrange them in a row. It would also be great if there were a way to make all n plots automatically, without calling plot (or hist) each time. When n = 4 this does not matter much, but I usually deal with n = 20+ variables, so it can become tedious.
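A minimal base-R sketch of the layout asked for above, assuming base graphics are acceptable: `par(mfrow = c(1, n))` splits the device into one row of n panels, and a loop over the column names draws either a scatter plot against the response or, for the response column itself, a histogram. The helper name `plot_vs_response` is hypothetical. With 20+ variables a single row gets very cramped, so in practice you would likely use more rows.

```r
# Example data from the question; x3 is the response variable.
foo <- data.frame(x1 = 1:10, x2 = seq(0.1, 1, 0.1), x3 = -7:2, x4 = runif(10, 0, 1))

# Hypothetical helper: one panel per column, all in a single row.
# Columns other than the response are plotted against it;
# the response column itself gets a histogram.
plot_vs_response <- function(df, response) {
  vars <- names(df)
  old_par <- par(mfrow = c(1, length(vars)))  # 1 row, one panel per column
  on.exit(par(old_par))                       # restore the previous layout afterwards
  for (v in vars) {
    if (v == response) {
      hist(df[[v]], main = paste("Histogram of", v), xlab = v)
    } else {
      plot(df[[v]], df[[response]], xlab = v, ylab = response)
    }
  }
  invisible(vars)  # return the plotted column names, in panel order
}

plotted <- plot_vs_response(foo, "x3")
```

Because the loop follows the column order, the histogram lands in the response's own position, which gives exactly the x1, x2, hist(x3), x4 sequence from the question.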

4 answers

Here is an approach combining the reshape2, ggplot2 and gridExtra packages. This way you do not need to specify the number of plots: the code works with any number of explanatory variables without changes.

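 library(reshape2)
 library(ggplot2)
 library(gridExtra)
 foo <- data.frame(x1 = 1:10, x2 = seq(0.1, 1, 0.1), x3 = -7:2, x4 = runif(10, 0, 1))
 # Long format: one row per (x3, variable, value) for every column except x3
 foo2 <- melt(foo, "x3")
 # Left: x3 against each explanatory variable, one facet per variable
 p1 <- ggplot(foo2, aes(value, x3)) + geom_point() + facet_grid(. ~ variable)
 # Right: histogram of the response
 p2 <- ggplot(foo, aes(x = x3)) + geom_histogram()
 grid.arrange(p1, p2, ncol = 2)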
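To make the `melt()` step concrete: it reshapes the data to long format, one row per (x3, variable, value), which is what lets `facet_grid()` draw one panel per variable without counting them. The sketch below builds an equivalent long table in base R with `stack()`; it only illustrates the shape and is not how reshape2 implements `melt()`. The column names `variable` and `value` were chosen to match `melt()`'s defaults.

```r
foo <- data.frame(x1 = 1:10, x2 = seq(0.1, 1, 0.1), x3 = -7:2, x4 = runif(10, 0, 1))

response   <- "x3"
predictors <- setdiff(names(foo), response)  # "x1" "x2" "x4"

# stack() places the predictor columns one after another:
# a `values` column plus an `ind` factor naming the source column.
stacked <- stack(foo[predictors])

# Repeat the response once per predictor so each row pairs a value with its x3.
long <- data.frame(
  x3       = rep(foo[[response]], times = length(predictors)),
  variable = stacked$ind,
  value    = stacked$values
)

head(long, 3)
```

The resulting table can be passed to the same `ggplot(long, aes(value, x3)) + geom_point() + facet_grid(. ~ variable)` call.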



The tidyr package makes this easy and efficient:

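 library(tidyr)    # gather(); tidyr also re-exports %>%
 library(ggplot2)
 # `data` is your data frame and `y_value` its response column (placeholder names)
 data %>%
   gather(-y_value, key = "some_var_name", value = "some_value_name") %>%
   ggplot(aes(x = some_value_name, y = y_value)) +
   geom_point() +
   facet_wrap(~ some_var_name, scales = "free")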

This produces one scatter plot per explanatory variable, each panel with its own axis scales.
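`gather()` still works but has been superseded since tidyr 1.0.0 by `pivot_longer()`. A sketch of the same idea on the `foo` example from the question, assuming tidyr (1.0.0 or later) and ggplot2 are installed:

```r
library(tidyr)
library(ggplot2)

foo <- data.frame(x1 = 1:10, x2 = seq(0.1, 1, 0.1), x3 = -7:2, x4 = runif(10, 0, 1))

# Every column except the response x3 is turned into name/value pairs.
long <- pivot_longer(foo, cols = -x3, names_to = "variable", values_to = "value")

# One scatter plot of x3 against each predictor; x scales are free per panel.
p <- ggplot(long, aes(x = value, y = x3)) +
  geom_point() +
  facet_wrap(~ variable, scales = "free_x")

print(p)
```

Using `scales = "free_x"` instead of `"free"` is a deliberate choice here: every panel shares the same y variable (x3), so a common y axis makes the panels easier to compare.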



I had the same problem and have no experience with ggplot2, so I wrote a function using base plot() that takes a data frame and the variables to plot as arguments and draws the graphs.

 dfplot <- function(df, xvar, yvars = NULL) {
   # By default, plot every column except xvar
   if (is.null(yvars)) {
     yvars <- setdiff(names(df), xvar)
   }
   if (length(yvars) > 25) {
     warning("number of variables to be plotted exceeds 25, only the first 25 will be plotted")
     yvars <- yvars[1:25]
   }
   # Choose a roughly square grid of panels
   ncharts <- length(yvars)
   nrows <- ceiling(sqrt(ncharts))
   ncols <- ceiling(ncharts / nrows)
   par(mfrow = c(nrows, ncols))
   for (i in seq_len(ncharts)) {
     plot(df[, xvar], df[, yvars[i]], main = yvars[i], xlab = xvar, ylab = "")
   }
 }

Notes:

  • You can pass the variables to plot as yvars; otherwise every variable in the data frame except xvar is plotted (or the first 25, whichever is fewer).
  • The figure margins ran out of room when there were more than 25 charts, so I limited the function to 25. Any suggestions are welcome.
  • The y-axis labels are left empty because each plot's title already shows the variable name. The x-axis label is set to xvar.
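The grid rule in `dfplot()` picks a near-square layout. Pulled out on its own (`grid_dims` is a hypothetical helper, not part of the answer), it shows how quickly the panel count grows: at 26 variables the grid would already be 6 × 5. grDevices also ships `n2mfrow()` for choosing `mfrow` dimensions. Whether more than 25 panels fit depends on the device size and margins; shrinking the margins with `par(mar = ...)` may raise the limit, but that is untested here.

```r
# Same grid rule as dfplot(): rows = ceiling(sqrt(n)), cols = ceiling(n / rows).
grid_dims <- function(n) {
  nrows <- ceiling(sqrt(n))
  c(rows = nrows, cols = ceiling(n / nrows))
}

grid_dims(4)   # rows 2, cols 2
grid_dims(20)  # rows 5, cols 4
grid_dims(26)  # rows 6, cols 5
```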

With the pairs function you can also pass a formula that names just the variables you want to see, instead of plotting the entire data set.

I reproduced the example from your question.

So here is my MWE:

 foo = data.frame(x1 = 1:10, x2 = seq(0.1, 1, 0.1), x3 = -7:2, x4 = runif(10, 0, 1))
 pairs(foo$x3 ~ foo$x1 + foo$x2 + foo$x4)

In the formula, I specified that I want to plot the response (foo$x3) against the variables x1, x2 and x4.

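The result is a scatterplot matrix of these four variables. Note that pairs() treats the left-hand side of the formula as just another variable, so the panels are not restricted to x3 against the others.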
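A slightly more idiomatic form of the same call uses a one-sided formula together with the `data` argument, so the axes are labelled `x3`, `x1`, ... instead of `foo$x3`. The `model.frame()` call afterwards is only there to show what `pairs()` works from: each term, the response included, becomes one column of the scatterplot matrix.

```r
foo <- data.frame(x1 = 1:10, x2 = seq(0.1, 1, 0.1), x3 = -7:2, x4 = runif(10, 0, 1))

# One-sided formula plus `data`: panel labels are the bare column names.
pairs(~ x3 + x1 + x2 + x4, data = foo)

# The formula is turned into a model frame; every term is simply one more column.
mf <- model.frame(~ x3 + x1 + x2 + x4, data = foo)
names(mf)
```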


Hope this helps you.


Source: https://habr.com/ru/post/971957/

