R: handling and plotting grouped data

This is a follow-up to: R: plot multiple lines in one plot

There I used part of my data to draw one plot with several lines. Now I want to draw several plots in one grid, since my data is grouped. Right now I do this by creating a data set for each group, creating a plot for each of them, and combining the plots with grid.arrange(). However, I am wondering whether I can handle the grouped data as one data set instead of creating all these separate tables.

The data that I have is structured as follows:

              Category1   Category2   Category3
    Company   2011 2013   2011 2013   2011 2013
    Company1   300  350    290  300    295  290
    Company2   320  430    305  301    300  400
    Company3   310  420    400  305    400  410

So, is there a way to handle this in one go and build 3 plots (one per category), each with one line per company across the years (2011 and 2013)?

2 answers

You should learn how to structure your data and how to make a reproducible example. Data in such an unstructured format is very hard to work with, not only for you but also for us.

    mdf <- read.table(
      text = "Company 2011 2013 2011 2013 2011 2013
    Company1 300 350 290 300 295 290
    Company2 320 430 305 301 300 400
    Company3 310 420 400 305 400 410",
      header = TRUE, check.names = FALSE
    )

    library("reshape2")

    cat1 <- melt(mdf[c(1,2,3)], id.vars="Company", value.name="value", variable.name="Year")
    cat1$Category <- "Category1"

    cat2 <- melt(mdf[c(1,4,5)], id.vars="Company", value.name="value", variable.name="Year")
    cat2$Category <- "Category2"

    cat3 <- melt(mdf[c(1,6,7)], id.vars="Company", value.name="value", variable.name="Year")
    cat3$Category <- "Category3"

    mdf <- rbind(cat1, cat2, cat3)

    head(mdf)
    #    Company Year value  Category
    # 1 Company1 2011   300 Category1
    # 2 Company2 2011   320 Category1
    # 3 Company3 2011   310 Category1
    # 4 Company1 2013   350 Category1
    # 5 Company2 2013   430 Category1
    # 6 Company3 2013   420 Category1

This can be automated, of course, if the number of categories is very large:

    library("plyr")

    mdf <- adply(c(1:3), 1, function(cat) {
      # Column 1 is Company; category `cat` occupies columns cat*2 and cat*2+1
      tmp <- melt(mdf[c(1, cat*2, cat*2+1)], id.vars="Company", value.name="value", variable.name="Year")
      tmp$Category <- paste0("Category", cat)
      return(tmp)
    })

But if you can avoid shuffling all this data back and forth from the very beginning, you should do so.
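One way to avoid the reshaping step is to enter the data in long format from the start, with one row per company, year and category. The following is only a sketch in base R; the name `long` is made up for illustration, and the values are copied from the table in the question.

```r
# Long format: one row per (Company, Year, Category) combination.
# `long` is a hypothetical name; the values come from the question's table.
long <- data.frame(
  Company  = rep(c("Company1", "Company2", "Company3"), times = 6),
  Year     = rep(rep(c("2011", "2013"), each = 3), times = 3),
  Category = rep(paste0("Category", 1:3), each = 6),
  value    = c(300, 320, 310, 350, 430, 420,   # Category1: 2011, then 2013
               290, 305, 400, 300, 301, 305,   # Category2: 2011, then 2013
               295, 300, 400, 290, 400, 410)   # Category3: 2011, then 2013
)

str(long)
```

A data frame of this shape has the same columns as the result of the melt() steps, so it could be passed to ggplot() directly.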

Using facets

ggplot2 has built-in support for faceted plots, which display data of the same type split into subsets by one (or more) variables. See ?facet_wrap or ?facet_grid.

    library("ggplot2")

    ggplot(data=mdf, aes(x=Year, y=value, group = Company, colour = Company)) +
      geom_line() +
      geom_point(size=4, shape=21, fill="white") +
      facet_wrap("Category")   # one panel per category


Getting individual plots

Alternatively, you can split your data.frame by the appropriate variable and store the individual plots in a list:

    library("plyr")

    # Split mdf by Category and build one plot per piece
    ll <- dlply(mdf, "Category", function(x) {
      ggplot(data=x, aes(x=Year, y=value, group = Company, colour = Company)) +
        geom_line() +
        geom_point(size=4, shape=21, fill="white")
    })

    ll[["Category1"]]
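Since the question combines the plots with grid.arrange(), a list of plots like this can probably be laid out in one grid directly. The sketch below uses base R's split() and lapply() instead of dlply(), and it assumes a gridExtra version whose grid.arrange() accepts the `grobs` argument (2.0 or later). The name `plots` is made up for illustration, and the data is rebuilt here only to keep the example self-contained.

```r
library("ggplot2")
library("gridExtra")  # assumption: gridExtra >= 2.0 (provides the `grobs` argument)

# Long-format data with the values from the question.
# expand.grid() varies Company fastest, then Year, then Category.
mdf <- expand.grid(
  Company  = paste0("Company", 1:3),
  Year     = c("2011", "2013"),
  Category = paste0("Category", 1:3)
)
mdf$value <- c(300, 320, 310, 350, 430, 420,
               290, 305, 400, 300, 301, 305,
               295, 300, 400, 290, 400, 410)

# One plot per category, stored in a named list (`plots` is a hypothetical name)
plots <- lapply(split(mdf, mdf$Category), function(x) {
  ggplot(x, aes(x = Year, y = value, group = Company, colour = Company)) +
    geom_line() +
    geom_point(size = 4, shape = 21, fill = "white") +
    ggtitle(as.character(x$Category[1]))
})

# Arrange all plots side by side in a single row
grid.arrange(grobs = plots, ncol = 3)
```

Compared with facet_wrap(), each plot here keeps its own legend and axis scales, which may or may not be what you want.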

At least for ggplot2 you will want to use the reshape2 package to convert your data to a slightly different format.

Suppose you have a data.frame as follows:

    test <- structure(list(
        Company = structure(1:3, .Label = c("Company1", "Company2", "Company3"), class = "factor"),
        X2011.1 = c(300L, 320L, 310L),
        X2013.1 = c(350L, 430L, 420L),
        X2011.2 = c(290, 305, 400),
        X2013.2 = c(300, 301, 305),
        X2011.3 = c(295, 300, 400),
        X2013.3 = c(290L, 400L, 410L)),
      .Names = c("Company", "X2011.1", "X2013.1", "X2011.2", "X2013.2", "X2011.3", "X2013.3"),
      class = "data.frame", row.names = c(NA, -3L))

Ignore the ugliness for now, it looks like this:

       Company X2011.1 X2013.1 X2011.2 X2013.2 X2011.3 X2013.3
      Company1     300     350     290     300     295     290
      Company2     320     430     305     301     300     400
      Company3     310     420     400     305     400     410

If we use the melt() function, we can do it as follows:

    melt(test) -> test.melt
    test.melt
    # Using Company as id variables
    #    Company variable value
    # 1 Company1  X2011.1   300
    # 2 Company2  X2011.1   320
    # 3 Company3  X2011.1   310
    # 4 Company1  X2013.1   350
    # 5 Company2  X2013.1   430
    # 6 Company3  X2013.1   420
    # 7 Company1  X2011.2   290
    # 8 Company2  X2011.2   305

Then use Company or variable as a grouping factor for ggplot2. Obviously, you'll want to name them more wisely. :)

E.g., you could do:

    ggplot(melt(test)) +
      geom_bar(aes(x = Company, y = value, fill = variable),
               stat = "identity", position = "dodge")

Or something.
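On naming things more wisely: column names such as X2011.1 pack the year and the category index into one string. A possible way to split them into separate columns after melting is sketched below. It assumes the names always follow the pattern X<year>.<index>; the `pattern` and `vars` helpers are made up for illustration.

```r
library("reshape2")

# Same values as the `test` data.frame above, rebuilt to keep the sketch self-contained
test <- data.frame(
  Company = paste0("Company", 1:3),
  X2011.1 = c(300, 320, 310), X2013.1 = c(350, 430, 420),
  X2011.2 = c(290, 305, 400), X2013.2 = c(300, 301, 305),
  X2011.3 = c(295, 300, 400), X2013.3 = c(290, 400, 410)
)
test.melt <- melt(test, id.vars = "Company")

# Assumption: every melted column name looks like X<4-digit year>.<category index>
pattern <- "^X([0-9]{4})\\.([0-9]+)$"
vars <- as.character(test.melt$variable)

test.melt$Year     <- sub(pattern, "\\1", vars)                       # e.g. "2011"
test.melt$Category <- paste0("Category", sub(pattern, "\\2", vars))   # e.g. "Category1"
test.melt$variable <- NULL                                            # no longer needed

head(test.melt)
```

With Year and Category as separate columns, the data has the same layout that the facet_wrap() approach in the first answer expects.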


Source: https://habr.com/ru/post/1486650/

