Complex data reshaping in R

I have a data frame with 3 columns (excerpt below):

    df <- data.frame(
      id   = c(1, 1, 1, 2, 2, 2),
      Year = c(2007, 2008, 2009, 2007, 2008, 2009),
      A    = c(5, 2, 3, 7, 5, 6),
      B    = c(10, 0, 50, 13, 17, 17)
    )
    df

I would like to have this:

    df_needed <- data.frame(
      id     = c(1, 2),
      A_2007 = c(5, 7),
      B_2007 = c(10, 13),
      A_2008 = c(2, 5),
      B_2008 = c(0, 17),
      A_2009 = c(3, 6),
      B_2009 = c(50, 17)
    )
    df_needed

I am familiar with reshape and tidyr, but I do not think they can handle this conversion.

Is there a proper way to do this, or do I need to write a custom function?

Edit: the example has been updated so that the final data set contains more than one record.

+6
4 answers

Try

    library(dplyr)
    library(tidyr)
    gather(df, Var, Val, -Year) %>%
      unite(YearVar, Var, Year) %>%
      mutate(indx = 1) %>%
      spread(YearVar, Val) %>%
      select(-indx)
    #   A_2007 A_2008 A_2009 B_2007 B_2008 B_2009
    # 1      5      2      3     10      0     50

Update

For the edited data, you can change the variables passed to gather:

    gather(df, Var, Val, A:B) %>%
      unite(YearVar, Var, Year) %>%
      spread(YearVar, Val)
    #   id A_2007 A_2008 A_2009 B_2007 B_2008 B_2009
    # 1  1      5      2      3     10      0     50
    # 2  2      7      5      6     13     17     17
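In newer tidyr releases, gather and spread are superseded by pivot_longer and pivot_wider. The following is a minimal sketch of the same reshaping, assuming tidyr 1.0.0 or later is installed and using the edited df from the question; it is an alternative spelling, not part of the original answer.

```r
library(tidyr)

# Edited example data from the question
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  Year = c(2007, 2008, 2009, 2007, 2008, 2009),
  A    = c(5, 2, 3, 7, 5, 6),
  B    = c(10, 0, 50, 13, 17, 17)
)

# One row per id; each value column (A, B) is spread across the years.
# With several values_from columns, the new names are built as <value>_<Year>.
wide <- pivot_wider(df, id_cols = id, names_from = Year, values_from = c(A, B))
wide
```

The columns come out grouped by variable (all A columns, then all B columns), so they may need reordering if the exact layout of df_needed matters.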
+4

Good ol' base::reshape works great here. Just create an id variable first.

    df$id <- 1
    reshape(df, v.names = c("A", "B"), timevar = "Year", idvar = "id",
            direction = "wide")
    #   id A.2007 B.2007 A.2008 B.2008 A.2009 B.2009
    # 1  1      5     10      2      0      3     50

To save some typing: since you specify timevar and idvar, you do not need to provide v.names:

    reshape(df, timevar = "Year", idvar = "id", direction = "wide")

This also works for the edited data (which already has an id variable):

    #   id A.2007 B.2007 A.2008 B.2008 A.2009 B.2009
    # 1  1      5     10      2      0      3     50
    # 4  2      7     13      5     17      6     17
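By default, reshape joins the variable name and the time value with a dot (A.2007). To get the underscore-separated names asked for in the question, one option is the sep argument of reshape. A minimal sketch, assuming the edited df from the question:

```r
# Edited example data from the question
df <- data.frame(
  id   = c(1, 1, 1, 2, 2, 2),
  Year = c(2007, 2008, 2009, 2007, 2008, 2009),
  A    = c(5, 2, 3, 7, 5, 6),
  B    = c(10, 0, 50, 13, 17, 17)
)

# sep controls how the wide column names are built: <variable><sep><Year>
wide <- reshape(df, timevar = "Year", idvar = "id", direction = "wide", sep = "_")

# Optional: reset the row names, which reshape carries over from the input rows
rownames(wide) <- NULL
wide
```

The column order here (A and B paired per year) matches the layout of df_needed in the question.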

You can also use reshape2::recast :

    library(reshape2)
    recast(df, id ~ variable + Year, id.var = 1:2)
+4

Here's a possible solution using data.table v >= 1.9.5

    ## library(devtools)
    ## install_github("Rdatatable/data.table", build_vignettes = FALSE)
    library(data.table)
    dcast(setDT(df), . ~ Year, value.var = c("A", "B"))
    #    . 2007_A 2008_A 2009_A 2007_B 2008_B 2009_B
    # 1: .      5      2      3     10      0     50

Edit: on your new data set, just add id to the formula

    dcast(setDT(df), id ~ Year, value.var = c("A", "B"))
    #    id 2007_A 2008_A 2009_A 2007_B 2008_B 2009_B
    # 1:  1      5      2      3     10      0     50
    # 2:  2      7      5      6     13     17     17
+3

Another simple option in base R:

    df_needed <- matrix(as.vector(t(df[, -1])), ncol = nrow(df) * (ncol(df) - 1))
    colnames(df_needed) <- paste(rep(colnames(df)[-1], nrow(df)),
                                 rep(df[, 1], each = ncol(df) - 1), sep = "_")
    df_needed
    #      A_2007 B_2007 A_2008 B_2008 A_2009 B_2009
    # [1,]      5     10      2      0      3     50

With the edited data:

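    df_split <- split(df, df$Year)
    # Columns 1:2 are id and Year: suffix the remaining columns with the year,
    # then drop id and Year so only the measure columns are bound together
    df_split <- lapply(df_split, function(df) {
      colnames(df)[-(1:2)] <- paste(colnames(df)[-(1:2)], unique(df$Year), sep = "_")
      df <- df[, -(1:2)]
      return(df)
    })
    df_needed <- do.call("cbind", df_split)
    # cbind prefixes each name with the list element name ("2007." etc.); strip it
    colnames(df_needed) <- sub("^\\d{4}\\.", "", colnames(df_needed))
    df_needed
    #   A_2007 B_2007 A_2008 B_2008 A_2009 B_2009
    # 1      5     10      2      0      3     50
    # 4      7     13      5     17      6     17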
+2

Source: https://habr.com/ru/post/987755/

