Change a data frame with different column lengths to two columns that replicate the column id

Question


I have the following data frame with different row lengths:

    myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
                                 c("Walter","NA","NA","NA","NA"),
                                 c("Walter","Jesse","NA","NA","NA"),
                                 c("Gus","Tuco","Mike","NA","NA"),
                                 c("Gus","Mike","Hank","Saul","Flynn")))
    ID <- as.factor(c(1:5))
    data.frame(ID,myvar)

      ID     V1    V2   V3   V4    V5
    1  1 Walter    NA   NA   NA    NA
    2  2 Walter    NA   NA   NA    NA
    3  3 Walter Jesse   NA   NA    NA
    4  4    Gus  Tuco Mike   NA    NA
    5  5    Gus  Mike Hank Saul Flynn

My goal is to turn this data frame into a two-column data frame. The first column will be the identifier, and the second the name. Please note that the identifier must match the row in which the name appeared. I expect the following result:

    ID      V
     1 Walter
     2 Walter
     3 Walter
     3  Jesse
     4    Gus
     4   Tuco
     4   Mike
     5    Gus
     5   Mike
     5   Hank
     5   Saul
     5  Flynn

I tried dcast from reshape2, but it does not return what I need. Note that my original data frame is quite large. Any tips? Greetings.


r multiple-columns reshape

ALS.Meyer Apr 7 '15 at 3:29


4 answers

rawr · Answer 1 · 2015-04-07T03:45:43+0000

    myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
                                 c("Walter","NA","NA","NA","NA"),
                                 c("Walter","Jesse","NA","NA","NA"),
                                 c("Gus","Tuco","Mike","NA","NA"),
                                 c("Gus","Mike","Hank","Saul","Flynn")))
    ID <- as.factor(c(1:5))
    df <- data.frame(ID, myvar)

Using base R's reshape. (I am converting your "NA" character strings to real NA values; you may not need to do this, it is only because of how you created this example.)

    df[df == 'NA'] <- NA
    na.omit(reshape(df, direction = 'long', varying = list(2:6))[, c('ID','V1')])
    #     ID     V1
    # 1.1  1 Walter
    # 2.1  2 Walter
    # 3.1  3 Walter
    # 4.1  4    Gus
    # 5.1  5    Gus
    # 3.2  3  Jesse
    # 4.2  4   Tuco
    # 5.2  5   Mike
    # 4.3  4   Mike
    # 5.3  5   Hank
    # 5.4  5   Saul
    # 5.5  5  Flynn
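A sketch of the same base R call with the reshape arguments spelled out. Without idvar, reshape adds its own "id" column (and a "time" column), and the stacked column keeps the name V1. The names "V" and "pos" below are illustrative choices, not part of the original answer; passing idvar = "ID" simply reuses the existing identifier.

```r
# Base R reshape() with explicit arguments (a sketch; the column names
# "V" and "pos" are hypothetical choices, not from the original answer)
myvar <- as.data.frame(rbind(c("Walter", "NA", "NA", "NA", "NA"),
                             c("Walter", "NA", "NA", "NA", "NA"),
                             c("Walter", "Jesse", "NA", "NA", "NA"),
                             c("Gus", "Tuco", "Mike", "NA", "NA"),
                             c("Gus", "Mike", "Hank", "Saul", "Flynn")),
                       stringsAsFactors = FALSE)
ID <- as.factor(1:5)
df <- data.frame(ID, myvar, stringsAsFactors = FALSE)
df[df == "NA"] <- NA   # turn the "NA" strings into real missing values

long <- reshape(df, direction = "long",
                varying = list(2:6),  # columns V1..V5, stacked into one column
                v.names = "V",        # name of the stacked column
                idvar = "ID",         # reuse the existing ID column
                timevar = "pos")      # records which V column a value came from
res <- na.omit(long)[, c("ID", "V")]  # drop the rows that were NA
print(res)
```

The rows still come out column by column (all V1 values, then V2, and so on), exactly as in the answer's output.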

Or using reshape2:

    library('reshape2')
    ## na.omit(melt(df, id.vars = 'ID')[, c('ID','value')])
    ## or, better yet, as Ananda suggests:
    melt(df, id.vars = 'ID', na.rm = TRUE)[, c('ID','value')]
    #    ID  value
    # 1   1 Walter
    # 2   2 Walter
    # 3   3 Walter
    # 4   4    Gus
    # 5   5    Gus
    # 8   3  Jesse
    # 9   4   Tuco
    # 10  5   Mike
    # 14  4   Mike
    # 15  5   Hank
    # 20  5   Saul
    # 25  5  Flynn

You may get a warning that attributes are not identical across measure variables (the columns V1 to V5 have different factor levels), but that is fine here.
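Both reshape and melt stack the columns one after another, so the IDs come out as 1, 2, 3, 4, 5, 3, 4, ... rather than grouped as in the expected output in the question. If that order matters, a stable sort on ID restores it: order() leaves tied values in their original position, so within each ID the names stay in V1, V2, ... order. A minimal sketch, starting from the melt result shown above (typed in by hand so the snippet runs without reshape2):

```r
# The long table from the melt() output above, entered by hand
# (column-stacked order, as melt() returns it)
long <- data.frame(
  ID    = factor(c(1, 2, 3, 4, 5, 3, 4, 5, 4, 5, 5, 5)),
  value = c("Walter", "Walter", "Walter", "Gus", "Gus", "Jesse",
            "Tuco", "Mike", "Mike", "Hank", "Saul", "Flynn"),
  stringsAsFactors = FALSE
)

# order() keeps ties in their original order, so names within an ID
# stay in the order of the original columns
res <- long[order(long$ID), ]
rownames(res) <- NULL
print(res)
```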

akrun · Answer 2 · 2015-04-07T03:54:05+0000

You can use unlist:

    res <- subset(data.frame(ID, value = unlist(myvar[-1], use.names = FALSE)),
                  value != 'NA')
    res
    #   ID  value
    #1   1 Walter
    #2   2 Walter
    #3   3 Walter
    #4   4    Gus
    #5   5    Gus
    #8   3  Jesse
    #9   4   Tuco
    #10  5   Mike
    #14  4   Mike
    #15  5   Hank
    #20  5   Saul
    #25  5  Flynn

NOTE: the NA elements in your dataset are character strings ("NA"). It is better to create the data without quotes so that they are real NA values; then they can be removed with na.omit, is.na, complete.cases, etc.

data

    myvar <- data.frame(ID, myvar)
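unlist flattens myvar[-1] column by column, which is why the IDs are interleaved. If you want the rows in the order the question asks for, one option (a sketch, not part of the original answer) is to transpose a character matrix first, so that flattening walks through the data row by row. It assumes the "NA" placeholder strings from the question and a myvar without the ID column.

```r
# Data from the question ("NA" placeholder strings, no ID column)
myvar <- as.data.frame(rbind(c("Walter", "NA", "NA", "NA", "NA"),
                             c("Walter", "NA", "NA", "NA", "NA"),
                             c("Walter", "Jesse", "NA", "NA", "NA"),
                             c("Gus", "Tuco", "Mike", "NA", "NA"),
                             c("Gus", "Mike", "Hank", "Saul", "Flynn")),
                       stringsAsFactors = FALSE)
ID <- as.factor(1:5)

# Transpose: column j of m now holds row j of myvar, so R's
# column-by-column flattening of m visits myvar row by row
m <- t(as.matrix(myvar))
keep <- m != "NA"                                   # drop the placeholder strings
res <- data.frame(ID = rep(ID, each = nrow(m))[keep],  # nrow(m) = number of V columns
                  V  = m[keep],
                  stringsAsFactors = FALSE)
print(res)
```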

thelatemail · Answer 3 · 2015-04-07T03:54:05+0000

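Convert the "NA" strings into real NA values (mydf is not defined in the answer; it is assumed here to be the question's data.frame(ID, myvar)):

    mydf <- data.frame(ID, myvar)  # assumed construction of mydf
    mydf[mydf == "NA"] <- NA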

Then use subsetting to do it all in one fell swoop:

    data.frame(ID = mydf$ID[row(mydf[-1])[!is.na(mydf[-1])]],
               V  = mydf[-1][!is.na(mydf[-1])])
    #   ID      V
    #1   1 Walter
    #2   2 Walter
    #3   3 Walter
    #4   4    Gus
    #5   5    Gus
    #6   3  Jesse
    #7   4   Tuco
    #8   5   Mike
    #9   4   Mike
    #10  5   Hank
    #11  5   Saul
    #12  5  Flynn

Or, more readably, in base R:

    sel <- which(!is.na(mydf[-1]), arr.ind = TRUE)
    data.frame(ID = mydf$ID[sel[, 1]], V = mydf[-1][sel])
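Since the question mentions a large data frame, a variant that converts the name columns to a character matrix once and computes the missing-value mask only once (instead of evaluating !is.na(mydf[-1]) twice) may be worth trying. This is a sketch with no timing claims; mydf is assumed to be data.frame(ID, myvar) with the "NA" strings already turned into real NA values.

```r
# Data from the question; mydf is an assumed construction
myvar <- as.data.frame(rbind(c("Walter", "NA", "NA", "NA", "NA"),
                             c("Walter", "NA", "NA", "NA", "NA"),
                             c("Walter", "Jesse", "NA", "NA", "NA"),
                             c("Gus", "Tuco", "Mike", "NA", "NA"),
                             c("Gus", "Mike", "Hank", "Saul", "Flynn")),
                       stringsAsFactors = FALSE)
ID <- as.factor(1:5)
mydf <- data.frame(ID, myvar, stringsAsFactors = FALSE)
mydf[mydf == "NA"] <- NA

m    <- as.matrix(mydf[-1])   # one character matrix for all name columns
keep <- !is.na(m)             # missing-value mask, computed once
res  <- data.frame(ID = mydf$ID[row(m)[keep]],  # row(m) gives each cell's row number
                   V  = m[keep],
                   stringsAsFactors = FALSE)
print(res)
```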

Alex · Answer 4 · 2015-04-07T03:49:07+0000

Using tidyr (select and filter come from dplyr):

    library("tidyr")
    library("dplyr")
    myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
                                 c("Walter","NA","NA","NA","NA"),
                                 c("Walter","Jesse","NA","NA","NA"),
                                 c("Gus","Tuco","Mike","NA","NA"),
                                 c("Gus","Mike","Hank","Saul","Flynn")))
    ID <- as.factor(c(1:5))
    myvar <- data.frame(ID, myvar)
    myvar %>%
      gather(key, value, V1:V5) %>%  # "key" holds the source column, "value" the name
      select(ID, value) %>%
      filter(value != "NA")

If your NAs are encoded as NA instead of "NA", you can use the na.rm = TRUE option of gather instead. For instance:

    myvar[myvar == "NA"] <- NA
    myvar %>%
      gather(key, value, V1:V5, na.rm = TRUE) %>%
      select(ID, value)

gives

       ID  value
    1   1 Walter
    2   2 Walter
    3   3 Walter
    4   4    Gus
    5   5    Gus
    6   3  Jesse
    7   4   Tuco
    8   5   Mike
    9   4   Mike
    10  5   Hank
    11  5   Saul
    12  5  Flynn
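For comparison, base R's stack() (from the utils package, attached by default) does the same column-to-rows stacking as gather without extra packages. A minimal sketch, assuming the name columns are kept as character vectors (stringsAsFactors = FALSE); the ID column is added afterwards because stack() only returns the values and the source column name (ind).

```r
# Data from the question, with the "NA" strings turned into real NA values
myvar <- as.data.frame(rbind(c("Walter", "NA", "NA", "NA", "NA"),
                             c("Walter", "NA", "NA", "NA", "NA"),
                             c("Walter", "Jesse", "NA", "NA", "NA"),
                             c("Gus", "Tuco", "Mike", "NA", "NA"),
                             c("Gus", "Mike", "Hank", "Saul", "Flynn")),
                       stringsAsFactors = FALSE)
ID <- as.factor(1:5)
myvar[myvar == "NA"] <- NA

st <- stack(myvar)                     # columns: values, ind (V1..V5)
st$ID <- rep(ID, times = ncol(myvar))  # stack() goes column by column
res <- st[!is.na(st$values), c("ID", "values")]
rownames(res) <- NULL
print(res)
```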
