Convert a data frame with rows of different lengths into two columns that repeat the row ID

I have the following data frame with different row lengths:

    myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
                                 c("Walter","NA","NA","NA","NA"),
                                 c("Walter","Jesse","NA","NA","NA"),
                                 c("Gus","Tuco","Mike","NA","NA"),
                                 c("Gus","Mike","Hank","Saul","Flynn")))
    ID <- as.factor(c(1:5))
    data.frame(ID, myvar)

      ID     V1    V2   V3   V4    V5
    1  1 Walter    NA   NA   NA    NA
    2  2 Walter    NA   NA   NA    NA
    3  3 Walter Jesse   NA   NA    NA
    4  4    Gus  Tuco Mike   NA    NA
    5  5    Gus  Mike Hank Saul Flynn

My goal is to turn this into a two-column data frame: the first column holds the ID and the second the name. Note that each ID must match the row the name came from. I expect the following result:

    ID      V
     1 Walter
     2 Walter
     3 Walter
     3  Jesse
     4    Gus
     4   Tuco
     4   Mike
     5    Gus
     5   Mike
     5   Hank
     5   Saul
     5  Flynn

I tried dcast from reshape2, but it does not return what I need. Note that my real data frame is quite large. Any tips? Thanks.

    myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
                                 c("Walter","NA","NA","NA","NA"),
                                 c("Walter","Jesse","NA","NA","NA"),
                                 c("Gus","Tuco","Mike","NA","NA"),
                                 c("Gus","Mike","Hank","Saul","Flynn")))
    ID <- as.factor(c(1:5))
    df <- data.frame(ID, myvar)

Using base R reshape. (I convert your "NA" character strings to real NA values; you may not need to do this, it only follows from how you created the example.)

    # turn the "NA" strings into real missing values
    df[df == 'NA'] <- NA
    # stack V1..V5 into one long column (named V1), keep ID and the names, drop NAs
    na.omit(reshape(df, direction = 'long', varying = list(2:6))[, c('ID','V1')])
    #     ID     V1
    # 1.1  1 Walter
    # 2.1  2 Walter
    # 3.1  3 Walter
    # 4.1  4    Gus
    # 5.1  5    Gus
    # 3.2  3  Jesse
    # 4.2  4   Tuco
    # 5.2  5   Mike
    # 4.3  4   Mike
    # 5.3  5   Hank
    # 5.4  5   Saul
    # 5.5  5  Flynn
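The result above is ordered column by column rather than by ID. A minimal self-contained sketch that also sorts by ID, so it matches the layout asked for in the question, assuming the "NA" strings are meant to be real missing values and that renaming the columns to ID and V is acceptable (the sorting and renaming steps are my additions, not part of the answer):

```r
# Rebuild the example with real NA values (assumption: the "NA" strings
# in the question stand for missing values)
myvar <- as.data.frame(rbind(c("Walter", NA, NA, NA, NA),
                             c("Walter", NA, NA, NA, NA),
                             c("Walter", "Jesse", NA, NA, NA),
                             c("Gus", "Tuco", "Mike", NA, NA),
                             c("Gus", "Mike", "Hank", "Saul", "Flynn")),
                       stringsAsFactors = FALSE)
df <- data.frame(ID = factor(1:5), myvar, stringsAsFactors = FALSE)

# Base R reshape: stack V1..V5 into one column (named V1 by default)
long <- reshape(df, direction = "long", varying = list(2:6))
res <- na.omit(long[, c("ID", "V1")])

# Sort by ID; order() is stable, so names sharing an ID keep their
# original left-to-right (column) order
res <- res[order(res$ID), ]
names(res) <- c("ID", "V")
rownames(res) <- NULL
print(res)
```

Because order() leaves ties in their original order, ID 5 still lists Gus, Mike, Hank, Saul, Flynn in the same order as in the wide table.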

Or using reshape2:

    library('reshape2')
    ## na.omit(melt(df, id.vars = 'ID')[, c('ID','value')])
    ## or better yet, as Ananda suggests:
    melt(df, id.vars = 'ID', na.rm = TRUE)[, c('ID','value')]
    #    ID  value
    # 1   1 Walter
    # 2   2 Walter
    # 3   3 Walter
    # 4   4    Gus
    # 5   5    Gus
    # 8   3  Jesse
    # 9   4   Tuco
    # 10  5   Mike
    # 14  4   Mike
    # 15  5   Hank
    # 20  5   Saul
    # 25  5  Flynn

If the V columns are factors, you will get a warning that their attributes (the factor levels) differ across the columns and will be dropped, but that is fine here.
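If you would rather avoid that warning, one option (my assumption, not something the answer does) is to convert the factor columns to character before melting, so every measure column has the same type. A sketch that simulates the factor columns of older R versions with stringsAsFactors = TRUE:

```r
# Simulate older R (< 4.0), where as.data.frame() on a character matrix
# produced factor columns by default
myvar <- as.data.frame(rbind(c("Walter", "NA", "NA", "NA", "NA"),
                             c("Walter", "NA", "NA", "NA", "NA"),
                             c("Walter", "Jesse", "NA", "NA", "NA"),
                             c("Gus", "Tuco", "Mike", "NA", "NA"),
                             c("Gus", "Mike", "Hank", "Saul", "Flynn")),
                       stringsAsFactors = TRUE)
df <- data.frame(ID = factor(1:5), myvar)

# Each V column carries its own set of levels, which is what the warning is about
sapply(df[-1], nlevels)

# Convert the name columns to plain character, then mark the "NA" strings as missing
df[-1] <- lapply(df[-1], as.character)
df[df == "NA"] <- NA
```

After this conversion, melt(df, id.vars = 'ID', na.rm = TRUE) should see measure columns of a single type, so there are no factor attributes left to drop.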


You can use unlist:

    res <- subset(data.frame(ID, value = unlist(myvar[-1], use.names = FALSE)),
                  value != 'NA')
    res
    #    ID  value
    # 1   1 Walter
    # 2   2 Walter
    # 3   3 Walter
    # 4   4    Gus
    # 5   5    Gus
    # 6   3  Jesse
    # 7   4   Tuco
    # 8   5   Mike
    # 9   4   Mike
    # 10  5   Hank
    # 11  5   Saul
    # 12  5  Flynn

NOTE: The NA elements in your dataset are character strings ("NA"). It is better to create them without quotes so that they are real NA values, which can then be removed with na.omit, is.na, complete.cases, etc.

Data:

    myvar <- data.frame(ID, myvar)
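Following the note above, a minimal sketch of the same unlist idea wrapped in a reusable function, with the data built from real NA values. The helper name ids_to_long and its id_col argument are hypothetical, introduced here only for illustration; it handles any number of value columns:

```r
# Hypothetical helper (not from the answers): stack every column except the
# ID column into one vector, repeat the IDs to match, and drop real NAs
ids_to_long <- function(df, id_col = "ID") {
  values <- df[setdiff(names(df), id_col)]
  # unlist() reads the columns one after another (column-major order),
  # so the ID vector is repeated once per value column
  out <- data.frame(
    ID = rep(df[[id_col]], times = ncol(values)),
    V  = unlist(values, use.names = FALSE),
    stringsAsFactors = FALSE
  )
  out <- out[!is.na(out$V), ]
  rownames(out) <- NULL
  out
}

# Example data created with real NA (no quotes), as the note recommends
mydata <- data.frame(
  ID = factor(1:5),
  V1 = c("Walter", "Walter", "Walter", "Gus", "Gus"),
  V2 = c(NA, NA, "Jesse", "Tuco", "Mike"),
  V3 = c(NA, NA, NA, "Mike", "Hank"),
  V4 = c(NA, NA, NA, NA, "Saul"),
  V5 = c(NA, NA, NA, NA, "Flynn"),
  stringsAsFactors = FALSE
)

res <- ids_to_long(mydata)
print(res)
```

The rows come out in the same column-by-column order as the unlist answer above; add res[order(res$ID), ] if you want them grouped by ID.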

Convert the "NA" strings into real NA values (here mydf is data.frame(ID, myvar)):

 mydf[mydf == "NA"] <- NA 

Using some indexing to do it all in one go:

    data.frame(ID = mydf$ID[row(mydf[-1])[!is.na(mydf[-1])]],
               V  = mydf[-1][!is.na(mydf[-1])])
    #    ID      V
    # 1   1 Walter
    # 2   2 Walter
    # 3   3 Walter
    # 4   4    Gus
    # 5   5    Gus
    # 6   3  Jesse
    # 7   4   Tuco
    # 8   5   Mike
    # 9   4   Mike
    # 10  5   Hank
    # 11  5   Saul
    # 12  5  Flynn

Or, much more readable, in base R:

    sel <- which(!is.na(mydf[-1]), arr.ind = TRUE)
    data.frame(ID = mydf$ID[sel[, 1]], V = mydf[-1][sel])
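To see why the two versions above agree, here is a small sketch on a three-row example (the data are made up for illustration): which(..., arr.ind = TRUE) returns the (row, col) position of every non-missing cell, scanned column by column, and row(keep)[keep] yields the same row indices in the same order.

```r
# Small example with real NA values (illustrative data only)
mydf <- data.frame(
  ID = factor(1:3),
  V1 = c("Walter", "Walter", "Gus"),
  V2 = c(NA, "Jesse", "Tuco"),
  stringsAsFactors = FALSE
)
keep <- !is.na(mydf[-1])          # logical matrix, one column per V column

# which(arr.ind = TRUE) gives a (row, col) matrix of the TRUE cells,
# scanned column by column
sel <- which(keep, arr.ind = TRUE)
print(sel)

# row() gives each cell's row number; subsetting it by keep yields
# the same row indices in the same column-major order
rows_via_row <- row(keep)[keep]
stopifnot(identical(unname(sel[, "row"]), rows_via_row))

# Indexing a data frame with a two-column matrix picks individual cells
res <- data.frame(ID = mydf$ID[sel[, 1]], V = mydf[-1][sel])
print(res)
```

The row column of sel is what maps each kept name back to its ID, which is exactly the job row(mydf[-1])[!is.na(mydf[-1])] does in the one-liner.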

Using tidyr (plus dplyr for select and filter):

    library("tidyr")
    library("dplyr")
    myvar <- as.data.frame(rbind(c("Walter","NA","NA","NA","NA"),
                                 c("Walter","NA","NA","NA","NA"),
                                 c("Walter","Jesse","NA","NA","NA"),
                                 c("Gus","Tuco","Mike","NA","NA"),
                                 c("Gus","Mike","Hank","Saul","Flynn")))
    ID <- as.factor(c(1:5))
    myvar <- data.frame(ID, myvar)
    # gather(key, value, ...): stack V1:V5 into a col/value pair of columns
    myvar %>% gather(col, value, V1:V5) %>% select(ID, value) %>% filter(value != "NA")

If your NAs are encoded as real NA instead of "NA", you can use the na.rm = TRUE option of gather. For instance:

    myvar[myvar == "NA"] <- NA
    myvar %>% gather(col, value, V1:V5, na.rm = TRUE) %>% select(ID, value)

gives

       ID  value
    1   1 Walter
    2   2 Walter
    3   3 Walter
    4   4    Gus
    5   5    Gus
    6   3  Jesse
    7   4   Tuco
    8   5   Mike
    9   4   Mike
    10  5   Hank
    11  5   Saul
    12  5  Flynn

Source: https://habr.com/ru/post/984767/

