Collapse rows in data.table

I have one data table with 1 M rows and 2 columns

Dummy data:

require(data.table) ID <- c(1,2,3) variable <- c("a,b","a,c","c,d") dt <- data.table(ID,variable) dt > dt 
  Id variable
 1 a, b
 2 a, c
 3 c, d 

Now I want to collapse the column "variable" in different rows using "ID", just like the function "melt" in the file reshape2 or melt.data.table in data.table

Here is what I want:

  Id variable
 1 a
 1 b
 2 a
 2 c
 3 c
 3 d 

PS: Given the desired results, I know how to perform the reverse step.

 dt2 <- data.table(ID = c(1,1,2,2,3,3), variable = c("a","b","a","c","c","d")) dt3 <- dt2[, list(variables = paste(variable, collapse = ",")), by = ID] 

Any tips or suggestions?

+6
source share
1 answer

Since strsplit is vectorized, and it will be a laborious operation here, I would not use it in every group. Instead, you can first split into , into the entire column, and then restore data.table as follows:

 var = strsplit(dt$variable, ",", fixed=TRUE) len = vapply(var, length, 0L) ans = data.table(ID=rep(dt$ID, len), variable=unlist(var)) # ID variable # 1: 1 a # 2: 1 b # 3: 2 a # 4: 2 c # 5: 3 c # 6: 3 d 
+5
source

Source: https://habr.com/ru/post/970309/


All Articles