Convert the numeric representation of the variable column to the original string after melting using patterns

I use patterns() in the measure argument of data.table::melt() to melt data whose columns follow a few easily defined patterns. It works, but I don’t see how to get a character index variable instead of the default numeric one.

For example, in A the dog and cat column names have numeric suffixes (dog_1, cat_2, ...), and the "variable" column of the melted result shows matching numbers:

    A = data.table(idcol = c(1:5),
                   dog_1 = c(1:5), cat_1 = c(101:105),
                   dog_2 = c(6:10), cat_2 = c(106:110),
                   dog_3 = c(11:15), cat_3 = c(111:115))

    head(melt(A, measure = patterns("^dog", "^cat"), value.name = c("dog", "cat")))
       idcol variable dog cat
    1:     1        1   1 101
    2:     2        1   2 102
    3:     3        1   3 103
    4:     4        1   4 104
    5:     5        1   5 105
    6:     1        2   6 106

However, in B the dog and cat column names have text suffixes (dog_one, cat_two, ...), yet the "variable" column is still numeric:

    B = data.table(idcol = c(1:5),
                   dog_one = c(1:5), cat_one = c(101:105),
                   dog_two = c(6:10), cat_two = c(106:110),
                   dog_three = c(11:15), cat_three = c(111:115))

    head(melt(B, measure = patterns("^dog", "^cat"), value.name = c("dog", "cat")))
       idcol variable dog cat
    1:     1        1   1 101
    2:     2        1   2 102
    3:     3        1   3 103
    4:     4        1   4 104
    5:     5        1   5 105
    6:     1        2   6 106

How can I fill the variable column with one / two / three instead of 1/2/3?

2 answers

There may be simpler ways, but this seems to work:

    # grab suffixes of 'variable' names
    suff <- unique(sub('^.*_', '', names(B[ , -1])))
    # suff <- unique(tstrsplit(names(B[, -1]), "_")[[2]])

    # melt
    B2 <- melt(B, measure = patterns("^dog", "^cat"), value.name = c("dog", "cat"))

    # replace factor levels in 'variable' with the suffixes
    setattr(B2$variable, "levels", suff)

    B2
    #     idcol variable dog cat
    #  1:     1      one   1 101
    #  2:     2      one   2 102
    #  3:     3      one   3 103
    #  4:     4      one   4 104
    #  5:     5      one   5 105
    #  6:     1      two   6 106
    #  7:     2      two   7 107
    #  8:     3      two   8 108
    #  9:     4      two   9 109
    # 10:     5      two  10 110
    # 11:     1    three  11 111
    # 12:     2    three  12 112
    # 13:     3    three  13 113
    # 14:     4    three  14 114
    # 15:     5    three  15 115
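The relabelling above depends on the suffixes being in the same order in which melt() numbers the column groups. Here is a minimal sketch of a variant that makes that explicit and stores the result as a character column. It assumes every dog_* column has a matching cat_* column in the same position. The names dog_cols and suff_by_group are made up for this illustration.

```r
library(data.table)

B <- data.table(idcol = 1:5,
                dog_one = 1:5, cat_one = 101:105,
                dog_two = 6:10, cat_two = 106:110,
                dog_three = 11:15, cat_three = 111:115)

# Hypothetical helper names: suffixes taken from the columns matched by the
# first pattern, in column order (assumed to match melt()'s group numbering).
dog_cols <- grep("^dog", names(B), value = TRUE)
suff_by_group <- sub("^dog_", "", dog_cols)

B2 <- melt(B, measure.vars = patterns("^dog", "^cat"),
           value.name = c("dog", "cat"))

# 'variable' is a factor with levels 1, 2, 3; use its integer codes to look
# up the suffix, which replaces the column with a character vector.
B2[, variable := suff_by_group[as.integer(variable)]]
```

If the dog and cat columns could appear in different orders, this lookup would mislabel rows, so it is worth checking the column names first.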

Note that there is an open issue on this very topic, with some other alternatives: FR: expanding melt functionality to handle output names.
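Not part of the answer above: more recent data.table releases provide a measure() helper for melt() that can split column names on a separator. The sketch below assumes the installed version includes measure(). The column name suffix is chosen here only for illustration.

```r
library(data.table)

B <- data.table(idcol = 1:5,
                dog_one = 1:5, cat_one = 101:105,
                dog_two = 6:10, cat_two = 106:110,
                dog_three = 11:15, cat_three = 111:115)

# Split each measured column name on "_": the first part names the output
# value column (the value.name keyword), the second part is stored in a new
# column called 'suffix' (an illustrative name).
B3 <- melt(B, measure.vars = measure(value.name, suffix, sep = "_"))
```

Under that assumption, B3 should contain idcol, suffix, dog and cat, with suffix holding one / two / three rather than 1 / 2 / 3.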


This is one of the (rare) cases where I find good ol' base::reshape cleaner. The sep argument is useful here: both the names of the "value" columns and the levels of the "variable" column are generated in one pass:

    reshape(data = B, varying = names(B[ , -1]), sep = "_", direction = "long")
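By default reshape() names the index column time and adds an id column when idvar is not given. Here is a minimal base-R sketch, assuming a plain data.frame copy of B (called B_df here for illustration), that uses the idvar and timevar arguments to get output closer to the melt() result:

```r
# Plain data.frame version of B (illustrative name), base R only.
B_df <- data.frame(idcol = 1:5,
                   dog_one = 1:5, cat_one = 101:105,
                   dog_two = 6:10, cat_two = 106:110,
                   dog_three = 11:15, cat_three = 111:115)

# varying lists the wide columns in the order dog, cat for each suffix;
# sep = "_" lets reshape() guess the value names (dog, cat) and the
# times (one, two, three) from the column names.
long <- reshape(B_df,
                varying = names(B_df)[-1],
                sep = "_",
                idvar = "idcol",        # use the existing id column
                timevar = "variable",   # name the index column 'variable'
                direction = "long")

# Drop the composite row names that reshape() generates.
rownames(long) <- NULL
```

The guessing step expects the varying columns to be ordered by suffix first (dog_one, cat_one, dog_two, ...), which is how B is laid out.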

Source: https://habr.com/ru/post/1014476/

