Unlist a list column to create one row per key in a data frame

I ran into the following transformation problem in R. I have the following data frame:

 test_df <-  structure(list(word = c("list of XYZ schools", 
"list of basketball", "list of usa"), results = c("58", "151", "29"), key_list = structure(list(`coRq,coG,coQ,co7E,coV98` = c("coRq", "coG", "coQ", "co7E", "coV98"), `coV98,coUD,coHF,cobK,con7` = c("coV98","coUD", "coHF", "cobK", "con7"), `coV98,coX7,couC,coD3,copW` = c("coV98", "coX7", "couC", "coD3", "copW")), .Names = c("coRq,coG,coQ,co7E,coV98", "coV98,coUD,coHF,cobK,con7", "coV98,coX7,couC,coD3,copW"))), .Names = c("word", "results", "key_list"), row.names = c(116L, 150L, 277L), class = "data.frame")
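Despite the comma-separated element names, key_list is not a string column. It is a list column whose elements are character vectors of keys. The following is a quick check using the question's own data. It only inspects the data, and `from_names` is an illustrative name, not part of the original question.

```r
# The question's data, copied from the dput() output above
test_df <- structure(list(
  word = c("list of XYZ schools", "list of basketball", "list of usa"),
  results = c("58", "151", "29"),
  key_list = structure(list(
    `coRq,coG,coQ,co7E,coV98` = c("coRq", "coG", "coQ", "co7E", "coV98"),
    `coV98,coUD,coHF,cobK,con7` = c("coV98", "coUD", "coHF", "cobK", "con7"),
    `coV98,coX7,couC,coD3,copW` = c("coV98", "coX7", "couC", "coD3", "copW")),
    .Names = c("coRq,coG,coQ,co7E,coV98", "coV98,coUD,coHF,cobK,con7",
               "coV98,coX7,couC,coD3,copW"))),
  .Names = c("word", "results", "key_list"),
  row.names = c(116L, 150L, 277L), class = "data.frame")

# key_list is a list column holding one character vector of keys per row
stopifnot(is.list(test_df$key_list))
print(lengths(test_df$key_list))  # number of keys in each row

# The element names are the same keys joined by commas, so splitting
# the names on "," reproduces the vectors themselves
from_names <- strsplit(names(test_df$key_list), ",", fixed = TRUE)
print(identical(from_names, unname(test_df$key_list)))  # TRUE
```

So the keys do not need to be parsed out of a string. They only need to be flattened.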

In short, the data frame has three columns. Each row is unique by word and has a results value and a corresponding key_list: a list column whose elements are character vectors of keys (each element's name shows the same keys as a comma-separated string). I would like to create a new data frame with one row per key, where the word and results values are repeated for every key in that row. The result should look like this:

key     word                    results
coV98   "list of XYZ schools"   58
coRq    "list of XYZ schools"   58
coV98   "list of basketball"    151
coV98   "list of usa"           29

And so on for all the keys. In other words, I would like to expand each row's keys into separate rows and rebuild a data frame in which the word and the other columns are repeated.
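The expansion described above can be sketched in base R without any loop. Each row is repeated as many times as it has keys, using `lengths()` and `rep()`, and the keys themselves are flattened with `unlist()`. This is a minimal sketch that assumes key_list is a list column of character vectors, as in the question. It rebuilds the same values in a shorter form and drops the element names, which are not needed here. `long_df` and `n_keys` are illustrative names.

```r
# Rebuild the question's example data (same values, key_list names omitted)
test_df <- data.frame(
  word = c("list of XYZ schools", "list of basketball", "list of usa"),
  results = c("58", "151", "29"),
  stringsAsFactors = FALSE
)
test_df$key_list <- list(
  c("coRq", "coG", "coQ", "co7E", "coV98"),
  c("coV98", "coUD", "coHF", "cobK", "con7"),
  c("coV98", "coX7", "couC", "coD3", "copW")
)

# How many keys each row holds
n_keys <- lengths(test_df$key_list)

# One output row per key: unlist() flattens the keys in row order, and
# rep(..., times = n_keys) repeats each word/results value to match
long_df <- data.frame(
  key = unlist(test_df$key_list, use.names = FALSE),
  word = rep(test_df$word, times = n_keys),
  results = rep(test_df$results, times = n_keys),
  stringsAsFactors = FALSE
)
print(long_df)  # 15 rows: 3 words x 5 keys each
```

Because `unlist()` and `rep()` both walk the rows in the same order, each key stays aligned with the word it came from.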

Here is what I tried: I created a list of unique keys, grepped for each key in the key_list column to subset a smaller data frame, and then bound the pieces together. However, the resulting data frame does not contain a key column:

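# Unique keys (Var1) with their counts (Freq)
keys <- as.data.frame(table(unname(unlist(test_df$key_list))), stringsAsFactors = FALSE)
ttt <- lapply(keys$Var1, function(xx) {
  idx <- grep(xx, test_df$key_list)  # rows whose key_list mentions this key
  test_df[idx, ]
})
final_df <- do.call(rbind, ttt)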

I also experimented with unlisting and rebuilding, but could not get the right combination. Any advice would be helpful. Thanks!
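The grep-and-bind approach from the question can work once each subset records the key that selected it. The following is a sketch under the same data assumptions. It swaps `grep()` for an exact `%in%` test on each row's key vector, which is a deliberate change: `grep()` on a list column matches against the deparsed text, so a key that is a prefix of another key would also match. `pieces` and `has_key` are illustrative names.

```r
# Rebuild the question's example data (same values, key_list names omitted)
test_df <- data.frame(
  word = c("list of XYZ schools", "list of basketball", "list of usa"),
  results = c("58", "151", "29"),
  stringsAsFactors = FALSE
)
test_df$key_list <- list(
  c("coRq", "coG", "coQ", "co7E", "coV98"),
  c("coV98", "coUD", "coHF", "cobK", "con7"),
  c("coV98", "coX7", "couC", "coD3", "copW")
)

# Unique keys across all rows, in order of first appearance
keys <- unique(unlist(test_df$key_list, use.names = FALSE))

pieces <- lapply(keys, function(k) {
  # Exact membership test per row instead of grep() on the deparsed list
  has_key <- vapply(test_df$key_list, function(kl) k %in% kl, logical(1))
  # Keep the matching rows and record which key selected them:
  # this is the column the original loop was missing
  data.frame(key = k, test_df[has_key, c("word", "results")],
             stringsAsFactors = FALSE)
})
final_df <- do.call(rbind, pieces)
rownames(final_df) <- NULL
print(final_df)
```

Unlike the `rep()` version, this output is grouped by key rather than by word. Either order can be changed afterwards with `order()`.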

2 answers

Maybe we can use listCol_l from splitstackshape:

library(splitstackshape)
listCol_l(test_df, 'key_list')[]

In case a base R solution is useful to someone:

do.call(rbind, lapply(seq_along(test_df$key_list), function(i) {
    merge(test_df$key_list[[i]], test_df[i,-3], by=NULL)
  }))
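In the answer above, `merge(..., by = NULL)` returns the Cartesian product of its two inputs. Row i's keys are therefore paired with that row's word and results. The key column keeps whatever name the bare vector receives when `merge()` coerces it to a data frame. The sketch below is a variant, not the answerer's code: it wraps the keys in a one-column data frame named `key` so the column name is explicit. It also selects the columns by name instead of using `-3`. `keys_i` and `res` are illustrative names.

```r
# Rebuild the question's example data (same values, key_list names omitted)
test_df <- data.frame(
  word = c("list of XYZ schools", "list of basketball", "list of usa"),
  results = c("58", "151", "29"),
  stringsAsFactors = FALSE
)
test_df$key_list <- list(
  c("coRq", "coG", "coQ", "co7E", "coV98"),
  c("coV98", "coUD", "coHF", "cobK", "con7"),
  c("coV98", "coX7", "couC", "coD3", "copW")
)

res <- do.call(rbind, lapply(seq_along(test_df$key_list), function(i) {
  # Wrap row i's keys in a one-column data frame so the column is named "key"
  keys_i <- data.frame(key = test_df$key_list[[i]], stringsAsFactors = FALSE)
  # by = NULL makes merge() return the Cartesian product:
  # every key of row i paired with that row's word and results
  merge(keys_i, test_df[i, c("word", "results")], by = NULL)
}))
rownames(res) <- NULL
print(res)
```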

Source: https://habr.com/ru/post/1625181/

