Separate data into columns defined by another column with tidyr

I have data in which the column name that each value belongs to is given in a separate column, for example:

df <- data.frame(splitme = c("6, 7, 8, 9", "1,2,3"), type = c("A, B, C, D", "A, C, D")) 

df looks like this:

       splitme       type
  1 6, 7, 8, 9 A, B, C, D
  2      1,2,3    A, C, D

The desired result should look like this:

 desired_output <- data.frame(A = c(6,1), B = c(7, NA), C = c(8,2), D = c(9,3)) 

i.e.:

    A  B C D
  1 6  7 8 9
  2 1 NA 2 3

If it were not for the fact that some rows have missing types, this would be a straightforward task for tidyr::separate.

 ## Not correctly aligned
 df %>%
   tidyr::separate(splitme, into = c("A", "B", "C", "D")) %>%
   select(-type)

but the values clearly end up misaligned: the second row has no B, so its values shift into the wrong columns. If only the into argument could take a column defining the split rule. Perhaps there is a strategy based on purrr::pmap_df that could be used here?

2 answers

You can use separate_rows and then reshape to wide format with spread:

 library(dplyr); library(tidyr)

 df %>%
   # add a row identification number for reshaping purposes
   mutate(rn = row_number()) %>%
   separate_rows(splitme, type) %>%
   spread(type, splitme) %>%
   select(-rn)
 #   A    B C D
 #1  6    7 8 9
 #2  1 <NA> 2 3
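In more recent tidyr releases, spread is superseded by pivot_wider, so the same idea can be sketched as below. This is a minimal variant rather than the answerer's own code: the explicit sep pattern, convert = TRUE (so the values come back as numbers rather than strings) and the result name wide are choices made for this illustration, and df is recreated so that the snippet runs on its own.

```r
library(dplyr)
library(tidyr)

# Example data from the question, recreated so the snippet is self-contained
df <- data.frame(splitme = c("6, 7, 8, 9", "1,2,3"),
                 type = c("A, B, C, D", "A, C, D"),
                 stringsAsFactors = FALSE)

wide <- df %>%
  # row id, so values from the same original row stay together
  mutate(rn = row_number()) %>%
  # split both columns in parallel on commas; convert = TRUE parses the numbers
  separate_rows(splitme, type, sep = ",\\s*", convert = TRUE) %>%
  # one column per type; combinations that do not occur become NA
  pivot_wider(names_from = type, values_from = splitme) %>%
  select(-rn)

wide
```

Keeping the rn column until after the reshape matters: without it, pivot_wider has nothing to tell the two original rows apart.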

Using purrr::map2_dfr: instead of parsing the splitme column, we paste each row's string directly into a data.frame call. We then name the columns with setNames, and map2_dfr binds the rows and fills in the missing values.

 library(purrr)

 map2_dfr(df$splitme, df$type,
          ~ setNames(eval(parse(text = paste0("data.frame(", .x, ")"))),
                     strsplit(.y, ", ")[[1]]))
 #   A  B C D
 # 1 6  7 8 9
 # 2 1 NA 2 3
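Because eval(parse()) executes whatever text is in splitme, a variant that splits the strings with strsplit may be preferable when the data are not fully trusted. The following is a sketch under assumptions, not the answerer's code: it assumes every value in splitme is numeric, that both columns are character vectors, and that dplyr is installed (map2_dfr relies on it for row binding); the name wide_safe is made up for the example.

```r
library(purrr)

# Example data from the question, recreated so the snippet is self-contained
df <- data.frame(splitme = c("6, 7, 8, 9", "1,2,3"),
                 type = c("A, B, C, D", "A, C, D"),
                 stringsAsFactors = FALSE)

wide_safe <- map2_dfr(
  strsplit(df$splitme, ",\\s*"),  # values for each row
  strsplit(df$type, ",\\s*"),     # matching column names for each row
  # build a one-row data frame whose columns are named by the row's types
  ~ as.data.frame(as.list(setNames(as.numeric(.x), .y)))
)

wide_safe
```

As in the original answer, the row binding done by map2_dfr is what fills column B with NA for the second row.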

Source: https://habr.com/ru/post/1276237/

