How to match comma-separated elements between two columns

I have a data frame named mydf whose columns a and b contain comma-separated items. I want to compare the values in columns a and b while removing (or ignoring) the values in parentheses (), and store the items the two columns have in common in a new column, common.

mydf
    a                           b
1   at1 (1) , 23-x (0)          at1,23-x,gt
2   hh (2) , pp (0)             pp
3   cg (4) , gh (9) , th (7)    th,cg


result
    a                           b                 common
1   at1 (1) , 23-x (0)          at1,23-x,gt       at1,23-x
2   hh (2) , pp (0)             pp                pp
3   cg (4) , gh (9) , th (7)    th,cg             cg,th

Data:

mydf <- read.table(
  text = "a|b
    at1 (1) , 23-x (0)|at1,23-x,gt
    hh (2) , pp (0)|pp
    cg (4) , gh (9) , th (7)|th,cg",
  sep = "|", header = TRUE,
  strip.white = TRUE,   # drop the leading indentation from each field
  colClasses = rep("character", 2)
)
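To make the "ignore the values in parentheses" requirement concrete, here is a minimal base-R sketch of the cleaning step for a single cell. The helper name parse_items is hypothetical, and the sketch assumes the parenthesized parts are never nested and never contain commas.

```r
# Hypothetical helper: turn one cell such as "at1 (1) , 23-x (0)" into c("at1", "23-x")
parse_items <- function(cell) {
  no_parens <- gsub("\\([^)]*\\)", "", cell)                     # drop every "(...)" group
  items <- trimws(strsplit(no_parens, ",", fixed = TRUE)[[1]])   # split on commas, trim spaces
  items[items != ""]                                             # discard empty pieces
}

parse_items("at1 (1) , 23-x (0)")        # returns c("at1", "23-x")
parse_items("cg (4) , gh (9) , th (7)")  # returns c("cg", "gh", "th")
parse_items("at1,23-x,gt")               # returns c("at1", "23-x", "gt")
```

Column b has no parentheses, so the same helper works on both columns.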
1 answer

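Loop over the columns with lapply(mydf, ..) and extract the items from each cell with str_extract_all. Then use Map to take the intersect of the corresponding list elements row by row, collapse each result with toString, and unlist it into a vector. Rows with nothing in common get an empty string "".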

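library(stringr)
# For each column: drop the "(n)" annotations, then extract the comma-separated items
lst <- lapply(mydf, function(x) str_extract_all(str_remove_all(x, "\\([^)]*\\)"), "[^,\\s]+"))
# Row by row: keep the items present in both columns; toString() joins them with ", "
mydf$common <- unlist(Map(function(x, y) toString(intersect(x, y)),
                          lst$a, lst$b))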
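If you would rather avoid the stringr dependency, the same idea can be sketched in base R. This is an alternative, not the answerer's code: it assumes parenthesized groups are never nested, uses a hypothetical helper split_items, and joins the common items with "," (no space) to match the format of the expected output in the question.

```r
# Sample data from the question
mydf <- data.frame(
  a = c("at1 (1) , 23-x (0)", "hh (2) , pp (0)", "cg (4) , gh (9) , th (7)"),
  b = c("at1,23-x,gt", "pp", "th,cg"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: split each cell into items, ignoring "(...)" groups
split_items <- function(x) {
  lapply(strsplit(gsub("\\([^)]*\\)", "", x), ","), function(v) {
    v <- trimws(v)   # remove spaces left around the commas
    v[v != ""]       # drop empty pieces
  })
}

# Row by row: keep the items found in both columns, joined by commas
mydf$common <- mapply(function(x, y) paste(intersect(x, y), collapse = ","),
                      split_items(mydf$a), split_items(mydf$b),
                      USE.NAMES = FALSE)
mydf$common
```

intersect() keeps the order of its first argument, so the common items appear in the order they occur in column a (row 3 gives "cg,th"), and a row with no common items gets "".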

Source: https://habr.com/ru/post/1622925/

