Check by row whether an element exists in a comma-separated column, and return its position

Given this data.frame:

#    x     y
# 1  a b,c,d
# 2  c b,c,d
# 3  c b,c,d
# 4  a e,f,g
# 5  a b,c,d
# 6  c a,b,c
# 7  b b,c,d
# 8  c  <NA>
# 9  c e,f,g
# 10 a  <NA>

My desired result:

#    x     y pos contains
# 1  a b,c,d  NA    FALSE
# 2  c b,c,d   2     TRUE
# 3  c b,c,d   2     TRUE
# 4  a e,f,g  NA    FALSE
# 5  a b,c,d  NA    FALSE
# 6  c a,b,c   3     TRUE
# 7  b b,c,d   1     TRUE
# 8  c  <NA>  NA       NA
# 9  c e,f,g  NA    FALSE
# 10 a  <NA>  NA       NA

That is, check row by row whether df$x is in df$y and report its position. I started with strsplit(df$y, ","), but things got complicated quickly, and I suspect there is a simple solution.


Code to reproduce the data:
set.seed(5)
seq_letters <- c("a,b,c", "b,c,d", "e,f,g", NA)
df <- data.frame(x = sample(letters[1:3], 10, TRUE),
                 y = sample(seq_letters, 10, TRUE),
                 stringsAsFactors = FALSE)

2 answers

Here you can use match() with mapply() to find the position, after splitting column y into pieces. The second column can then be built from the first.

df$pos <- mapply(match, df$x, strsplit(df$y, ",", fixed = TRUE), USE.NAMES = FALSE)
df$contains <- replace(!is.na(df$pos), is.na(df$y), NA)

which gives

   x     y pos contains
1  a b,c,d  NA    FALSE
2  c b,c,d   2     TRUE
3  c b,c,d   2     TRUE
4  a e,f,g  NA    FALSE
5  a b,c,d  NA    FALSE
6  c a,b,c   3     TRUE
7  b b,c,d   1     TRUE
8  c  <NA>  NA       NA
9  c e,f,g  NA    FALSE
10 a  <NA>  NA       NA
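
To see why this works, it may help to walk through a few rows by hand. The sketch below uses a small made-up vector pair (x and y here are illustrative, not the question's df) and relies only on base R: strsplit() returns a list with one character vector per row, and mapply() calls match() once per row on the paired elements.

```r
# Illustrative inputs (not the question's data)
x <- c("c", "a", "c")
y <- c("b,c,d", "e,f,g", NA)

# One character vector per row; a missing y stays a single NA
pieces <- strsplit(y, ",", fixed = TRUE)

# Row-wise: match(x[i], pieces[[i]]) gives the first position of x[i], or NA
pos <- mapply(match, x, pieces, USE.NAMES = FALSE)

# "Found" is !is.na(pos), but rows with a missing y are set back to NA
contains <- replace(!is.na(pos), is.na(y), NA)

print(data.frame(x, y, pos, contains))
```

One caveat worth keeping in mind: match(NA, NA) returns 1, so if x could also be missing, rows where both x and y are NA would get pos = 1 and would need separate handling.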

You can also do this by reshaping the data into long format.

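library(dplyr)    # distinct(), filter(), mutate(), left_join()
library(tidyr)    # unnest()
library(stringi)  # stri_split_fixed()

# Unique (x, y) pairs, dropping rows where y is missing
df_x_y <-
  df %>%
  distinct() %>%
  filter(!is.na(y))

# One row per element of each distinct y, numbered by position within y
df_y <-
  df_x_y %>%
  select(y) %>%
  distinct() %>%
  mutate(y_split = stri_split_fixed(y, ",")) %>%
  unnest(y_split) %>%
  group_by(y) %>%
  mutate(pos = row_number()) %>%
  ungroup()

# Keep only the elements that equal x
matches <-
  df_x_y %>%
  left_join(df_y, by = "y") %>%
  filter(x == y_split)

# Attach the position to the original rows; rows without a match get pos = NA
df %>%
  left_join(matches, by = c("x", "y"))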
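
The same long-format idea can be sketched in base R, without extra packages. This is only an illustration under assumptions: toy, long, and hits are hypothetical names, and the data is a small made-up frame, not the question's df.

```r
# Toy data standing in for the question's df (values are illustrative)
toy <- data.frame(x = c("c", "a", "b", "c"),
                  y = c("b,c,d", "e,f,g", "b,c,d", NA),
                  stringsAsFactors = FALSE)

pieces <- strsplit(toy$y, ",", fixed = TRUE)
n <- lengths(pieces)  # a missing y contributes a single NA piece

# Long format: one row per (original row, element of y)
long <- data.frame(row   = rep(seq_len(nrow(toy)), n),
                   pos   = sequence(n),
                   x     = rep(toy$x, n),
                   piece = unlist(pieces),
                   stringsAsFactors = FALSE)

# Keep the elements equal to x, then map them back to the original rows
hits <- long[!is.na(long$piece) & long$x == long$piece, ]
toy$pos <- hits$pos[match(seq_len(nrow(toy)), hits$row)]

print(toy)
```

Because match() returns the first hit, an element that occurs more than once in y still yields a single position per original row.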

Source: https://habr.com/ru/post/1617155/

