Check row if element exists in comma separated column with position

Question


Given this data.frame:

#    x     y
# 1  a b,c,d
# 2  c b,c,d
# 3  c b,c,d
# 4  a e,f,g
# 5  a b,c,d
# 6  c a,b,c
# 7  b b,c,d
# 8  c  <NA>
# 9  c e,f,g
# 10 a  <NA>

My desired result:

#    x     y pos contains
# 1  a b,c,d  NA    FALSE
# 2  c b,c,d   2     TRUE
# 3  c b,c,d   2     TRUE
# 4  a e,f,g  NA    FALSE
# 5  a b,c,d  NA    FALSE
# 6  c a,b,c   3     TRUE
# 7  b b,c,d   1     TRUE
# 8  c  <NA>  NA       NA
# 9  c e,f,g  NA    FALSE
# 10 a  <NA>  NA       NA

That is, check (row by row) whether df$x appears in df$y, and report its position. I started down the path of strsplit(df$y, ","), but things got complicated quickly, and I suspect there is a simple solution.

Code to play:

set.seed(5)
seq_letters <- c("a,b,c", "b,c,d", "e,f,g", NA)
df <- data.frame(x = sample(letters[1:3], 10, TRUE),
                 y = sample(seq_letters, 10, TRUE),
                 stringsAsFactors = FALSE)
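
Note that R 3.6.0 changed the default algorithm behind sample(), so set.seed(5) may not reproduce the table above on newer versions. As a sketch that avoids the random number generator, the same data can be typed in directly (values copied from the table in the question):

```r
# Explicit copy of the example data shown in the question, so the
# result does not depend on the R version's sampling algorithm.
df <- data.frame(
  x = c("a", "c", "c", "a", "a", "c", "b", "c", "c", "a"),
  y = c("b,c,d", "b,c,d", "b,c,d", "e,f,g", "b,c,d",
        "a,b,c", "b,c,d", NA, "e,f,g", NA),
  stringsAsFactors = FALSE
)
```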

+4

r

Jasonaizkalns Nov 24 '15 at 12:49


2 answers

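You can also do this by reshaping the data: split y into one row per element, record each element's position, and join the result back onto the original rows.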

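library(dplyr)    # %>%, distinct, filter, select, mutate, group_by, left_join
library(tidyr)    # unnest
library(stringi)  # stri_split_fixed

# Distinct (x, y) pairs, dropping rows where y is missing
df_x_y =
  df %>%
  distinct() %>%
  filter(!is.na(y))

# One row per element of each distinct y, with its position in the list
df_y =
  df_x_y %>%
  select(y) %>%
  distinct() %>%
  mutate(y_split = stri_split_fixed(y, ",")) %>%
  unnest(y_split) %>%
  group_by(y) %>%
  mutate(pos = row_number()) %>%
  ungroup()

# Keep only the elements that equal x
matches =
  df_x_y %>%
  left_join(df_y, by = "y") %>%
  filter(x == y_split)

# Attach pos to the original rows; rows without a match get NA
df %>%
  left_join(matches, by = c("x", "y"))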

+2

bramtayl Nov 24 '15 at 5:07


Rich scriven · Accepted Answer · 2015-11-24T00:53:29+0000

Here you can use match() with mapply() to build the first column (pos) by splitting the y column into pieces. Then we can build the second column (contains) based on that.

df$pos <- mapply(match, df$x, strsplit(df$y, ",", fixed = TRUE), USE.NAMES = FALSE)
df$contains <- replace(!is.na(df$pos), is.na(df$y), NA)

which gives

   x     y pos contains
1  a b,c,d  NA    FALSE
2  c b,c,d   2     TRUE
3  c b,c,d   2     TRUE
4  a e,f,g  NA    FALSE
5  a b,c,d  NA    FALSE
6  c a,b,c   3     TRUE
7  b b,c,d   1     TRUE
8  c  <NA>  NA       NA
9  c e,f,g  NA    FALSE
10 a  <NA>  NA       NA
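
To see why this works, it may help to look at the intermediate pieces on a smaller input. The sketch below uses a made-up three-row data frame (`small`, `pieces`, `pos` and `contains` are names introduced here for illustration, not part of the question):

```r
small <- data.frame(x = c("c", "a", "c"),
                    y = c("a,b,c", "e,f,g", NA),
                    stringsAsFactors = FALSE)

# strsplit() returns one character vector per row; an NA input stays NA.
pieces <- strsplit(small$y, ",", fixed = TRUE)
# pieces[[1]] is c("a", "b", "c"); pieces[[3]] is NA

# mapply() walks x and pieces in parallel, calling match(x[i], pieces[[i]]).
# match() gives the index of the first hit, or NA when there is none.
pos <- mapply(match, small$x, pieces, USE.NAMES = FALSE)
# pos is c(3, NA, NA)

# !is.na(pos) is TRUE only where a match was found; replace() then
# overwrites the entries whose y was missing with NA.
contains <- replace(!is.na(pos), is.na(small$y), NA)
# contains is c(TRUE, FALSE, NA)
```

Without the replace() step, rows with a missing y would show FALSE rather than NA, which would not match the desired output in the question.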
