Using dplyr to filter rows whose value partially matches another row in the same column

Question

Assuming I have a data frame like

term        cnt
apple        10
apples        5
a apple on    3
blue pears    3
pears         1

How can I filter out all rows in this column that are only partial matches of another row, for example to get this result:

term     cnt
apple     10
pears      1

I do not want to specify which terms to filter on (apple | pears). Instead I want a self-referencing method, that is, one that checks each term against the entire column and removes the terms that are only a partial match. The number of tokens is not limited, and the strings must be consistent (for example, "kapples" will not match "apple"). This would be an inverted, generic, dplyr-based version of

d[grep("^apple$|^pears$", d$term), ]

In addition, it would be interesting to use this deduplication to get the totals, for example

term     cnt
apple     18
pears      4

I could not get it to work with contains() or grep().

Thanks

+4

filter r dplyr mutate

Karsten Sender, 15 Sept. '17 at 12:10

3

A tidyverse approach:

1. define a list of the words as:

     k <- dft %>% 
          select(term) %>% 
          unlist() %>% 
          unique()

2. operate on the data as:

    dft %>%
      separate(term, c('t1', 't2')) %>%
      rowwise() %>%
      mutate( g = sum(t1 %in% k)) %>%
      filter( g > 0) %>%
      select(t1, cnt)

which gives:

      t1   cnt
   <chr> <int>
1  apple    10
2 apples     5
3  pears     1

Note that apple and apples are still treated as separate terms here.
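The output above still lists apple and apples separately, so here is a minimal sketch of the self-referencing filter the question asks for. It assumes the question's data is stored in `dft`, and it assumes "partial match" means literal substring containment, so "apples" counts as containing "apple". The helper `is_root()` is a hypothetical name introduced only for this illustration.

```r
library(dplyr)

# Example data from the question (assumed to be stored in `dft`)
dft <- data.frame(
  term = c("apple", "apples", "a apple on", "blue pears", "pears"),
  cnt  = c(10, 5, 3, 3, 1),
  stringsAsFactors = FALSE
)

# is_root() is a hypothetical helper: it returns TRUE for a term when no
# other distinct term in the column occurs inside it as a literal substring
is_root <- function(terms) {
  vapply(terms, function(t) {
    others <- setdiff(unique(terms), t)
    !any(vapply(others, function(o) grepl(o, t, fixed = TRUE), logical(1)))
  }, logical(1), USE.NAMES = FALSE)
}

# Keep only the rows whose term does not contain another term
roots <- dft %>% filter(is_root(term))
roots
```

With the question's data this should keep only the apple and pears rows. Using `fixed = TRUE` is a design choice: it treats every term as plain text rather than as a regular expression.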

0

Aramis7d, 15 Sept. '17 at 12:49

Try the following:

library(dplyr)

df=data.frame(term=c('apple','apples','a apple on','blue pears','pears'),cnt=c(10,5,3,3,1),stringsAsFactors=FALSE)

matches = sapply(df$term,function(t,terms){grepl(pattern = t,x = terms)},df$term)

sapply(1:ncol(matches),function(t,mat){
  tempmat = mat[,t]&mat[,-t]
  indices=unlist(apply(tempmat,MARGIN = 2,which))
  df$term[indices]<<-df$term[t]
 },matches)

df%>%group_by(term)%>%summarize(cnt=sum(cnt))

 # A tibble: 2 x 2
 #  term   cnt
 #  <chr> <dbl>
 #1 apple    18
 #2 pears     4
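The same totals can be reached without modifying `df` through `<<-`. The following is only a sketch under stated assumptions: terms are compared as literal substrings, every term contains at least one root term, and when a term contains several roots the first one is used. The names `contains`, `roots`, and `root_of` are made up for this example.

```r
library(dplyr)

df <- data.frame(
  term = c("apple", "apples", "a apple on", "blue pears", "pears"),
  cnt  = c(10, 5, 3, 3, 1),
  stringsAsFactors = FALSE
)

terms <- unique(df$term)

# contains[i, j] is TRUE when terms[i] contains terms[j] as a literal substring
contains <- sapply(terms, function(root) grepl(root, terms, fixed = TRUE))

# A root term contains no term other than itself
roots <- terms[rowSums(contains) == 1]

# Map every term to a root it contains
# (assumption: if several roots match, take the first one)
root_of <- vapply(terms, function(t) {
  hits <- roots[vapply(roots, function(r) grepl(r, t, fixed = TRUE), logical(1))]
  hits[1]
}, character(1))

# Replace each term by its root and sum the counts per root
totals <- df %>%
  mutate(term = unname(root_of[term])) %>%
  group_by(term) %>%
  summarise(cnt = sum(cnt))
totals
```

Because the original data frame is left untouched, the root lookup can be inspected on its own before aggregating.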

0

TUSHAr 15 sept. '17 at 14:24

amrrs · Accepted Answer · 2017-09-15T12:56:18+0000

Here is one way to do it (probably not what a Pythonista would call elegant):

> ssss <- data.frame(c('apple','red apple','apples','pears','blue pears'),c(15,3,10,4,3))
> 
> names(ssss) <- c('Fruit','Count')
> 
> ssss
       Fruit Count
1      apple    15
2  red apple     3
3     apples    10
4      pears     4
5 blue pears     3
> 
> root_list <- as.vector(ssss$Fruit[unlist(lapply(ssss$Fruit,function(x){length(grep(x,ssss$Fruit))>1}))])
> 
> 
> ssss %>% filter(ssss$Fruit %in% root_list)
  Fruit Count
1 apple    15
2 pears     4
> 
> data <- data.frame(lapply(root_list, function(x){y <- stringr::str_extract(ssss$Fruit,x); ifelse(is.na(y),'',y)}))
> 
> cols <- colnames(data)
> 
> #data$x <- do.call(paste0, c(data[cols]))
> #for (co in cols) data[co] <- NULL
> 
> ssss$Fruit <- do.call(paste0, c(data[cols]))
> 
> ssss %>% group_by(Fruit) %>% summarise(val = sum(Count))
# A tibble: 2 x 2
  Fruit   val
  <chr> <dbl>
1 apple    28
2 pears     7
>
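One caveat with passing the terms directly to `grep()` as above: by default each term is interpreted as a regular expression. That is harmless for plain words such as apple, but a term containing characters like `.` or `(` would behave differently. The small sketch below uses made-up terms to show the difference that `fixed = TRUE` makes.

```r
# Made-up terms, chosen only to illustrate regex vs. literal matching
terms <- c("a.c", "abc", "xa.cx")

# Default: "." is a regex wildcard that matches any single character
regex_hits <- grepl("a.c", terms)

# fixed = TRUE: "." is matched as a literal dot
fixed_hits <- grepl("a.c", terms, fixed = TRUE)

regex_hits
fixed_hits
```

If the column may hold such terms, adding `fixed = TRUE` to the `grep()` call in `root_list` is one possible adjustment.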
