Using dplyr to filter out rows that are a partial match of another row in the same column

Assuming I have a data frame like

term        cnt
apple        10
apples        5
a apple on    3
blue pears    3
pears         1

How can I filter out all rows in this column that are a partial match of another row, for example, getting as a result

term     cnt
apple     10
pears      1

without specifying which terms I want to keep (apple | pears), but in a self-referential way (that is, each term is checked against the entire column, and terms that are a partial match are removed)? The number of tokens per term is not limited, and the strings are not guaranteed to be consistent (for example, "kapples" should not match "apple"). In effect, this would be an inverted, generalized, dplyr-based version of
d[grep("^apple$|^pears$", d$term), ]

In addition, it would be interesting to use this matching to get the totals, for example

term     cnt
apple     18
pears      4

I could not get it to work with contains() or grep().

Thanks.


Here is one way to do it (a Pythonista speaking):

> library(dplyr)
> 
> ssss <- data.frame(c('apple','red apple','apples','pears','blue pears'),c(15,3,10,4,3))
> 
> names(ssss) <- c('Fruit','Count')
> 
> ssss
       Fruit Count
1      apple    15
2  red apple     3
3     apples    10
4      pears     4
5 blue pears     3
> 
> root_list <- as.vector(ssss$Fruit[unlist(lapply(ssss$Fruit,function(x){length(grep(x,ssss$Fruit))>1}))])
> 
> 
> ssss %>% filter(Fruit %in% root_list)
  Fruit Count
1 apple    15
2 pears     4
> 
> data <- data.frame(lapply(root_list, function(x){y <- stringr::str_extract(ssss$Fruit,x); ifelse(is.na(y),'',y)}))
> 
> cols <- colnames(data)
> 
> #data$x <- do.call(paste0, c(data[cols]))
> #for (co in cols) data[co] <- NULL
> 
> ssss$Fruit <- do.call(paste0, c(data[cols]))
> 
> ssss %>% group_by(Fruit) %>% summarise(val = sum(Count))
# A tibble: 2 x 2
  Fruit   val
  <chr> <dbl>
1 apple    28
2 pears     7
> 
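One caveat about the grep() call above: it treats every fruit name as a regular expression, so a term containing a metacharacter such as "." can match more than intended. A minimal sketch of the difference, using made-up terms that are not part of the original data:

```r
# Hypothetical terms, chosen only to show the effect of a regex metacharacter.
terms <- c("a.b", "axb")

# As a regular expression, "." matches any character, so both terms match.
as_regex <- grepl(terms[1], terms)

# With fixed = TRUE the pattern is taken literally, so only "a.b" matches.
as_literal <- grepl(terms[1], terms, fixed = TRUE)

print(as_regex)
print(as_literal)
```

If the terms are plain words, as in the example data, both calls behave the same.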

A tidyverse approach:

1. Define the list of unique terms as:

     k <- dft %>% 
          select(term) %>% 
          unlist() %>% 
          unique()

2. Operate on the data as:

    dft %>%
      separate(term, c('t1', 't2')) %>%
      rowwise() %>%
      mutate( g = sum(t1 %in% k)) %>%
      filter( g > 0) %>%
      select(t1, cnt)

This gives:

      t1   cnt
   <chr> <int>
1  apple    10
2 apples     5
3  pears     1

Note that apple and apples remain separate rows.
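To also drop partial matches such as apples, here is a minimal sketch of the self-referential filter the question asks for. It assumes that `dft` holds the question's data, that the terms are unique, and that a "partial match" means literal substring containment. `contains_other` is a hypothetical helper introduced only for this illustration.

```r
library(dplyr)

# The question's example data (assumed to be what `dft` refers to).
dft <- data.frame(
  term = c("apple", "apples", "a apple on", "blue pears", "pears"),
  cnt  = c(10, 5, 3, 3, 1),
  stringsAsFactors = FALSE
)

# Hypothetical helper: for each term, TRUE when it contains some *other*
# term of the same vector as a literal substring.
contains_other <- function(terms) {
  vapply(seq_along(terms), function(i) {
    any(vapply(terms[-i],
               function(p) grepl(p, terms[i], fixed = TRUE),
               logical(1)))
  }, logical(1))
}

# Keep only the terms that do not contain any other term.
roots <- dft %>% filter(!contains_other(term))
print(roots)
```

With duplicated terms each copy would contain the other and both would be dropped, so de-duplicating first would be needed in that case.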


Try the following:

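library(dplyr)

df=data.frame(term=c('apple','apples','a apple on','blue pears','pears'),cnt=c(10,5,3,3,1),stringsAsFactors = FALSE)

# matches[i, j] is TRUE when term i contains term j (each term is used as a regex)
matches = sapply(df$term,function(t,terms){grepl(pattern = t,x = terms)},df$term)

# For each term t, relabel the rows that match t and also match at least one of
# the remaining terms (possibly themselves); df is modified globally via <<-
sapply(1:ncol(matches),function(t,mat){
  tempmat = mat[,t]&mat[,-t]
  indices=unlist(apply(tempmat,MARGIN = 2,which))
  df$term[indices]<<-df$term[t]
 },matches)

# Sum the counts per relabelled term
df%>%group_by(term)%>%summarize(cnt=sum(cnt))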

 # A tibble: 2 x 2
 #  term   cnt
 #  <chr> <dbl>
 #1 apple    18
 #2 pears     4  
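The same match-matrix idea can be written without the global assignment. This is only a sketch under assumptions: it matches terms literally (fixed = TRUE) rather than as regular expressions, it assumes unique terms and more than one row, and it labels each row with the shortest term it contains, which is a simplification for rows that contain several unrelated terms. The names `fruit`, `contains` and `root` are introduced here for illustration.

```r
library(dplyr)

fruit <- data.frame(
  term = c("apple", "apples", "a apple on", "blue pears", "pears"),
  cnt  = c(10, 5, 3, 3, 1),
  stringsAsFactors = FALSE
)

# contains[i, j] is TRUE when term i contains term j as a literal substring.
contains <- sapply(fruit$term, function(p) grepl(p, fruit$term, fixed = TRUE))

# For each row, pick the shortest term that the row's term contains.
root <- vapply(seq_len(nrow(fruit)), function(i) {
  candidates <- fruit$term[contains[i, ]]
  candidates[which.min(nchar(candidates))]
}, character(1))

# Relabel on a copy and sum the counts per root term.
relabelled <- fruit
relabelled$term <- root
totals <- relabelled %>%
  group_by(term) %>%
  summarise(cnt = sum(cnt))
print(totals)
```

Because the relabelling happens on a copy, `fruit` itself is left unchanged and the result does not depend on the order in which the rows are processed.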

Source: https://habr.com/ru/post/1685743/

