Filter groups that contain a given value

How can I select whole groups based on a condition on individual rows, for example keep every group that contains the value 4 (or that meets some other condition)?

Take a very simple data set with two groups; I want to select group B, because it contains the value 4:

    library(dplyr)
    # Two groups of two rows each, matching the printed output below
    df <- data.frame(Group = LETTERS[c(1, 1, 2, 2)], Value = 1:4)

    > df
      Group Value
    1     A     1
    2     A     2
    3     B     3
    4     B     4

Calling group_by() and then filter() (as in this post) selects only the individual rows that contain the value 4, not the entire group:

    df %>% group_by(Group) %>% filter(Value == 4)

       Group Value
      <fctr> <int>
    1      B     4
2 answers

This turns out to be simple: wrap the condition in any() inside the filter() call. The reason is that:

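  • filter(any(...)) produces a single TRUE or FALSE per group, so after group_by() each group is kept or dropped as a whole,

  • filter(...) with a plain vectorised condition produces one TRUE or FALSE per row, so rows are kept individually, even if preceded by group_by() .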

Therefore, use:

    df %>% group_by(Group) %>% filter(any(Value == 4))

       Group Value
      <fctr> <int>
    1      B     3
    2      B     4

Interestingly, the same thing happens with mutate(); compare:

    df %>% group_by(Group) %>% mutate(check1 = any(Value == 4), check2 = Value == 4)

       Group Value check1 check2
      <fctr> <int>  <lgl>  <lgl>
    1      A     1  FALSE  FALSE
    2      A     2  FALSE  FALSE
    3      B     3   TRUE  FALSE
    4      B     4   TRUE   TRUE
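The same pattern should carry over to other conditions that reduce a group to a single TRUE or FALSE, such as all() or a comparison on max(). The following is only a minimal sketch: it rebuilds the four-row df printed in the question so that it runs on its own, and the names keep_any, keep_all and keep_max are illustrative, not part of the original answer.

```r
library(dplyr)

# Four-row example data, matching the output printed in the question
df <- data.frame(Group = c("A", "A", "B", "B"), Value = 1:4)

# Keep groups in which at least one row equals 4
keep_any <- df %>% group_by(Group) %>% filter(any(Value == 4)) %>% ungroup()

# Keep groups in which every row is greater than 2
keep_all <- df %>% group_by(Group) %>% filter(all(Value > 2)) %>% ungroup()

# Keep groups whose maximum is below 3
keep_max <- df %>% group_by(Group) %>% filter(max(Value) < 3) %>% ungroup()

print(keep_any)  # both rows of group B
print(keep_all)  # both rows of group B
print(keep_max)  # both rows of group A
```

In each case the condition returns one logical value per group, which is recycled to all rows of that group, so the group is kept or dropped as a unit.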

A data.table option

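    library(data.table)
    # Return the whole group (.SD) only when it contains a 4
    setDT(df)[, if (any(Value == 4)) .SD, by = Group]

    # Output for the four-row df printed in the question:
    #    Group Value
    # 1:     B     3
    # 2:     B     4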
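For comparison, here is a small self-contained sketch of the data.table approach next to one possible alternative that computes a per-group flag first. It assumes the four-row data printed in the question; the names dt, has4, res_if and res_flag are hypothetical and chosen only for illustration.

```r
library(data.table)

# Four-row example data, matching the output printed in the question
dt <- data.table(Group = c("A", "A", "B", "B"), Value = 1:4)

# if (...) .SD returns the group's rows when the condition holds and NULL
# otherwise, so groups without a 4 drop out of the result
res_if <- dt[, if (any(Value == 4)) .SD, by = Group]

# Alternative: add a per-group logical flag by reference, then subset on it
dt[, has4 := any(Value == 4), by = Group]
res_flag <- dt[has4 == TRUE, .(Group, Value)]

print(res_if)
print(res_flag)
```

The flag variant adds a column to dt by reference, which may be convenient if the group-level condition is needed again later; the if (...) .SD form leaves the table unchanged.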

Source: https://habr.com/ru/post/1260400/

