How to filter data in R?

I have large datasets (over a million rows each) with a few specific columns. I need to filter out some rows by ID while keeping all columns of the remaining rows intact.

My data is as follows:

      ID   Prop1   Prop2   TotalProp
56891940     G02     G02           2
56892558     A61     G02           4
56892558     A61     A61           4
56892558     G02     A61           4
56892558     A61     A61           4
56892552     B61     B61           3
56892552     B61     B61           3
56892552     B61     A61           3
56892559     B61     G61           3
56892559     B61     B61           3
56892559     B61     B61           3

... and so on, for more than a million rows.

I need to delete all rows of an ID when every row of that ID has the same "Prop1" and "Prop2" (for example, ID 56891940). IDs such as 56892558, 56892552 and 56892559 have some rows where the two properties match, but at least one row where they differ, so I want to keep all rows of those IDs.

My end result should look like this:

      ID   Prop1   Prop2   TotalProp
56892558     A61     G02           4
56892558     A61     A61           4
56892558     G02     A61           4
56892558     A61     A61           4
56892552     B61     B61           3
56892552     B61     B61           3
56892552     B61     A61           3    
56892559     B61     G61           3
56892559     B61     B61           3
56892559     B61     B61           3
1 answer

You can try data.table:

library(data.table)
setDT(df1)[, .SD[any(Prop1!=Prop2)], ID]
#          ID Prop1 Prop2 TotalProp
# 1: 56892558   A61   G02         4
# 2: 56892558   A61   A61         4
# 3: 56892558   G02   A61         4
# 4: 56892558   A61   A61         4
# 5: 56892552   B61   B61         3
# 6: 56892552   B61   B61         3
# 7: 56892552   B61   A61         3
# 8: 56892559   B61   G61         3
# 9: 56892559   B61   B61         3
#10: 56892559   B61   B61         3
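
The snippets in this answer refer to a data frame named df1, which the question does not define. A minimal way to recreate the sample data for testing might look like the sketch below; the name df1 and the column types (integer ID, character Prop1/Prop2) are assumptions, not something the question states.

```r
# Recreate the 11 sample rows from the question.
# stringsAsFactors = FALSE keeps Prop1/Prop2 as character vectors, so
# Prop1 != Prop2 compares the values directly (comparing two factors
# with different level sets would raise an error).
df1 <- read.table(text = "ID Prop1 Prop2 TotalProp
56891940   G02   G02         2
56892558   A61   G02         4
56892558   A61   A61         4
56892558   G02   A61         4
56892558   A61   A61         4
56892552   B61   B61         3
56892552   B61   B61         3
56892552   B61   A61         3
56892559   B61   G61         3
56892559   B61   B61         3
56892559   B61   B61         3",
  header = TRUE, stringsAsFactors = FALSE)

# Inspect the column types before filtering.
str(df1)
```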

Or as @Frank suggested

setDT(df1)[, if(any(Prop1!=Prop2)) .SD, ID]

Or using dplyr:

library(dplyr)
df1 %>%
    group_by(ID) %>%
    filter(any(Prop1!=Prop2))

Or using ave from base R:

df1[with(df1, ave(Prop1!=Prop2, ID, FUN=any)),]
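
To see why the ave one-liner keeps whole groups, it may help to break it into steps. The sketch below uses a small hypothetical data frame called toy (not the question's data), and the intermediate names differs, keep and result are only for illustration.

```r
# Hypothetical toy data: ID 1 has one mismatching row, IDs 2 and 3 have none.
toy <- data.frame(
  ID    = c(1, 1, 2, 2, 3),
  Prop1 = c("A", "A", "B", "B", "C"),
  Prop2 = c("A", "B", "B", "B", "C"),
  stringsAsFactors = FALSE
)

# Step 1: row-level flag, TRUE where the two properties differ.
differs <- toy$Prop1 != toy$Prop2
# FALSE TRUE FALSE FALSE FALSE

# Step 2: ave() applies any() within each ID and writes the group's
# result back to every row of that group.
keep <- ave(differs, toy$ID, FUN = any)
# TRUE TRUE FALSE FALSE FALSE

# Step 3: logical row indexing keeps every row of the flagged groups.
result <- toy[keep, ]
print(result)
```

Only the two rows of ID 1 remain, including the row where Prop1 equals Prop2. If either column can contain NA, any() may return NA for a group, so passing na.rm = TRUE through a small wrapper function is worth considering.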

Source: https://habr.com/ru/post/1589129/