Remove rows from data.table in R based on multiple column values

I have a data.table in R that has several identifier columns and a value column. There are several rows for each combination of identifiers. If any of these rows contains NA in the column "value", I would like to delete all rows with that combination of identifiers. For example, in the table below, I would like to delete all rows for which id1 == 2 and id2 == 1.

If I had only one id, I would do dat[!(id1 %in% dat[is.na(value), id1])]. In the example, this deletes all rows where id1 == 2. However, I could not work out how to extend this to multiple columns.

 library(data.table)
 dat <- data.table(id1 = c(1,1,2,2,2,2),
                   id2 = c(1,2,1,2,3,1),
                   value = c(5,3,NA,6,7,3))
1 answer

If you want to check, for each combination of id1 and id2, whether any of the values is NA, and then drop that whole combination, you can put an if inside j for each group and return the group's rows (using .SD) only when the condition is TRUE. Groups for which the if returns nothing are left out of the result.

 dat[, if(!anyNA(value)) .SD, by = .(id1, id2)]
 #    id1 id2 value
 # 1:   1   1     5
 # 2:   1   2     3
 # 3:   2   2     6
 # 4:   2   3     7

Or similarly,

 dat[, if(all(!is.na(value))) .SD, by = .(id1, id2)] 
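
As an alternative that is closer to the %in% idea from the question, here is a minimal sketch using an anti-join. This is not part of the original answer; it assumes the same dat as in the question, and the names bad and res are made up for illustration. The idea is to collect the rows that have NA in value and then drop every row of dat whose (id1, id2) pair matches one of them.

```r
library(data.table)

# Same example data as in the question
dat <- data.table(id1 = c(1, 1, 2, 2, 2, 2),
                  id2 = c(1, 2, 1, 2, 3, 1),
                  value = c(5, 3, NA, 6, 7, 3))

# Rows with a missing value; their (id1, id2) pairs are the ones to drop
# ('bad' is a hypothetical helper name)
bad <- dat[is.na(value)]

# Anti-join: keep only rows of dat whose (id1, id2) pair does not occur in 'bad'
res <- dat[!bad, on = .(id1, id2)]
```

Unlike the grouped version, this should keep the rows in their original order rather than collecting them by group, although for this example both approaches give the same four rows.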

Source: https://habr.com/ru/post/981153/