Remove rows from data.table in R based on multiple column values

Question


I have a data.table in R that has several identifiers and a value. There are several rows for each combination of identifiers. If any row for a combination contains NA in the column "value", I would like to delete all rows with that combination of identifiers. For example, in the table below, I would like to delete all rows for which id1 == 2 and id2 == 1.

With only one id, I would use dat[!(id1 %in% dat[is.na(value), id1])]. In the example, that deletes all rows where id1 == 2. However, I could not work out how to extend this to multiple columns.

 library(data.table)
 dat <- data.table(id1 = c(1,1,2,2,2,2), id2 = c(1,2,1,2,3,1), value = c(5,3,NA,6,7,3))


r data.table

lilaf Jan 17 '15 at 17:42


1 answer

David Arenburg · Accepted Answer · 2015-01-17T17:55:05+0000

To drop every combination of id1 and id2 in which any value is NA, group by both columns and wrap .SD in an if. A group's rows are returned only when the condition is TRUE; a group for which the if yields NULL is dropped from the result.

 dat[, if(!anyNA(value)) .SD, by = .(id1, id2)]
 #    id1 id2 value
 # 1:   1   1     5
 # 2:   1   2     3
 # 3:   2   2     6
 # 4:   2   3     7

Or similarly,

 dat[, if(all(!is.na(value))) .SD, by = .(id1, id2)]
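
As a sketch of an alternative that stays closer to the %in% attempt in the question, an anti-join on both id columns should keep the same rows. The helper name bad and the result name res are made up for illustration; the only assumption is a data.table version that supports the on= argument.

```r
library(data.table)

dat <- data.table(id1 = c(1,1,2,2,2,2), id2 = c(1,2,1,2,3,1), value = c(5,3,NA,6,7,3))

# `bad` (hypothetical name): the id combinations that have at least one NA value
bad <- unique(dat[is.na(value), .(id1, id2)])

# Anti-join: keep only the rows of dat whose (id1, id2) pair does not occur in bad
res <- dat[!bad, on = c("id1", "id2")]
```

Unlike the grouped versions, this does not regroup the rows, so the remaining rows stay in their original order.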
