R: delete rows from data frame based on values in multiple columns

Question

I have the following data frame (df). It has more columns, but these are the relevant ones:

I would like to subset this data frame so that if any of the costs for a specific ID is $0, all rows for that ID are removed.

In this example, IDs 2 and 5 each contain a $0 cost, so all rows for ID 2 and ID 5 must be removed.

Here is the resulting df I would like:

Can anyone help with this? I tried a few combinations with the subset function, but none of them worked.

On a similar note: I have another data frame with NAs. Could you help me handle the same problem when the value is NA instead of $0?


+1

r

user4918087 Jun 09 '15 at 16:44

3

Try:

df[!df$ID %in% df$ID[df$Cost=="$0"],]

+3

C_Z_ Jun 09 '15 at 16:50

You can calculate the identifiers you want to remove using tapply:

(has.zero <- tapply(df$Cost, df$ID, function(x) sum(x == 0) > 0))
#     1     2     3     4     5 
# FALSE  TRUE FALSE FALSE  TRUE

Then you can subset the data frame, keeping only the rows whose ID is not flagged for removal:

df[!df$ID %in% names(has.zero)[has.zero],]
#   ID Cost
# 1  1  100
# 2  1  200
# 6  3   10
# 7  4  100

This is quite flexible, as it lets you filter IDs on more complex criteria (for example, "the average cost of an ID should be at least xyz").
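
As a rough sketch of such a criterion, here is the same tapply pattern with a mean-cost cutoff. It assumes Cost is numeric. The threshold of 50 and the Cost values for IDs 2 and 5 are made up for illustration; the rows for IDs 1, 3 and 4 match the output above.

```r
# Hypothetical sample data: only the rows for IDs 1, 3 and 4 come from the
# output shown above; the values for IDs 2 and 5 are invented.
df <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3, 4, 5, 5),
  Cost = c(100, 200, 50, 0, 20, 10, 100, 0, 30)
)

# Mean cost per ID; tapply returns a vector named by ID
mean.cost <- tapply(df$Cost, df$ID, mean)

# Keep only the IDs whose mean cost reaches the (assumed) threshold
threshold <- 50
keep.ids <- names(mean.cost)[mean.cost >= threshold]

# %in% coerces the numeric IDs to character before matching the names
result <- df[df$ID %in% keep.ids, ]
result
```

With this sample data only IDs 1 and 4 have a mean cost of at least 50, so only their rows are kept.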

+1

josliber Jun 09 '15 at 16:53


Aaron Katch · Accepted Answer · 2015-06-09T16:58:55+0000

To handle both the NA and the $0 case:

subset(df,!df$ID %in% df$ID[is.na(df$Cost) | df$Cost == "$0"])

Output:

  ID Cost
1  1 $100
2  1 $200
6  3  $10
7  4 $100
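
If Cost is stored as numeric rather than as "$0"-style strings, the same idea can be sketched with ave(), which applies a function within each ID and returns one value per row. This is only a sketch under that assumption; the sample data is hypothetical, with an NA placed in ID 5 to show the NA case.

```r
# Hypothetical sample data with numeric costs, one zero (ID 2) and one NA (ID 5)
df <- data.frame(
  ID   = c(1, 1, 2, 2, 2, 3, 4, 5, 5),
  Cost = c(100, 200, 50, 0, 20, 10, 100, NA, 30)
)

# TRUE for rows whose cost is zero or missing
bad.row <- is.na(df$Cost) | df$Cost == 0

# Flag every row of an ID that has at least one bad row
bad.id <- ave(bad.row, df$ID, FUN = any)

# Drop all rows belonging to flagged IDs
result <- df[!bad.id, ]
result
```

Here IDs 2 and 5 are dropped entirely, leaving rows 1, 2, 6 and 7.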
