Deleting rows in R conditionally

Question

Deleting rows in R conditionally

I have data where the first column is a set of identification numbers (some are repeated) and the second column is just a bunch of numbers. I need a way to store each ID only once based on the smallest number in the second column.

Row# ID Number 1 10 180 2 12 167 3 12 182 4 12 135 5 15 152 6 15 133

Example: I only want to save lines # 1, 4 and 6 and delete the rest

+6

r conditional delete-row

Swanny Jun 15 '15 at 17:44

source share

2 answers

or

 do.call(rbind, lapply(split(df1, df1$ID), function(x) subset(x, Number == min(Number)))) #ID Number #10 10 180 #12 12 135 #15 15 133

+2

Veerendra gadekar Jun 15 '15 at 18:13

source share

akrun · Accepted Answer · 2015-06-15T17:45:56+0000

To select a row with the minimum number "Number" for each group of "ID" we can use one of the functions of aggregation by groups. A base R - aggregate . Using aggregate we can either use the "formula" method or specify a list groupings of elements / variables with the argument by . Using the formula method, we get the value of min "Number" for each "ID".

 aggregate(Number~ID, df1, FUN=min)

Or we can use a faster option with data.table . Here we convert "data.frame" to "data.table" ( setDT(df1) ), grouped by "ID", we get the value min "Number".

 library(data.table) setDT(df1)[, list(Number=min(Number)), by = ID]

Or you can also do this with setorder up to the order column "Number" and use unique with the by parameter to select the first row not duplicated by "ID". (from comments by @David Arenburgs)

  unique(setorder(setDT(df1), Number), by = "ID")

Or using dplyr , we group by 'ID' and get the subset rows with summarise .

 library(dplyr) df1 %>% group_by(ID) %>% summarise(Number= min(Number))

Or we can use the sqldf syntax to get a subset of the data.

 library(sqldf) sqldf('select ID, min(Number) as Number from df1 group by ID')

Update

If there are multiple columns, and you want to get a row based on the minimum Number for each ID, you can use which.min . Using .I will get the row index and can be used for a subset of the rows.

 setDT(df1)[df1[, .I[which.min(Number)], by = ID]$V1]

Or with dplyr we use slice to filter strings that have a min value for "Number" for each "ID"

 df1 %>% group_by(ID) %>% slice(which.min(Number))

Deleting rows in R conditionally

Update

More articles: