How to efficiently select rows with a minimum value in R?

Possible duplicate:
Maintain a minimum value for each factor level

Here is my problem: I want to select the rows with the minimum value in a specified column, within each group. For instance:

df <- data.frame(A=c("a","a","b","b"),value=1:4) 

As a result I want:

   A value
   a     1
   b     3

I could do this with by and ddply, but they are pretty slow when df is huge and A has many distinct values:

 library(plyr)  # provides ddply
 do.call(rbind, by(df, df$A, function(x) x[which.min(abs(x$value)), ], simplify = FALSE))
 ddply(df, ~A, function(x) x[which.min(abs(x$value)), ])

Any suggestions?

Thanks a lot!

2 answers

data.table is pretty fast for big data frames if you set the key.

 library(data.table)
 dt <- data.table(df, key = "A")
 dt[, list(value = min(value)), by = A]
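Note that the aggregation above returns only the grouping column and the minimum, not the complete rows. If the data frame has further columns that should be kept, one common data.table idiom is to collect the row indices with `.I` and subset by them. The following is a minimal sketch under that assumption; the `extra` column is hypothetical and only there to show that whole rows survive.

```r
library(data.table)

# Example data from the question, plus a hypothetical extra column
df <- data.frame(A = c("a", "a", "b", "b"),
                 value = 1:4,
                 extra = c("w", "x", "y", "z"))
dt <- as.data.table(df)

# .I holds the row numbers within dt; which.min picks the first minimum per group
idx <- dt[, .I[which.min(value)], by = A]$V1

# Subset the original table by those row numbers to keep every column
result <- dt[idx]
print(result)
```

Because `which.min` returns only the first minimum, ties within a group yield a single row; replacing it with `value == min(value)` would keep all tied rows.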


tapply does the job:

 > tapply(df$value, df$A, min)
 a b 
 1 3 

Edited: Using by instead of tapply, we can keep the row names:

 df <- data.frame(A=c("a","a","b","b"), value=11:14)
 df
 ##   A value
 ## 1 a    11
 ## 2 a    12
 ## 3 b    13
 ## 4 b    14
 do.call(rbind, unname(by(df, df$A, function(x) x[x$value == min(x$value),])))
 ##   A value
 ## 1 a    11
 ## 3 b    13
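Since the question is about speed, it may be worth noting that the per-group function call can be avoided in base R altogether. The sketch below is not from the answers above; it shows two vectorised alternatives, one based on sorting plus `duplicated` and one based on `ave`, and makes no claim about how they compare to data.table on a particular data set.

```r
df <- data.frame(A = c("a", "a", "b", "b"), value = 11:14)

# Variant 1: sort by group and value, then keep the first row of each group.
# Returns exactly one row per group, even if the minimum is tied.
sorted <- df[order(df$A, df$value), ]
first_min <- sorted[!duplicated(sorted$A), ]

# Variant 2: ave() computes the group minimum for every row;
# comparing against it keeps all rows that attain the minimum (ties included).
all_min <- df[df$value == ave(df$value, df$A, FUN = min), ]

print(first_min)
print(all_min)
```

Both variants keep the original row names (1 and 3 for this data), as the by-based version does.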

Source: https://habr.com/ru/post/1447565/

