Any way to do filtering as well as summarize in ddply?

Question


I am just starting out with ddply and find it very useful. I want to summarize the data frame and also drop some rows from the final result, based on the value of one of the summarized columns. This is like using HAVING together with GROUP BY in SQL. Here is an example:

 library(plyr)
 input = data.frame(id = c(1, 1, 2, 2, 3, 3), metric = c(30, 50, 70, 90, 40, 1050), badness = c(1, 5, 7, 3, 3, 99))
 intermediateoutput = ddply(input, ~ id, summarize, meanMetric = mean(metric), maxBadness = max(badness))
 intermediateoutput[intermediateoutput$maxBadness < 50, 1:2]

This gives:

  id meanMetric
1  1         40
2  2         80

This is what I want, but can I do it in one step inside ddply somehow?


r dplyr plyr

Tooone Jul 16 '14 at 13:31


1 answer

juba · Accepted Answer · 2014-07-16T13:46:15+0000

You should try dplyr. It is faster, and the code is much easier to read and understand, especially if you use pipes (%>%):

 library(dplyr)
 input %>%
   group_by(id) %>%
   summarize(meanMetric = mean(metric), maxBadness = max(badness)) %>%
   filter(maxBadness < 50) %>%
   select(-maxBadness)
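To make the pipe less magical: `%>%` passes the value on its left as the first argument of the call on its right. The sketch below (reusing the question's `input` data; the names `piped` and `nested` are just illustrative) builds the same result twice, once piped and once as ordinary nested calls read inside-out, and checks that they match.

```r
library(dplyr)

input <- data.frame(id      = c(1, 1, 2, 2, 3, 3),
                    metric  = c(30, 50, 70, 90, 40, 1050),
                    badness = c(1, 5, 7, 3, 3, 99))

# Piped form: each %>% feeds the result on its left into the
# first argument of the call on its right.
piped <- input %>%
  group_by(id) %>%
  summarize(meanMetric = mean(metric), maxBadness = max(badness)) %>%
  filter(maxBadness < 50) %>%
  select(-maxBadness)

# The same computation written as nested calls, read inside-out.
nested <- select(
  filter(
    summarize(group_by(input, id),
              meanMetric = mean(metric), maxBadness = max(badness)),
    maxBadness < 50),
  -maxBadness)

print(piped)
```

Both objects hold ids 1 and 2 with mean metrics 40 and 80. The piped version just avoids reading the steps from the innermost call outwards.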

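Following @Arun's comment, you can simplify the code by filtering before summarizing. Because the data are grouped, filter() evaluates max(badness) within each id, so it drops every row of any group whose maximum badness is 50 or more: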

 input %>%
   group_by(id) %>%
   filter(max(badness) < 50) %>%
   summarize(meanMetric = mean(metric))
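Because the question asks specifically about ddply, here is a minimal sketch of one way to get the same result in a single ddply call, without dplyr. It relies on plyr's ddply leaving out any group for which the applied function returns NULL. The anonymous function and the name `result` are illustrative, not part of the original answer.

```r
library(plyr)

input <- data.frame(id      = c(1, 1, 2, 2, 3, 3),
                    metric  = c(30, 50, 70, 90, 40, 1050),
                    badness = c(1, 5, 7, 3, 3, 99))

# One ddply call: the anonymous function acts as both the HAVING
# condition (the max(badness) test) and the summary (meanMetric).
# Returning NULL for a group makes ddply omit that group entirely.
result <- ddply(input, ~ id, function(d) {
  if (max(d$badness) >= 50) return(NULL)
  data.frame(meanMetric = mean(d$metric))
})

print(result)
```

The dplyr pipelines are arguably easier to read, but this form answers the "one step in ddply" part of the question directly.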
