Is there a way to filter as well as summarize in ddply?

I am just starting out with ddply and find it very useful. I want to summarize a data frame and also drop some rows from the final result, based on whether a summarized column has a specific value. This is similar to combining HAVING with GROUP BY in SQL. Here is an example:

    library(plyr)

    input = data.frame(id      = c( 1,  1,  2,  2,  3,    3),
                       metric  = c(30, 50, 70, 90, 40, 1050),
                       badness = c( 1,  5,  7,  3,  3,   99))

    # Summarize per id, then drop groups whose maxBadness is 50 or more
    intermediateoutput = ddply(input, ~ id, summarize,
                               meanMetric = mean(metric),
                               maxBadness = max(badness))
    intermediateoutput[intermediateoutput$maxBadness < 50, 1:2]

This gives:

      id meanMetric
    1  1         40
    2  2         80

That is what I want, but can I do it in one step inside the ddply call somehow?

1 answer

You should try dplyr. It is faster, and the code is much easier to read and understand, especially if you use pipes (%>%):

    library(dplyr)

    input %>%
      group_by(id) %>%
      summarize(meanMetric = mean(metric), maxBadness = max(badness)) %>%
      filter(maxBadness < 50) %>%
      select(-maxBadness)

Following @Arun's comment, you can simplify the code as follows:

    input %>%
      group_by(id) %>%
      filter(max(badness) < 50) %>%
      summarize(meanMetric = mean(metric))
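
If you would rather stay with plyr, the filtering can probably be folded into the ddply call itself by passing a function instead of summarize. The following is only a sketch: it assumes that ddply drops any group for which the function returns NULL, which is how plyr behaves as far as I know but is worth checking against your installed version. The threshold of 50 and the column names come from the question; the name `result` is just for illustration.

```r
library(plyr)

input <- data.frame(
  id      = c(1, 1, 2, 2, 3, 3),
  metric  = c(30, 50, 70, 90, 40, 1050),
  badness = c(1, 5, 7, 3, 3, 99)
)

# One ddply call: each group either yields a one-row summary or NULL.
# Assumption: groups that return NULL are omitted from the combined result.
result <- ddply(input, ~ id, function(d) {
  if (max(d$badness) >= 50) return(NULL)   # the HAVING-style condition
  data.frame(meanMetric = mean(d$metric))  # the summary for kept groups
})

print(result)
```

This keeps the group condition and the summary in one place, at the cost of being less readable than the dplyr pipeline above.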

Source: https://habr.com/ru/post/972322/

