I am just starting out with ddply and am finding it very useful. I want to summarize a data frame and also drop some rows from the final result, based on whether a summarized column has a particular value. This is similar to combining GROUP BY with HAVING in SQL. Here is an example:
    library(plyr)

    input = data.frame(id      = c( 1,  1,  2,  2,  3,    3),
                       metric  = c(30, 50, 70, 90, 40, 1050),
                       badness = c( 1,  5,  7,  3,  3,   99))

    # summarize per id, then filter on the summarized column
    intermediateoutput = ddply(input, ~ id, summarize,
                               meanMetric = mean(metric),
                               maxBadness = max(badness))
    intermediateoutput[intermediateoutput$maxBadness < 50, 1:2]
This gives:
      id meanMetric
    1  1         40
    2  2         80
This is what I want, but can I do it in one step within the ddply call somehow?
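To make the goal concrete, here is a minimal sketch of a single-expression form. It assumes that wrapping the ddply call in base R's `subset` counts as "one step"; the filtering still happens after ddply returns rather than inside it, and the name `result` is only illustrative.

```r
library(plyr)

input <- data.frame(id      = c( 1,  1,  2,  2,  3,    3),
                    metric  = c(30, 50, 70, 90, 40, 1050),
                    badness = c( 1,  5,  7,  3,  3,   99))

# ddply does the GROUP BY part; subset() plays the role of HAVING
# and also drops the helper column maxBadness from the result.
result <- subset(
  ddply(input, ~ id, summarize,
        meanMetric = mean(metric),
        maxBadness = max(badness)),
  maxBadness < 50,
  select = c(id, meanMetric)
)

print(result)
```

This avoids naming the intermediate data frame, but it is not a filter performed by ddply itself.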