Right now I have the following data frame, which was created by original.df %.% group_by(Category) %.% tally() %.% arrange(desc(n)):
    DF <- structure(list(Category = c("E", "K", "M", "L", "I", "A", "S", "G", "N", "Q"),
                         n = c(163051, 127133, 106680, 64868, 49701,
                               47387, 47096, 45601, 40056, 36882)),
                    .Names = c("Category", "n"),
                    row.names = c(NA, 10L),
                    class = c("tbl_df", "tbl", "data.frame"))

       Category      n
    1         E 163051
    2         K 127133
    3         M 106680
    4         L  64868
    5         I  49701
    6         A  47387
    7         S  47096
    8         G  45601
    9         N  40056
    10        Q  36882
I want to create an "Other" category from the lower-ranked categories by n, i.e.
      Category      n
    1        E 163051
    2        K 127133
    3        M 106680
    4        L  64868
    5        I  49701
    6    Other 217022
This is what I'm doing now:
    rbind(filter(DF, rank(rev(n)) <= 5),
          summarise(filter(DF, rank(rev(n)) > 5), Category = "Other", n = sum(n)))
which collapses all categories that are not in the top five into the Other category.
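To make the filter condition concrete, here is a minimal base R sketch of what rank(rev(n)) evaluates to on this data. It is only an illustration: the vector n is copied from DF above, and the names pos, keep and other_total are made up for this example.

```r
# n copied from DF above; it is already sorted in descending order
n <- c(163051, 127133, 106680, 64868, 49701,
       47387, 47096, 45601, 40056, 36882)

# rev(n) is ascending, so its ranks are simply 1, 2, ..., 10,
# which here coincide with the row positions of DF
pos <- rank(rev(n))

# Rows with position <= 5 are kept; the remaining rows are summed into "Other"
keep <- pos <= 5
other_total <- sum(n[!keep])

print(n[keep])
print(other_total)
```

Note that the condition only picks the top five because DF was sorted with arrange(desc(n)) first. On this data rank(-n) gives the same positions and would not depend on the row order.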
But I'm curious whether there is a better way in dplyr or some other existing package. By "better" I mean shorter or more readable. I'm also interested in clever or more flexible ways of choosing which categories go into Other.