group-by generalization
I needed a generalization of group-by that produces maps nested more than two levels deep. I wanted to hand such a function a list of arbitrary functions to be applied recursively via group-by. Here is what I came up with:
    (defn map-function-on-map-vals
      "Take a map and apply a function on its values. From [1].
      [1] http://stackoverflow.com/a/1677069/500207"
      [m f]
      (zipmap (keys m) (map f (vals m))))

    (defn nested-group-by
      "Like group-by but instead of a single function, this is given a list or vec
      of functions to apply recursively via group-by. An optional `final-fn`
      argument (defaults to identity) may be given to run on the vector result of
      the final group-by."
      [fs coll & [final-fn]]
      (if (empty? fs)
        ;; No grouping functions left: apply final-fn (or identity) to the rows.
        ((or final-fn identity) coll)
        ;; Group by the first function, then recurse into each group.
        (map-function-on-map-vals
          (group-by (first fs) coll)
          #(nested-group-by (rest fs) % final-fn))))
Your example
Applied to your dataset:
    cljs.user=> (def foo [ ["A" 2011 "Dan"]
           #_=>            ["A" 2011 "Jon"]
           #_=>            ["A" 2010 "Tim"]
           #_=>            ["B" 2009 "Tom"] ])
    cljs.user=> (require '[cljs.pprint :refer [pprint]])
    nil
    cljs.user=> (pprint (nested-group-by [first second] foo))
    {"A"
     {2011 [["A" 2011 "Dan"] ["A" 2011 "Jon"]],
      2010 [["A" 2010 "Tim"]]},
     "B" {2009 [["B" 2009 "Tom"]]}}
This produces exactly the desired output. nested-group-by can take three, four, or more functions and produces correspondingly deep nested hash maps. It may be useful to others.
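To illustrate the deeper nesting, here is a small sketch with three grouping functions. The definitions are repeated so the snippet runs on its own. The `rows` data and the third key function `initial` (first letter of the name) are made up for this illustration and are not part of the original question.

```clojure
;; Definitions repeated from above so this snippet is self-contained.
(defn map-function-on-map-vals [m f]
  (zipmap (keys m) (map f (vals m))))

(defn nested-group-by [fs coll & [final-fn]]
  (if (empty? fs)
    ((or final-fn identity) coll)
    (map-function-on-map-vals
      (group-by (first fs) coll)
      #(nested-group-by (rest fs) % final-fn))))

;; Hypothetical rows, same shape as the question's dataset.
(def rows [["A" 2011 "Dan"]
           ["A" 2011 "Dee"]
           ["A" 2011 "Jon"]
           ["B" 2009 "Tom"]])

;; Hypothetical third key: the first letter of the name, as a string.
(defn initial [row]
  (subs (nth row 2) 0 1))

;; Three functions give a map of maps of maps.
(def by-three (nested-group-by [first second initial] rows))

(get-in by-three ["A" 2011 "D"])
;; => [["A" 2011 "Dan"] ["A" 2011 "Dee"]]
```

Each extra function in the vector adds one more level of keys, so get-in with one key per function reaches the innermost vector.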
Handy feature
nested-group-by also has a handy extra feature: final-fn, which defaults to identity. If you do not specify it, the deepest level of nesting holds a vector of values; if you do provide final-fn, it is run on each of the innermost vectors. To illustrate: if you just wanted to know how many rows of the original dataset appeared in each category and year:
    cljs.user=> (nested-group-by [first second] foo count)
                                                   #^^^^^ this is final-fn
    {"A" {2011 2, 2010 1}, "B" {2009 1}}
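final-fn does not have to be an aggregate like count. As a further sketch (the anonymous function below is my own choice for illustration), it can also reshape each innermost vector, for example keeping only the names:

```clojure
;; Definitions repeated from above so this snippet is self-contained.
(defn map-function-on-map-vals [m f]
  (zipmap (keys m) (map f (vals m))))

(defn nested-group-by [fs coll & [final-fn]]
  (if (empty? fs)
    ((or final-fn identity) coll)
    (map-function-on-map-vals
      (group-by (first fs) coll)
      #(nested-group-by (rest fs) % final-fn))))

(def foo [["A" 2011 "Dan"]
          ["A" 2011 "Jon"]
          ["A" 2010 "Tim"]
          ["B" 2009 "Tom"]])

;; final-fn keeps only the last element (the name) of every row
;; in each innermost vector.
(def names-by-group
  (nested-group-by [first second] foo #(mapv last %)))
;; => {"A" {2011 ["Dan" "Jon"], 2010 ["Tim"]}, "B" {2009 ["Tom"]}}
```

Because final-fn receives the whole innermost vector, anything that takes a collection of rows works here.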
Caveat
This function does not use recur, so deeply recursive calls could blow the stack. However, the recursion goes one level deeper per grouping function, not per row, so for the intended use case with only a handful of functions this should not be a problem.
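As a rough check of that caveat, the sketch below runs the function over a large collection with only two grouping functions. As far as I can tell, nested-group-by calls itself once per grouping function rather than once per row, so the size of the input should not matter for stack depth. The input (range 100000) and the two key functions are arbitrary choices for this illustration.

```clojure
;; Definitions repeated from above so this snippet is self-contained.
(defn map-function-on-map-vals [m f]
  (zipmap (keys m) (map f (vals m))))

(defn nested-group-by [fs coll & [final-fn]]
  (if (empty? fs)
    ((or final-fn identity) coll)
    (map-function-on-map-vals
      (group-by (first fs) coll)
      #(nested-group-by (rest fs) % final-fn))))

;; 100,000 rows but only two grouping functions, so the recursion
;; stays only a few calls deep.
(def counts
  (nested-group-by [even? #(mod % 3)] (range 100000) count))

;; Even numbers divisible by 3, i.e. multiples of 6 below 100000.
(get-in counts [true 0])
;; => 16667
```

A very long list of grouping functions is the case that could exhaust the stack, which is unlikely in practice.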