(Credit where credit is due: Renzo Borgatti (@reborg).)
First, let's set up some sample data that we will use for the performance tests later. This vector contains 500k maps, all with the same key. The values overlap: each of the 100k distinct values appears five times.
    (def data
      (mapv hash-map
            (repeat :samplevalue)
            (concat (range 1e5) (range 1e5) (range 1e5) (range 1e5) (range 1e5))))
Now let's do your transformation with transducers. Note that this solution is not parallel. I shortened your .intValue to just int, which does the same thing. In addition, conditionally extracting :samplevalue from each map can be shortened to just (keep :samplevalue sequence), which is equivalent to (remove nil? (map :samplevalue sequence)). We will use Criterium for benchmarking.
    (require '[criterium.core :refer [quick-bench]])

    (quick-bench
      (transduce
        (comp (keep :samplevalue) (map int))
        ;; count into a transient map, then freeze it once at the end
        (completing #(assoc! %1 %2 (inc (get %1 %2 0))) persistent!)
        (transient {})
        data))
    ;; My execution time mean: 405 ms
Note that we are not calling frequencies as an external step this time. Instead, we have woven it into the operation. And just like frequencies does, we perform the operations on a transient hashmap for extra performance. We do this by using a transient hashmap as the seed and completing the final value by calling persistent! on it.
We can make this parallel. For maximum performance, we use a mutable Java ConcurrentHashMap instead of an immutable Clojure data structure.
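To make the two claims above concrete, here is a minimal sketch on a tiny input. The name `sample` and its contents are made up for illustration; the reducing function is the same one used in the benchmark.

```clojure
;; Hypothetical four-element input; one map lacks :samplevalue.
(def sample [{:samplevalue 1} {:other 9} {:samplevalue 2} {:samplevalue 1}])

;; keep drops the nil produced by the map without :samplevalue.
(def same-as-remove-nil?
  (= (sequence (keep :samplevalue) sample)
     (remove nil? (map :samplevalue sample))))

;; Count occurrences in a transient map; persistent! runs once on completion.
(def counts
  (transduce (comp (keep :samplevalue) (map int))
             (completing #(assoc! %1 %2 (inc (get %1 %2 0))) persistent!)
             (transient {})
             sample))

;; The result should agree with the built-in frequencies.
(def same-as-frequencies?
  (= counts (frequencies (keep :samplevalue sample))))
```

Both flags should come out true: the value 1 is counted twice and the value 2 once.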
    (require '[clojure.core.reducers :as r])
    (import '[java.util HashMap Collections Map]
            'java.util.concurrent.atomic.AtomicInteger
            'java.util.concurrent.ConcurrentHashMap)

    (quick-bench
      (let [concurrency-level (.availableProcessors (Runtime/getRuntime))
            m (ConcurrentHashMap. (quot (count data) 2) 0.75 concurrency-level)
            ;; every partition is seeded with the shared `m`; the combine step is a no-op
            combinef (fn ([] m) ([_ _]))
            ;; increment the counter for `k`, creating it at 1 if it is absent
            rf (fn [^Map m k]
                 (let [^AtomicInteger v (or (.get m k) (.putIfAbsent m k (AtomicInteger. 1)))]
                   (when v (.incrementAndGet v))
                   m))
            reducef ((comp (keep :samplevalue) (map int)) rf)]
        (r/fold combinef reducef data)
        (into {} m)))
    ;; My execution time mean: 70 ms
Here we use fold from the clojure.core.reducers library to achieve parallelism. Note that in a parallel context, any transducers you use must be stateless. Also note that ConcurrentHashMap does not support nil as a key or a value; fortunately, we don't need that here.
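The nil restriction can be checked at the REPL. This is a small sketch; the names `nil-key-rejected?` and `kept` are invented for the example.

```clojure
(import 'java.util.concurrent.ConcurrentHashMap)

;; Putting a nil key into a ConcurrentHashMap throws NullPointerException.
(def nil-key-rejected?
  (try
    (.put (ConcurrentHashMap.) nil 1)
    false
    (catch NullPointerException _ true)))

;; keep already filters out nil results, so no nil key reaches the map.
(def kept
  (into [] (keep :samplevalue) [{:samplevalue nil} {} {:samplevalue 3}]))
```

Because (keep :samplevalue) comes first in the transducer chain, the reducing function only ever sees non-nil keys.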
The output is converted to an immutable Clojure hashmap at the end. You can remove that step and just use the ConcurrentHashMap instance for an additional speedup. On my machine, removing the into step makes the whole fold take about 26 ms.
Edit 2017-11-20: User @clojuremostly correctly pointed out that an earlier version of this answer had the call to quick-bench inside the let block that initialized the concurrent hash map instance, which meant that the benchmark used the same instance for all its runs. I moved the call to quick-bench outside the let block. This did not significantly affect the results.
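For reuse outside a benchmark, the same idea can be wrapped in a function that creates a fresh map on every call. This is only a sketch under a few assumptions: the names `parallel-frequencies` and `small-data` are hypothetical, the map uses the default ConcurrentHashMap constructor rather than the tuned one, and the (map int) step is left out. Since the counters stored in the map are AtomicInteger instances, the sketch unwraps them when building the immutable result.

```clojure
(require '[clojure.core.reducers :as r])
(import 'java.util.concurrent.ConcurrentHashMap
        'java.util.concurrent.atomic.AtomicInteger)

(defn parallel-frequencies
  "Counts the non-nil values of (k item) across coll using r/fold.
  Hypothetical helper; a fresh ConcurrentHashMap is created per call."
  [k coll]
  (let [m (ConcurrentHashMap.)
        ;; every partition shares `m`, so combining just hands it back
        combinef (fn ([] m) ([_ _] m))
        rf (fn [^ConcurrentHashMap acc x]
             (let [^AtomicInteger v (or (.get acc x)
                                        (.putIfAbsent acc x (AtomicInteger. 1)))]
               ;; v is nil only when this call inserted the initial count of 1
               (when v (.incrementAndGet v))
               acc))
        reducef ((keep k) rf)]
    (r/fold combinef reducef coll)
    ;; unwrap each AtomicInteger into a plain number
    (into {} (map (fn [[key n]] [key (.get ^AtomicInteger n)])) m)))

;; 2000 maps in which each of 1000 values appears twice
(def small-data
  (mapv hash-map (repeat :samplevalue) (concat (range 1000) (range 1000))))

(def matches-frequencies?
  (= (parallel-frequencies :samplevalue small-data)
     (frequencies (map :samplevalue small-data))))
```

The input must be a vector (or another foldable collection) for r/fold to split the work; on other collections it falls back to a sequential reduce.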