You can significantly improve the performance of this operation. The good news is that you donβt have to rush into Java for this: Clojure is very fast if you optimize it correctly and in most cases you can create the same speed as pure Java.
For maximum numerical code performance in Clojure, you want to use:
- arrays because you want to change the repository with very fast write and search. Clojure sequences and vectors are beautiful, but they come with overheads that you probably want to avoid for really critical code
- double because they offer much faster mathematical models.
- aset / aget / areduce are extremely fast operations designed for arrays and basically give you the same bytecode as pure Java equivalents.
- imperative style - although it is uniform in Clojure, it gets the fastest results (mainly because you can avoid the overhead of memory allocation, boxing and function calls). An example is dotimes for a fast imperative cycle.
- (set! * warn-on-reflection * true) - and eliminate any warnings that your code generates, because reflection is a big killer of performance.
The following should be along the correct lines and probably you will get roughly equivalent Java performance:
(def kernel (double-array [0 1 1 2 3 3 0 0 0 0 0 0])) (def data (double-array [1 5 7 4 8 3 9 5 6 3 2 1 1 7 4 9 3 2 1 8 6 4])) (defn convolve [^doubles kernel-array ^doubles data-array] (let [ks (count kernel-array) ds (count data-array) output (double-array (+ ks ds)) factor (/ 1.0 (areduce kernel-array i ret 0.0 (+ ret (aget kernel-array i))))] (dotimes [i (int ds)] (dotimes [j (int ks)] (let [offset (int (+ ij))] (aset output offset (+ (aget output offset) (* factor (* (aget data-array i) (aget kernel-array j)))))))) output)) (seq (convolve kernel data)) => (0.0 0.1 0.6 1.4 2.4 4.4 5.5 6.1000000000000005 5.600000000000001 6.200000000000001 5.499999999999999 5.9 4.199999999999999 3.3000000000000003 2.5 2.2 3.3 4.4 5.6000000000000005 4.8 4.8999999999999995 3.1 3.5 4.300000000000001 5.0 3.0 1.2000000000000002 0.0 0.0 0.0 0.0 0.0 0.0 0.0)
I did not trim the output array or make any restrictions, so you probably have to crack this solution a bit to get exactly the result you want, but hopefully you get the idea .....
Some very rough benchmarking:
(time (dotimes [i 1000] (seq (convolve kernel data)))) => "Elapsed time: 8.174109 msecs"
i.e. that about 30 ns per core / data combination - I expect that I will be able to largely limit access to cached memory.
source share