Binary search in clojure (implementation / performance)

I wrote a binary search function as part of a larger program, but it seems to be slower than it should be, and profiling shows a lot of method calls in clojure.lang.Numbers.

I understand that Clojure can use primitives when it can determine what it can do. Calls to methods in clojure.lang.Numbers seem to indicate that primitives are not used here.

If I force loop variables to ints, it correctly complains that recur arguments are not primitive. If I force them too, the code works again, but again it slows down. My only suggestion: (quot (+ low-idx high-idx) 2) does not create a primitive, but I'm not sure where to go from here.

This is my first program in Clojure, so feel free to let me know if there are ways to make cleaner / functional / Clojure ways to do something.

 (defn binary-search [coll coll-size target] (let [cnt (dec coll-size)] (loop [low-idx 0 high-idx cnt] (if (> low-idx high-idx) nil (let [mid-idx (quot (+ low-idx high-idx) 2) mid-val (coll mid-idx)] (cond (= mid-val target) mid-idx (< mid-val target) (recur (inc mid-idx) high-idx) (> mid-val target) (recur low-idx (dec mid-idx)) )))))) (defn binary-search-perf-test [test-size] (do (let [test-set (vec (range 1 (inc test-size))) test-set-size (count test-set)] (time (count (map #(binary-search2 test-set test-set-size %) test-set))) ))) 
+6
source share
2 answers

First of all, you can use the binary search implementation provided by java.util.Collections :

 (java.util.Collections/binarySearch [0 1 2 3] 2 compare) ; => 2 

If you skip compare , the search will be even faster, unless the collection includes bigints, in which case it will break.

As for your pure Clojure implementation, you can hint coll-size to ^long in the parameter vector - or just ask the size of the vector at the beginning of the function body (which is very fast, constant time), replace the call (quot ... 2) with (bit-shift-right ... 1) and use unverified math to calculate the index. With some additional settings, a binary search can be written as follows:

 (defn binary-search "Finds earliest occurrence of x in xs (a vector) using binary search." ([xs x] (loop [l 0 h (unchecked-dec (count xs))] (if (<= h (inc l)) (cond (== x (xs l)) l (== x (xs h)) h :else nil) (let [m (unchecked-add l (bit-shift-right (unchecked-subtract hl) 1))] (if (< (xs m) x) (recur (unchecked-inc m) h) (recur lm))))))) 

This is still noticeably slower than the Java variant:

 (defn java-binsearch [xs x] (java.util.Collections/binarySearch xs x compare)) 

binary-search , as defined above, seems to take about 25% more time than this java-binsearch .

+7
source

in Clojure 1.2.x you can only force local variables, and they cannot cross function calls. Starting with Clojure 1.3.0, Clojure can use primitive numbers to call functions, but not through higher-order functions, such as map .

if you are using Clojure 1.3.0+ then you can accomplish this using tooltips like

as with any Clojure optimization problem, the first step is to enable (set! *warn-on-reflection* true) , then add type hints until it no longer complains.

 user=> (set! *warn-on-reflection* true) true user=> (defn binary-search [coll coll-size target] (let [cnt (dec coll-size)] (loop [low-idx 0 high-idx cnt] (if (> low-idx high-idx) nil (let [mid-idx (quot (+ low-idx high-idx) 2) mid-val (coll mid-idx)] (cond (= mid-val target) mid-idx (< mid-val target) (recur (inc mid-idx) high-idx) (> mid-val target) (recur low-idx (dec mid-idx)) )))))) NO_SOURCE_FILE:23 recur arg for primitive local: low_idx is not matching primitive, had: Object, needed: long Auto-boxing loop arg: low-idx #'user/binary-search user=> 

to remove this you can enter a column size argument hint

 (defn binary-search [coll ^long coll-size target] (let [cnt (dec coll-size)] (loop [low-idx 0 high-idx cnt] (if (> low-idx high-idx) nil (let [mid-idx (quot (+ low-idx high-idx) 2) mid-val (coll mid-idx)] (cond (= mid-val target) mid-idx (< mid-val target) (recur (inc mid-idx) high-idx) (> mid-val target) (recur low-idx (dec mid-idx)) )))))) 

for obvious reasons, it’s hard to connect the autobox on line 10 to the coll-size parameter, since it goes through cnt , then high-idx , then mid-ixd , etc., so I usually approach these problems with type-hinting everything until I find one that causes the warnings to go away, and then removes the prompts until they go away.

+1
source

Source: https://habr.com/ru/post/906518/


All Articles