Clojure: Divide the sequence into top-n and relax?

I would like to split a sequence of top-n elements into it, and the rest. Here's an inefficient implementation using built-in sorting and splitting into:

> (defn split-top-n [n comp coll] (split-at n (sort comp coll))) > (split-top-n 2 #(- %2 %1) (list 6.2 5.1 88.0 90.1 1.2 16.9)) [(90.1 88.0) (16.9 6.2 5.1 1.2)] 

Is there an effective Clojure built-in for this? Or do I need to write my own?

+4
source share
3 answers

There is no such function in the standard library. The simple implementation that you have already written is actually inefficient for the special case of small values ​​of n , but in the general case it works great.

While you do not know that this function in its current implementation is indeed a significant performance bottleneck in your full application, it is probably a waste of time to write a more complex version.

EDIT : after thinking about this, you can try to write an implementation that forces the sequence to a vector, and then makes Quickselect in place of n better than the elements at the beginning of the vector. This should be relatively easy to do and can provide reasonably better performance, provided that your support elements are well selected.

EDIT 2 . I decided to try this implementation myself. He did a great job with some simple test cases, but I'm not quite sure that there are no errors that may occur in some cases:

 (defn split-top-n [n comp coll] (let [v (transient (vec coll))] (loop [start 0, end (count v)] (when (> end n) (let [pos (loop [i (inc start), pos start] (if (< i end) (if (comp (vi) (v start)) (let [pos* (inc pos)] (assoc! v, i (v pos*), pos* (vi)) (recur (inc i) pos*)) (recur (inc i) pos)) (do (assoc! v, start (v pos), pos (v start)) pos)))] (if (< pos n) (recur (inc pos) end) (recur start pos))))) (split-at n (persistent! v)))) 

Clarification: this assumes a simple boolean comparison function for comp instead of one of a negative / zero / positive number.

EDIT 3 . I looked at the transient documentation again and noticed that I was using undefined behavior. In fact, it may be that the above version will always work as expected, but the correct version should nevertheless respect the documentation in the language. I will leave the previous version in this answer in place, because the answer has already been accepted with it, but here is the version using the return value of assoc! as the documentation requires:

 (defn swap-in! [vij] (assoc! v, i (vj), j (vi))) (defn quickpartition! [comp v start end] (loop [vv, i (inc start), pos start] (if (< i end) (if (comp (vi) (v start)) (recur (swap-in! vi (inc pos)) (inc i) (inc pos)) (recur v (inc i) pos)) [(swap-in! v start pos) pos]))) (defn split-top-n [n comp coll] (loop [v (transient (vec coll)), start 0, end (count v)] (if (> end n) (let [[v* pos] (quickpartition! comp v start end)] (if (< pos n) (recur v* (inc pos) end) (recur v* start pos))) (split-at n (persistent! v))))) 

EDIT 4 : The poor readability of the previous version still annoys me, so now I have divided my implementation into several functions.

+4
source

You can use a data structure such as a sorted set that is currently implemented as clojure.lang.PersistentTreeSet . This way you can avoid sorting before you get your top n items (I would say).

 (-> (sorted-set-by >) (conj 90) (conj 10) (conj 1)) #{90 10 1} 

Now you can call the split-at function:

 (split-at n previous-sorted-set) 

But it depends on whether you want / can use a sorted set.

+2
source

Looks like possible use of finger trees

 (require '[clojure.data.finger-tree :as ft]) (def css (apply ft/counted-sorted-set (list 6.2 5.1 88.0 90.1 1.2 16.9))) (ft/ft-split-at css 3) [(1.2 5.1 6.2) 16.9 (88.0 90.1)] 
0
source

Source: https://habr.com/ru/post/1501637/


All Articles