Clojure: Divide the sequence into top-n and relax?

Question

Clojure: Divide the sequence into top-n and relax?

I would like to split a sequence of top-n elements into it, and the rest. Here's an inefficient implementation using built-in sorting and splitting into:

> (defn split-top-n [n comp coll] (split-at n (sort comp coll))) > (split-top-n 2 #(- %2 %1) (list 6.2 5.1 88.0 90.1 1.2 16.9)) [(90.1 88.0) (16.9 6.2 5.1 1.2)]

Is there an effective Clojure built-in for this? Or do I need to write my own?

+4

clojure

bjun Sep 11 '13 at 11:13

source share

3 answers

You can use a data structure such as a sorted set that is currently implemented as clojure.lang.PersistentTreeSet . This way you can avoid sorting before you get your top n items (I would say).

 (-> (sorted-set-by >) (conj 90) (conj 10) (conj 1)) #{90 10 1}

Now you can call the split-at function:

 (split-at n previous-sorted-set)

But it depends on whether you want / can use a sorted set.

+2

Chiron Sep 11 '13 at 12:06

source share

Looks like possible use of finger trees

 (require '[clojure.data.finger-tree :as ft]) (def css (apply ft/counted-sorted-set (list 6.2 5.1 88.0 90.1 1.2 16.9))) (ft/ft-split-at css 3) [(1.2 5.1 6.2) 16.9 (88.0 90.1)]

0

Hendekagon 12 sept '13 at 4:17

source share

Rörd · Accepted Answer · 2013-09-11T12:50:20+0000

There is no such function in the standard library. The simple implementation that you have already written is actually inefficient for the special case of small values of n , but in the general case it works great.

While you do not know that this function in its current implementation is indeed a significant performance bottleneck in your full application, it is probably a waste of time to write a more complex version.

EDIT : after thinking about this, you can try to write an implementation that forces the sequence to a vector, and then makes Quickselect in place of n better than the elements at the beginning of the vector. This should be relatively easy to do and can provide reasonably better performance, provided that your support elements are well selected.

EDIT 2 . I decided to try this implementation myself. He did a great job with some simple test cases, but I'm not quite sure that there are no errors that may occur in some cases:

 (defn split-top-n [n comp coll] (let [v (transient (vec coll))] (loop [start 0, end (count v)] (when (> end n) (let [pos (loop [i (inc start), pos start] (if (< i end) (if (comp (vi) (v start)) (let [pos* (inc pos)] (assoc! v, i (v pos*), pos* (vi)) (recur (inc i) pos*)) (recur (inc i) pos)) (do (assoc! v, start (v pos), pos (v start)) pos)))] (if (< pos n) (recur (inc pos) end) (recur start pos))))) (split-at n (persistent! v))))

Clarification: this assumes a simple boolean comparison function for comp instead of one of a negative / zero / positive number.

EDIT 3 . I looked at the transient documentation again and noticed that I was using undefined behavior. In fact, it may be that the above version will always work as expected, but the correct version should nevertheless respect the documentation in the language. I will leave the previous version in this answer in place, because the answer has already been accepted with it, but here is the version using the return value of assoc! as the documentation requires:

 (defn swap-in! [vij] (assoc! v, i (vj), j (vi))) (defn quickpartition! [comp v start end] (loop [vv, i (inc start), pos start] (if (< i end) (if (comp (vi) (v start)) (recur (swap-in! vi (inc pos)) (inc i) (inc pos)) (recur v (inc i) pos)) [(swap-in! v start pos) pos]))) (defn split-top-n [n comp coll] (loop [v (transient (vec coll)), start 0, end (count v)] (if (> end n) (let [[v* pos] (quickpartition! comp v start end)] (if (< pos n) (recur v* (inc pos) end) (recur v* start pos))) (split-at n (persistent! v)))))

EDIT 4 : The poor readability of the previous version still annoys me, so now I have divided my implementation into several functions.

Clojure: Divide the sequence into top-n and relax?

More articles: