Why are Clojure core.reducers faster than lazy collection functions?

In many resources on reducers (such as the canonical Rich Hickey blog post), it is argued that reducers are faster than the regular collection functions ((map ... (filter ...)), etc.) because the overhead is lower.

What are the additional costs that are avoided? IIUC, even the lazy collection functions end up traversing the original sequence only once. Is the difference in how the intermediate results are computed?

Pointers to the relevant places in the Clojure implementation that would help me understand the difference would be very welcome.

+5
2 answers

I think one key insight is in the following excerpt from the original blog post:

    (require '[clojure.core.reducers :as r])
    (reduce + (r/filter even? (r/map inc [1 1 1 2])))
    ;=> 6

This should look familiar - it's the same named functions, applied in the same order, with the same arguments, constructing the same result as the Clojure seq-based fns. The difference is that, reduce being eager, and these reducers fns being out of the seq game, there's no per-step allocation overhead, so it's faster. Laziness is great when you need it, but when you don't you shouldn't have to pay for it.

The lazy sequence implementation comes with a (linear) allocation cost: every time another element of the lazy seq is realized, the rest of the seq is stored in a new thunk, and the representation of such a "thunk" is a new clojure.lang.LazySeq object.

I believe those LazySeq objects are the allocation overhead mentioned in the quote. With reducers, the elements are not realized step by step as lazy seq nodes, and therefore no LazySeq thunks are allocated at all.
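To make this concrete, here is a small REPL sketch (my own illustration, not from the answer) comparing what the seq-based and reducer-based functions return. The exact reified class of the reducer result is an implementation detail, so the sketch only checks that it is *not* a LazySeq:

```clojure
(require '[clojure.core.reducers :as r])

;; The seq-based map returns a LazySeq: realizing it element by
;; element (or chunk by chunk) allocates a LazySeq thunk per step.
(class (map inc [1 2 3]))
;=> clojure.lang.LazySeq

;; The reducer-based map returns a mere "recipe" - a reducible
;; object holding the source and the transformation. No thunks are
;; allocated; nothing happens until reduce is called on it.
(instance? clojure.lang.LazySeq (r/map inc [1 2 3]))
;=> false

(reduce + (r/map inc [1 2 3]))
;=> 9
```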

+3

"Note that no intermediate collections are produced."

This is an improvement because, for example, it puts less pressure on the garbage collector.

When chaining operations such as map and filter, each function iterates over a collection and returns a new collection, which is then passed to the next function. This is not the case with reducers.

Thus, they can be used as drop-in replacements: the functions compose the same way and can be applied to the same collections, but with better performance.
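That drop-in quality can be sketched as follows (the pipeline name `xform` is my own; the curried single-argument arities of r/map and r/filter are part of the reducers library):

```clojure
(require '[clojure.core.reducers :as r])

;; Seq version: each step materializes an intermediate lazy seq.
(reduce + (filter even? (map inc [1 2 3 4])))
;=> 6

;; Reducer version: same names, same order, same result, but the
;; steps merely compose reducing functions - no intermediate seqs.
(reduce + (r/filter even? (r/map inc [1 2 3 4])))
;=> 6

;; The curried arities compose directly; comp applies right-to-left,
;; so inc runs before the even? filter, as above.
(def xform (comp (r/filter even?) (r/map inc)))
(reduce + (xform [1 2 3 4]))
;=> 6
```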

Below is a rough and completely unscientific sketch of a map/map/filter pipeline with and without reducers:

Without reducers

Initial collection => map => intermediate collection => map => intermediate collection => filter => final collection

Without reducers (i.e. with the clojure.core map/filter functions), the intermediate collections are produced lazily. That is, new elements are created only when the next processing step demands them, either one element or one chunk at a time.

Note: a chunk is a block of 32 elements (although this should be treated as an implementation detail). Chunking exists for efficiency reasons, to avoid "bouncing" between one computation and the next on every single element.

See the return value of the map function, for example: it returns a lazy-seq (except in the transducer arity).
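The chunking mentioned above can be observed directly. A quick sketch, assuming the current chunk size of 32 (again, an implementation detail):

```clojure
;; Vectors produce chunked seqs:
(chunked-seq? (seq [1 2 3]))
;=> true

;; Asking for just the first mapped element still realizes a whole
;; chunk: the side-effecting function below runs 32 times, not once.
(def counter (atom 0))
(first (map (fn [x] (swap! counter inc) x) (vec (range 100))))
@counter
;=> 32
```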

With reducers

Initial collection => map/map/filter reducer => Final collection

If you need to process the entire collection quickly, laziness gets in the way of performance. Removing the intermediate lazy layers gives a performance boost. Reducers do exactly that: they process the collection eagerly, hence faster. And because the same named functions are used, switching between the two styles is easy.
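As a final sketch of that eagerness: a reducer pipeline does no work until it is reduced, and on a vector r/fold can additionally split the work and combine partial results in parallel (falling back to a serial reduce for non-foldable sources):

```clojure
(require '[clojure.core.reducers :as r])

;; Building the pipeline allocates no collections and realizes nothing:
(def pipeline (r/map inc (vec (range 1000))))

;; reduce runs the whole pipeline eagerly, in one pass:
(reduce + pipeline)
;=> 500500

;; r/fold computes the same sum, but may partition the vector and
;; combine partial sums with + via fork/join:
(r/fold + pipeline)
;=> 500500

;; To get a collection back, pour the reducible into one eagerly:
(into [] (r/map inc [1 2 3]))
;=> [2 3 4]
```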

Please also read the comments below, especially those from @Michał Marczyk.

+5
source

Source: https://habr.com/ru/post/1245086/

