What are Builder, Combiner and Splitter in Scala?

The EPFL parallel programming course mentions four abstractions for data parallelism: Iterator, Builder, Combiner and Splitter.

I am familiar with Iterator, but have never used the other three. I have seen the traits Builder, Combiner and Splitter in the scala.collection package. However, I have no idea how to use them in real development, in particular how to use them together with collections such as List, Array, ParArray, etc. Can someone please give me some tips and examples?

Thanks!

1 answer

The two traits Iterator and Builder are not specific to parallelism, but they provide the foundation for Combiner and Splitter.
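
To make the sequential foundation concrete, here is a minimal sketch that reads from a List with an Iterator and writes into a Vector through a Builder. The object and method names (IteratorBuilderDemo, evens) are made up for illustration; only iterator, hasNext, next, newBuilder, += and result come from the standard library.

```scala
object IteratorBuilderDemo {
  // Traverse the input with an Iterator and accumulate matches in a Builder.
  def evens(xs: List[Int]): Vector[Int] = {
    val it: Iterator[Int] = xs.iterator
    val builder = Vector.newBuilder[Int] // a scala.collection.mutable.Builder[Int, Vector[Int]]
    while (it.hasNext) {
      val x = it.next()
      if (x % 2 == 0) builder += x
    }
    builder.result() // turn the accumulated elements into the final collection
  }

  def main(args: Array[String]): Unit =
    println(evens(List(1, 2, 3, 4, 5, 6))) // prints Vector(2, 4, 6)
}
```

The same read-with-an-iterator, write-with-a-builder pattern is what Splitter and Combiner extend to the parallel case.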

  • You already know that an Iterator lets you traverse a collection sequentially through its hasNext and next methods. A Splitter is a special kind of Iterator that can also split a collection into several disjoint subsets. The idea is that, after splitting, these subsets can be processed in parallel. You can get a Splitter from a parallel collection by calling .splitter on it. Its two important methods are:
    • remaining: Int : returns the number of elements left in the current collection, or at least an approximation of that number. This information matters because it is used to decide whether splitting a collection is worthwhile. If the collection contains only a few elements, you want to process them sequentially rather than split the collection into even smaller subsets.
    • split: Seq[Splitter[A]] : the method that actually splits the current collection. It returns disjoint subsets (represented as Splitters) that can be split again recursively if that is worthwhile. Once the subsets are small enough, they can finally be processed (for example, filtered or mapped).
  • Builders are used to create new (sequential) collections. A Combiner is a special kind of Builder and, at the same time, the counterpart of Splitter. While a Splitter splits your collection before it is processed in parallel, a Combiner merges the results afterwards. You can get a Combiner from a parallel collection (or a subset of one) by calling .newCombiner on it. The merging is done with the following method:
    • combine(that: Combiner[A, B]): Combiner[A, B] : combines your current collection with another one by "merging" both Combiners. The result is a new Combiner that either represents the final result or is combined again with another subset (by the way: the type parameters A and B represent the element type and the type of the resulting collection).
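
The split, process and combine steps described above can be sketched with simplified stand-ins. Everything named here (SketchSplitter, SketchCombiner, IndexedSplitter, VectorCombiner, squarePart) is hypothetical and only mirrors the shape of remaining, split and combine; it is not the library's implementation, and the parts are processed one after another instead of on separate threads.

```scala
object SplitCombineSketch {
  // Simplified stand-in for Splitter: an Iterator that can also split itself.
  trait SketchSplitter[A] extends Iterator[A] {
    def remaining: Int                // elements not yet traversed
    def split: Seq[SketchSplitter[A]] // disjoint parts covering the remaining elements
  }

  // Simplified stand-in for Combiner: a builder that can merge with another one.
  trait SketchCombiner[A, To] {
    def +=(elem: A): this.type
    def combine(that: SketchCombiner[A, To]): SketchCombiner[A, To]
    def result(): To
  }

  // Splits an index range [from, until) over an IndexedSeq into two halves.
  final class IndexedSplitter[A](xs: IndexedSeq[A], private var from: Int, until: Int)
      extends SketchSplitter[A] {
    def hasNext: Boolean = from < until
    def next(): A = { val x = xs(from); from += 1; x }
    def remaining: Int = until - from
    def split: Seq[SketchSplitter[A]] = {
      val mid = from + remaining / 2
      Seq(new IndexedSplitter(xs, from, mid), new IndexedSplitter(xs, mid, until))
    }
  }

  final class VectorCombiner[A] extends SketchCombiner[A, Vector[A]] {
    private var acc: Vector[A] = Vector.empty
    def +=(elem: A): this.type = { acc = acc :+ elem; this }
    // Naive merge by concatenation; a real Combiner aims to make this step cheap.
    def combine(that: SketchCombiner[A, Vector[A]]): SketchCombiner[A, Vector[A]] = {
      val merged = new VectorCombiner[A]
      merged.acc = acc ++ that.result()
      merged
    }
    def result(): Vector[A] = acc
  }

  // Traverse one part sequentially and collect the squares in a fresh combiner.
  def squarePart(part: SketchSplitter[Int]): SketchCombiner[Int, Vector[Int]] = {
    val c = new VectorCombiner[Int]
    while (part.hasNext) { val x = part.next(); c += x * x }
    c
  }

  val whole = new IndexedSplitter(Vector(1, 2, 3, 4, 5, 6), 0, 6)
  val parts: Seq[SketchSplitter[Int]] = whole.split // two parts of three elements each
  val partSizes: Seq[Int] = parts.map(_.remaining)  // measured before the parts are consumed
  val squares: Vector[Int] = parts.map(squarePart).reduce(_ combine _).result()

  def main(args: Array[String]): Unit =
    println(squares) // prints Vector(1, 4, 9, 16, 25, 36)
}
```

Because the partial results are combined in the same order in which the parts were split off, the merged result keeps the original element order.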

The point is that you do not need to implement, or even call, these methods directly unless you are defining a new parallel collection. The idea is that people implementing new parallel collections only need to define splitters and combiners, and they get a whole range of other operations for free, because those operations are already implemented in terms of splitters and combiners.
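
As a rough illustration of "operations for free", the sketch below writes filter once in a hypothetical base trait (MiniParIterable) in terms of two abstract members, and a hypothetical collection (MiniParVector) supplies only those two. All Mini* names, the threshold of 4 and the use of Futures on the global execution context are assumptions made for this example; the real library is organized differently and schedules its work with its own task support.

```scala
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration._

object FreeOperationsSketch {
  trait MiniSplitter[A] extends Iterator[A] {
    def remaining: Int
    def split: Seq[MiniSplitter[A]]
  }

  trait MiniCombiner[A, To] {
    def +=(elem: A): this.type
    def combine(that: MiniCombiner[A, To]): MiniCombiner[A, To]
    def result(): To
  }

  // Hypothetical base trait: implementors supply only splitter and newCombiner.
  trait MiniParIterable[A, Repr] {
    def splitter: MiniSplitter[A]
    def newCombiner: MiniCombiner[A, Repr]
    protected def threshold: Int = 4 // parts of at most this size are processed sequentially

    // Written once here, available to every implementor.
    def filter(p: A => Boolean): Repr = {
      def go(s: MiniSplitter[A]): Future[MiniCombiner[A, Repr]] =
        if (s.remaining <= threshold) Future {
          val c = newCombiner
          while (s.hasNext) { val x = s.next(); if (p(x)) c += x }
          c
        } else {
          // Process the parts concurrently, then merge the partial results in order.
          Future.sequence(s.split.map(go)).map(_.reduce(_ combine _))
        }
      Await.result(go(splitter), 10.seconds).result()
    }
  }

  // A new "parallel collection" that defines nothing but its splitter and combiner.
  final class MiniParVector[A](elems: Vector[A]) extends MiniParIterable[A, Vector[A]] {
    def splitter: MiniSplitter[A] = new RangeSplitter(0, elems.length)
    def newCombiner: MiniCombiner[A, Vector[A]] = new VecCombiner(Vector.empty)

    private class RangeSplitter(private var i: Int, stop: Int) extends MiniSplitter[A] {
      def hasNext: Boolean = i < stop
      def next(): A = { val x = elems(i); i += 1; x }
      def remaining: Int = stop - i
      def split: Seq[MiniSplitter[A]] = {
        val mid = i + remaining / 2
        Seq(new RangeSplitter(i, mid), new RangeSplitter(mid, stop))
      }
    }

    private class VecCombiner(private var acc: Vector[A]) extends MiniCombiner[A, Vector[A]] {
      def +=(elem: A): this.type = { acc = acc :+ elem; this }
      def combine(that: MiniCombiner[A, Vector[A]]): MiniCombiner[A, Vector[A]] =
        new VecCombiner(acc ++ that.result())
      def result(): Vector[A] = acc
    }
  }

  def multiplesOfThree(n: Int): Vector[Int] =
    new MiniParVector((1 to n).toVector).filter(_ % 3 == 0)

  def main(args: Array[String]): Unit =
    println(multiplesOfThree(20)) // prints Vector(3, 6, 9, 12, 15, 18)
}
```

The remaining check plays the role described earlier: small parts are traversed sequentially, larger ones are split again.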

Of course, this is only a superficial description of how these things work. For further reading, I recommend Architecture of the Parallel Collections Library as well as Creating Custom Parallel Collections.


Source: https://habr.com/ru/post/1258373/

