Asymptotic behavior of Scala methods

Where can I find the expected time and space complexity of operations on collections such as HashSet, TreeSet, List, and so on?

Is one simply expected to know these from the properties of the abstract data types themselves?

I know of the Performance Characteristics page for Scala collections, but it only covers a few very basic operations. Perhaps the rest of the operations on these collections are built entirely from that small base set, but then it seems I am simply expected to know that they were implemented that way?

+6
2 answers

The performance characteristics of the other methods are genuinely difficult to state. Consider the following:

  • All of these methods are implemented in terms of foreach or iterator, usually at very high levels of the hierarchy. Vector's map, for example, is implemented in collection.TraversableLike. To add insult to injury, which method implementation gets used depends on the linearization of the class inheritance, and the same applies to any method called as a helper. It has happened before that changes here caused unforeseen performance problems. Since foreach and iterator are both O(n), any improved performance depends on specializing other methods, such as size and slice, in the concrete collection.
  • Many of them depend on the performance characteristics of the builder provided, which is determined by the call site, not the definition site (see the sketch after this list).
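
To make the first point concrete, here is a minimal sketch, not the actual library source, of how a hierarchy-level map-style method is typically written in the pre-2.13 collections design: it relies only on foreach and on a builder supplied by a CanBuildFrom-like factory. The BuilderFactory trait below is a hypothetical stand-in for CanBuildFrom, so the real cost is the O(n) traversal plus whatever the builder chosen at the call site costs per element.

```scala
import scala.collection.mutable

object GenericMapSketch {
  // Hypothetical stand-in for CanBuildFrom: its only job is to supply a Builder.
  trait BuilderFactory[Elem, To] {
    def newBuilder: mutable.Builder[Elem, To]
  }

  // A generic map written the way high-level traits do it: only foreach is assumed,
  // so the cost is an O(n) traversal plus the per-element cost of the builder.
  def genericMap[A, B, To](xs: Iterable[A])(f: A => B)(
      implicit bf: BuilderFactory[B, To]): To = {
    val b = bf.newBuilder
    xs.foreach(x => b += f(x))
    b.result()
  }

  // One possible factory: build a List, whatever the source collection was.
  implicit def listFactory[B]: BuilderFactory[B, List[B]] =
    new BuilderFactory[B, List[B]] {
      def newBuilder: mutable.Builder[B, List[B]] = List.newBuilder[B]
    }

  def main(args: Array[String]): Unit = {
    // The source is a Vector, but the result type (and builder cost) comes from the call site.
    val doubled: List[Int] = genericMap(Vector(1, 2, 3))(_ * 2)
    println(doubled) // List(2, 4, 6)
  }
}
```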

The upshot is that the place where a method is defined and documented does not have nearly enough information to state its performance characteristics: they may depend not only on how other methods are implemented by the inheriting collection, but even on the performance characteristics of the Builder obtained from the CanBuildFrom that is resolved at the call site.

At best, any such documentation would have to be described in terms of other methods. That does not mean it would not be worthwhile, but it is not easily done, and hard tasks in open-source projects depend on volunteers, who usually work on what they like rather than on what is needed.

+4

As a rule of thumb for the other methods, just think about what an efficient implementation would have to look like.

Most other bulk operations on collections (operations that process every element in the collection) are O(n), so they are not mentioned there. Examples are filter, map, foreach, indexOf, reverse, find, and so on.

Methods that return iterators or streams, such as combinations and permutations, are usually O(1) to call.
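
A quick way to see this, using only standard library calls: the call itself returns an Iterator immediately; the combinations are only generated as the iterator is consumed.

```scala
object CombinationsSketch {
  def main(args: Array[String]): Unit = {
    // (40 choose 20) is roughly 1.4e11, yet this call returns immediately:
    // combinations only hands back an Iterator, nothing is enumerated up front.
    val combos = (1 to 40).toList.combinations(20)
    println(combos.next())  // List(1, 2, ..., 20): only the first combination is computed
    println(combos.hasNext) // true
  }
}
```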

Methods involving two collections are usually O(max(n, m)) or O(min(n, m)). These include zip, zipAll, sameElements, corresponds, and so on.

The union , diff and intersect methods are O(n + m) .
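
A small illustration of the two-collection cases above (standard library calls only): zip stops at the shorter operand, while zipAll pads to the longer one.

```scala
object TwoCollectionSketch {
  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3, 4, 5)
    val ys = List("a", "b", "c")
    println(xs.zip(ys))            // List((1,a), (2,b), (3,c)): min(n, m) pairs of work
    println(xs.zipAll(ys, 0, "?")) // pads the shorter side, so O(max(n, m))
  }
}
```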

Sort variants are, naturally, O(n log n). groupBy is O(n log n) in the current implementation. indexOfSlice uses the KMP algorithm and is O(m + n), where m and n are the lengths of the two sequences.
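
For reference, here are the calls just mentioned in their simplest form (standard library only):

```scala
object SortGroupSliceSketch {
  def main(args: Array[String]): Unit = {
    println(List(3, 1, 2).sorted)                         // List(1, 2, 3), O(n log n)
    println(List(1, 2, 3, 4).groupBy(_ % 2 == 0))         // groups into false -> List(1, 3), true -> List(2, 4)
    println(List(1, 2, 3, 4, 5).indexOfSlice(List(3, 4))) // 2, found with a KMP-style search
  }
}
```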

Methods such as +: , :+ or patch are usually O(n) as well, unless you are dealing with a specific case of an immutable collection for which the operation is more efficient, for example prepending an element to a functional List or appending an element to a Vector (see the example below).
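
The special cases called out above can be seen directly: prepending to a List reuses the existing list as the tail, while appending has to rebuild it; Vector's tree structure makes append effectively constant time.

```scala
object PrependAppendSketch {
  def main(args: Array[String]): Unit = {
    val xs = List(2, 3, 4)
    val prepended = 1 +: xs          // effectively O(1): a new head cell pointing at xs
    val appended  = xs :+ 5          // O(n): the whole list is copied
    println(prepended.tail eq xs)    // true: the original list is shared, not copied
    println((prepended, appended))   // (List(1, 2, 3, 4), List(2, 3, 4, 5))

    val v = Vector(1, 2, 3)
    println(v :+ 4)                  // Vector(1, 2, 3, 4): append is effectively constant time
  }
}
```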

The toX methods are usually O(n), since they have to iterate over all the elements and build a new collection. The exception is toStream, which builds the collection lazily and is therefore O(1). Also, whenever X is the type of the collection the method is called on, this is returned, which is O(1).
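
A short check of those claims (behaviour as in the 2.12-era library this answer describes; in 2.13 Stream is superseded by LazyList, but toList and toVector behave the same way):

```scala
object ToXSketch {
  def main(args: Array[String]): Unit = {
    val xs = List(1, 2, 3)
    println(xs.toList eq xs) // true: toList on a List just returns this, O(1)
    println(xs.toVector)     // Vector(1, 2, 3): O(n), every element is copied

    val s = xs.toStream      // built lazily, effectively O(1)
    println(s)               // Stream(1, ?): the tail has not been forced yet
    println(s.toList)        // List(1, 2, 3): forcing the whole stream is O(n)
  }
}
```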

Iterator implementations are expected to have O(1) (amortized) next and hasNext. Iterator creation should be O(log n) at worst, but O(1) in most cases.
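
As a final illustration (standard library only), creating an iterator over a large range costs essentially nothing; the work is paid per element as next() is called:

```scala
object IteratorCostSketch {
  def main(args: Array[String]): Unit = {
    val it = (1 to 100000000).iterator // effectively O(1): nothing is traversed yet
    println(it.next())                 // 1: each next()/hasNext() is O(1) amortized
    println(it.take(3).toList)         // List(2, 3, 4): only the consumed elements cost anything
  }
}
```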

+7

Source: https://habr.com/ru/post/899599/
