Java 8 thread collector memory allocation speed compared to pre-allocated loop

I am wondering how Java 8 threads handle memory allocation if the terminal operation is a list collector.

Consider, for example,

List<Integer> result = myList.stream().map(doWhatever).collect(Collectors.toList()); 

vs

 List<Integer> result = new ArrayList<>(myList.size()); for(String s : myList) { result.add(doWhatever.apply(s)); } 

In the case of using the stream, it is not known how large the list will grow, which means that some sort of redistribution must occur. Is this assumption true?

Is the type of the resulting list some sort of linked list and therefore provides slower access to items than an ArrayList?

Should I not use streams with list collectors if I know the size of the resulting list from the very beginning?

+6
source share
6 answers

Behind the scene, Collectors.toList() will allow you to collect the resulting elements of your Stream into an ArrayList created using the default constructor, so with a default capacity of 10 , you really need to redistribute if the size exceeds 10 .

If you want to use a different implementation of List , use toCollection(Supplier<C> collectionFactory) , which is a more general collector that allows you to provide the factory with your Collection target.

For example, if you want to collect elements in a LinkedList , you can instead rewrite your code as follows:

 List<Integer> result = myList.stream() .map(doWhatever) .collect(Collectors.toCollection(LinkedList::new)); 

Assuming you want an ArrayList with a default value of 100 , the collector will be Collectors.toCollection(() -> new ArrayList<>(100)) .

+6
source

Collectors.toList() says nothing about its implementation. If you're interested, use toCollection(ArrayList::new) .

Should I not use streams with list collectors if I know the size of the resulting list from the very beginning?

No, go and use them. Distribution is cheap and the cost is minimal compared to a laconic victory. Preliminary lists are usually premature optimizations.

+6
source

If you look at the source code for Collectors.toList() , it will not be pre-allocated.

  public static <T> Collector<T, ?, List<T>> toList() { return new CollectorImpl<>((Supplier<List<T>>) ArrayList::new, List::add, (left, right) -> { left.addAll(right); return left; }, CH_ID); } 

It simply creates a new ArrayList with a default size, which then changes to subsequent calls to add / addAll .

+3
source

In the case of using the stream, it is not known how large the list will grow, which means that some sort of redistribution must occur. Is this assumption true?

He knows the previous pipeline, its size and creates an ArrayList<> with the default setting, despite this. It doesn't matter when you work with a dynamically optimized array.

Is the type of the resulting list some sort of linked list and therefore provides slower access to items than an ArrayList?

An ArrayList used by default, but you can provide your own provider and battery to change this behavior:

 stream.collect(() -> new ArrayList<>(SIZE), ArrayList::add, ArrayList::addAll); 

Should I not use streams with list collectors if I know the size of the resulting list from the very beginning?

Do not think about it. Along with the concise syntax, the Stream API provides many powerful things (such as parallelization) that you can use.

+2
source

Currently, the toList() collector is implemented using and returning an ArrayList (note that the container used during collection does not always have to match the type of final results). The path, the collector interface is defined, the collector has no chance of pre-selecting the list.

But in principle, since the standard Stream implementation and the predefined collector implementation of toList() are part of the same library, there may be a non-standard connection in future implementations (or alternative JREs), where the stream detects the toList() in the collect method and performs an optimized operation . But when the toList() collector is used, for example, as the downstream collector of the groupingBy collector, in any case there is no predicted size.

If you assume that a stream can predict its size, for example, in your example myList.stream().map(doWhatever) , the most efficient solution, given the current implementation, is

 List<ElementType> result=Arrays.asList(stream.toArray(ElementType[]::new)); 

since this operation will use a known size, even in parallel, or, especially when used with parallel flow, when splitting sub-sizes are predictable, since then no merging step is required, that is, all employees will write directly to the results array.

Unfortunately, if ElementType not a reproducible type, you must resort to an unverified operation here.

If sizes are not predictable, this solution may be even more efficient compared to the current toList() collector, but may be lost compared to a future implementation that may use non-linear storage.


Thus, the optimized option is only suitable for a specific setting. For most scenarios, the toList() collector is sufficient or maybe even better than any alternative in possible future implementations.

+2
source

For large concurrent threads, I found that toList () really had serious performance problems because the drive lists were merged multiple times, which led to something closer to O (N ^ 2) than O (N).

Here's an alternative to toList () Collector, which stores data in ConcurrentLinkedQueue until the finish stage - for a stream of 400,000 elements, the processing time of the collection is from 1500 ms to 30:

http://pastebin.com/Bi93uig6

+1
source

Source: https://habr.com/ru/post/1012764/


All Articles