Why is the minimum granularity defined as 8192 in Java 8 for switching from parallel sorting to Arrays.sort, regardless of data type?

I was looking at the parallel sorting concepts introduced in Java 8. According to the docs:

If the length of the specified array is less than the minimum granularity, then it is sorted using the appropriate Arrays.sort method.

However, the specification does not define this minimum granularity value.
When I looked at the code in java.util.Arrays, it was defined as

 private static final int MIN_ARRAY_SORT_GRAN = 1 << 13; 

i.e., 8192 array elements.
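For context, the int[] overload uses this constant roughly as follows (a lightly condensed excerpt of the OpenJDK 8 sources; the comments are mine):

    // java.util.Arrays.parallelSort(int[]), condensed. Below the granularity,
    // or when the common pool has no parallelism, it falls back to the same
    // sequential dual-pivot quicksort that plain Arrays.sort(int[]) uses.
    public static void parallelSort(int[] a) {
        int n = a.length, p, g;
        if (n <= MIN_ARRAY_SORT_GRAN ||
            (p = ForkJoinPool.getCommonPoolParallelism()) == 1)
            DualPivotQuicksort.sort(a, 0, n - 1, null, 0, 0);  // sequential path
        else
            new ArraysParallelSortHelpers.FJInt.Sorter         // fork/join path
                (null, a, new int[n], 0, n, 0,
                 ((g = n / (p << 2)) <= MIN_ARRAY_SORT_GRAN) ?
                 MIN_ARRAY_SORT_GRAN : g).invoke();
    }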

According to the explanation provided here, I understand why the value is hardcoded as 8192.

It was designed with the current processor architecture in mind.
With -XX:+UseCompressedOops enabled by default, any system with less than 32 GB of RAM uses 32-bit (4-byte) pointers. Given an L1 data cache size of 32 KB, we can pass 32 KB / 4 bytes = 8K of data to the CPU for computation at once, which is equivalent to 8192 bytes of data being processed at a time.
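(As a side note, if you want to check whether compressed oops are actually in effect on your JVM, the HotSpot-specific diagnostic MXBean can report it; this is a HotSpot-only API, so take it as an optional sanity check:)

    import java.lang.management.ManagementFactory;
    import com.sun.management.HotSpotDiagnosticMXBean;

    public class OopsCheck {
        public static void main(String[] args) {
            // HotSpot-specific diagnostic bean; prints whether compressed
            // 4-byte object pointers are in effect on this JVM.
            HotSpotDiagnosticMXBean bean =
                ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            System.out.println(bean.getVMOption("UseCompressedOops"));
        }
    }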

So, for the function that sorts a byte array, parallelSort(byte[]), this makes sense. You can keep the minimum parallel-sorting threshold at 8192 values (each value = 1 byte for a byte array).

But if you consider public static void parallelSort(int[] a):

An int variable is 4 bytes (32 bits). So, ideally, within 8192 bytes we can hold 8192 / 4 = 2048 ints in the CPU cache at once. The minimum granularity in that case should therefore be 2048.

Why do all the parallelSort functions in Java (byte[], int[], long[], etc.) use 8192 as the default minimum number of values required for parallel sorting?
Shouldn't it depend on the data type passed to the parallelSort function?
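To make the question concrete, here is a small illustrative snippet (the sizes are my own, chosen to straddle the threshold). Both calls use the same overload, but only the second array is long enough to be split across the common ForkJoinPool (assuming the pool's parallelism is greater than 1):

    import java.util.Arrays;
    import java.util.concurrent.ThreadLocalRandom;

    public class ParallelSortDemo {
        public static void main(String[] args) {
            // length == 8192 is not above MIN_ARRAY_SORT_GRAN,
            // so this call runs the sequential sort internally
            int[] small = ThreadLocalRandom.current().ints(8192).toArray();
            Arrays.parallelSort(small);

            // length > 8192: eligible for parallel fork/join sorting
            int[] large = ThreadLocalRandom.current().ints(1_000_000).toArray();
            Arrays.parallelSort(large);
        }
    }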

1 answer

First, it seems you have misunderstood the linked explanation. The L1 data cache is 32 KB, so it fits int[] perfectly: 32768 / 4 = 8192 ints fit into the L1 cache.

Second, I do not think this explanation holds. It focuses on pointer sizes, so it mainly applies to sorting an array of objects. But when you compare elements of an object array, you always need to dereference these pointers to reach the real data. And if your objects have non-primitive fields, you have to dereference even further. For example, if you sort an array of strings, you access not only the array itself, but also the String objects and the char[] arrays stored inside them. All of this requires many additional cache lines.
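To illustrate the pointer chasing (my own sketch, not part of the original answer):

    import java.util.Arrays;

    public class StringSortDemo {
        public static void main(String[] args) {
            String[] words = {"pear", "apple", "plum", "cherry"};
            // Each comparison follows: array slot -> String object -> the
            // backing char[] (byte[] since JDK 9). Several dereferences happen
            // before a single character is compared, so "8192 pointers fit in
            // L1" says little about the real working set.
            Arrays.parallelSort(words);
            System.out.println(Arrays.toString(words));
        }
    }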

I did not find an explicit justification for this particular value in the review thread for this change. The threshold used to be 256; it was changed to 8192 as part of the JDK-8014076 update. I suppose it showed the best performance on some reasonable set of benchmarks. Keeping separate thresholds for different cases would add even more complexity, and the benchmarks apparently showed that it does not pay off. Note that an ideal threshold is impossible for Object[] arrays anyway, since the comparison function is user-defined and can be of arbitrary complexity. For a sufficiently complex comparison function, it is probably reasonable to parallelize even very small arrays.
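For example (a hypothetical case of my own), locale-sensitive collation makes every comparison expensive, so no single fixed threshold can be ideal for both this and a plain int compare:

    import java.text.Collator;
    import java.util.Arrays;
    import java.util.Locale;

    public class CollatorSortDemo {
        public static void main(String[] args) {
            String[] names = {"Émile", "Zoë", "Anna", "Øyvind"};
            // Collator.compare does locale-aware work on every call and is far
            // more expensive than comparing two ints; with such a comparator,
            // parallelizing even a small array could pay off.
            Arrays.parallelSort(names, Collator.getInstance(Locale.FRENCH));
        }
    }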


Source: https://habr.com/ru/post/1245476/

