Arrays.sort Java performance for primitive types and objects

I have read several threads here saying that Arrays.sort uses a tuned quicksort for primitive types and a merge sort for objects. I wrote a small test to check this, but found that the opposite seems to be true.

    int a[] = new int[50000];
    //Integer a[] = new Integer[50000];
    for (int i = 0; i < 50000; i++) {
        //a[i] = new Integer(new Random().nextInt(5000));
        a[i] = new Random().nextInt(5000);
    }
    System.out.println(System.currentTimeMillis());
    Arrays.sort(a);
    System.out.println(System.currentTimeMillis());

For the array of primitives it took 22 ms, whereas for the array of objects it took 98 ms, on my i7 laptop with 8 cores and 8 GB of RAM. Did I run it incorrectly?

Many thanks!

+4
3 answers

This is not surprising to me at all.

First, you are comparing primitives against the indirection of having to chase references: a comparison between two primitives is a single direct operation, whereas comparing two objects means following both references and calling compareTo, and so on.
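
To make that first point concrete, here is a minimal sketch (purely illustrative, not from the question) of what one comparison costs in each case: comparing two ints is a single compare, while Arrays.sort on an object array goes through Comparable.compareTo, which has to dereference both Integer objects before it can compare the wrapped values.

    // Comparing two primitives: one direct compare, no method call.
    int x = 5, y = 7;
    boolean less = x < y;

    // Comparing two boxed values: Arrays.sort(Integer[]) calls compareTo,
    // which follows both references and reads the wrapped int fields first.
    Integer bx = 5, by = 7;
    int cmp = bx.compareTo(by);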

Second, an array of primitives plays very nicely with the processor cache. An array of references need not, because there is no guarantee that the referenced objects are adjacent in memory (in fact that is unlikely), and in addition the boxed objects are larger, which means fewer of them fit in the cache at any one time.

Note that in both cases the values stored in the array itself will fit in the cache, but the problem with Integer[] is that you still have to leave the cache and go out over the memory bus to chase each reference and fetch the object from main memory, and those references can point all over the heap. That leaves the poor processor just waiting and waiting, because cache misses are now far more likely.

That is, with an array of primitives you have something like this:

     _   _   _   _         _
    |5| |7| |2| |1|  ...  |4|

and the values all sit next to each other in memory. When one value is pulled into the cache from memory, its neighbours are pulled in with it. Quicksort and mergesort work on adjacent sections of the array, so they benefit greatly from the processor cache, which works in their favour here (this is locality of reference).

But when you have an Integer array like this:

     _   _   _   _         _
    |.| |.| |.| |.|  ...  |.|
     |   |   |   |         |
     |   |   |   |         '--------> |4|
     |   |   |   '---> |1|
     |   |   '------------------> |2|
     |   '---> |7|
     '--------------> |5|

The reference slots themselves are contiguous in memory, so they play well with the cache. The problem is the indirection: the Integer objects they point to may be fragmented all over the heap, and fewer of them fit in the cache in the first place. That extra indirection, fragmentation, and size is exactly what does not play well with the cache.

Again, for something like quicksort or mergesort that works on adjacent segments of the array, this is huge, and it almost certainly explains most of the performance difference.

Did I run it incorrectly?

Yes: use System.nanoTime() the next time you need to time something. System.currentTimeMillis() has poor resolution and is not suitable for benchmarking.

+12

Your int[] fits in your L2 cache: about 4 B * 50K, which is roughly 200 KB, and your L2 cache is 256 KB. That will sort much faster than your Integer[], which has to live in your L3 cache, since its footprint is about 28 B * 50K, or roughly 1400 KB.

The L2 cache (~11 cycles) is about 4-6 times faster than the L3 cache (~45-75 cycles).
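
As a rough sketch of that arithmetic (the 4 B and 28 B per-element figures are this answer's assumptions for a typical 64-bit JVM, not measured values):

    int n = 50_000;
    long intArrayBytes     = 4L * n;   // data stored inline: ~200 KB, fits in a 256 KB L2
    long integerArrayBytes = 28L * n;  // reference + object header + int field: ~1400 KB, spills to L3
    System.out.printf("int[] footprint:     ~%d KB%n", intArrayBytes / 1000);
    System.out.printf("Integer[] footprint: ~%d KB%n", integerArrayBytes / 1000);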

I bet that if you run this more than once you will get better results as the code warms up.

    import java.util.Arrays;
    import java.util.Random;

    public class SortTest {
        public static void test_int_array() {
            int[] a = new int[50000];
            Random random = new Random();
            for (int i = 0; i < 50000; i++) {
                a[i] = random.nextInt(5000);
            }
            long start = System.nanoTime();
            Arrays.sort(a);
            long time = System.nanoTime() - start;
            System.out.printf("int[] sort took %.1f ms%n", time / 1e6);
        }

        public static void test_Integer_array() {
            Integer[] a = new Integer[50000];
            Random random = new Random();
            for (int i = 0; i < 50000; i++) {
                a[i] = random.nextInt(5000);
            }
            long start = System.nanoTime();
            Arrays.sort(a);
            long time = System.nanoTime() - start;
            System.out.printf("Integer[] sort took %.1f ms%n", time / 1e6);
        }

        public static void main(String... ignored) {
            // run each test repeatedly so the JIT has a chance to warm up
            for (int i = 0; i < 10; i++) {
                test_int_array();
                test_Integer_array();
            }
        }
    }

prints

    int[] sort took 32.1 ms
    Integer[] sort took 104.1 ms
    int[] sort took 4.0 ms
    Integer[] sort took 83.8 ms
    int[] sort took 33.4 ms
    Integer[] sort took 76.7 ms
    int[] sort took 4.4 ms
    Integer[] sort took 40.5 ms
    int[] sort took 3.8 ms
    Integer[] sort took 17.4 ms
    int[] sort took 4.7 ms
    Integer[] sort took 22.4 ms
    int[] sort took 4.4 ms
    Integer[] sort took 12.1 ms
    int[] sort took 3.7 ms
    Integer[] sort took 11.2 ms
    int[] sort took 3.9 ms
    Integer[] sort took 10.7 ms
    int[] sort took 3.6 ms
    Integer[] sort took 11.9 ms

You can see how much difference warming up the code makes.

+9

Did I run it incorrectly?

Your benchmarking technique is pretty naive, and it does not establish anything. How does the sort time grow with array size in each case? How much of the difference between sorting primitives and sorting objects can be attributed simply to the different cost of comparing primitives versus comparing objects? (That cost has nothing to do with the performance of the sorting algorithm, but your test would attribute it to the sorting algorithm.)

As others have noted, if you are timing things that take on the order of tens of milliseconds you should use System.nanoTime; System.currentTimeMillis often has a resolution no finer than about 10 ms. However, simply switching to a better timer will not fix the more serious problems with your benchmark.
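
One way to address those problems is to let a harness handle warm-up, repetition, and parameterisation over array sizes. Below is a minimal sketch using JMH (an assumption on my part: it requires the org.openjdk.jmh dependency on the classpath, and the class and method names here are illustrative, not from the question).

    import java.util.Arrays;
    import java.util.Random;
    import org.openjdk.jmh.annotations.*;

    @State(Scope.Thread)
    public class SortBench {
        @Param({"1000", "10000", "50000"})
        public int size;                 // shows how sort time grows with array size

        public int[] ints;
        public Integer[] boxed;

        @Setup(Level.Invocation)         // rebuild unsorted input before every invocation
        public void setUp() {
            Random r = new Random(42);
            ints = new int[size];
            boxed = new Integer[size];
            for (int i = 0; i < size; i++) {
                ints[i] = r.nextInt(5000);
                boxed[i] = ints[i];
            }
        }

        @Benchmark
        public int[] sortPrimitives() {
            Arrays.sort(ints);
            return ints;                 // return the result so it is not optimised away
        }

        @Benchmark
        public Integer[] sortBoxed() {
            Arrays.sort(boxed);
            return boxed;
        }
    }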

0

Source: https://habr.com/ru/post/1496004/

