Scala range versus list performance in large collections

I ran a set of performance tests with 10,000,000 items, and the results vary dramatically across implementations.

Can someone explain why creating a Range.ByOne results in better performance than a simple array of primitives, while converting the same range to a list results in worse performance than even the worst-case scenario?

The task: create 10,000,000 elements and print the ones that are multiples of 1,000,000. The JVM heap size was always fixed by setting minimum and maximum to the same value: -Xms?m -Xmx?m

    import java.util.concurrent.TimeUnit
    import java.util.concurrent.TimeUnit._

    object LightAndFastRange extends App {

      def chrono[A](f: => A, timeUnit: TimeUnit = MILLISECONDS): (A, Long) = {
        val start = System.nanoTime()
        val result: A = f
        val end = System.nanoTime()
        (result, timeUnit.convert(end - start, NANOSECONDS))
      }

      def millions(): List[Int] = (0 to 10000000).filter(_ % 1000000 == 0).toList

      val results = chrono(millions())
      results._1.foreach(x => println("x: " + x))
      println("Time: " + results._2)
    }

It takes 141 ms with a JVM heap size of 27 MB.

By comparison, converting to a list dramatically affects performance:

    import java.util.concurrent.TimeUnit
    import java.util.concurrent.TimeUnit._

    object LargeLinkedList extends App {

      def chrono[A](f: => A, timeUnit: TimeUnit = MILLISECONDS): (A, Long) = {
        val start = System.nanoTime()
        val result: A = f
        val end = System.nanoTime()
        (result, timeUnit.convert(end - start, NANOSECONDS))
      }

      val results = chrono((0 to 10000000).toList.filter(_ % 1000000 == 0))
      results._1.foreach(x => println("x: " + x))
      println("Time: " + results._2)
    }

It takes 8,514 to 10,896 ms with a JVM heap size of 455 to 460 MB.

Compare this to the following Java implementation, which uses an array of primitives:

    import static java.util.concurrent.TimeUnit.*;

    public class LargePrimitiveArray {

        public static void main(String[] args) {
            long start = System.nanoTime();
            int[] elements = new int[10000000];
            for (int i = 0; i < 10000000; i++) {
                elements[i] = i;
            }
            for (int i = 0; i < 10000000; i++) {
                if (elements[i] % 1000000 == 0) {
                    System.out.println("x: " + elements[i]);
                }
            }
            long end = System.nanoTime();
            System.out.println("Time: " + MILLISECONDS.convert(end - start, NANOSECONDS));
        }
    }

It takes 116 ms with a JVM heap size of 59 MB.

And with a Java list of Integers:

    import java.util.List;
    import java.util.ArrayList;
    import static java.util.concurrent.TimeUnit.*;

    public class LargeArrayList {

        public static void main(String[] args) {
            long start = System.nanoTime();
            List<Integer> elements = new ArrayList<Integer>();
            for (int i = 0; i < 10000000; i++) {
                elements.add(i);
            }
            for (Integer x : elements) {
                if (x % 1000000 == 0) {
                    System.out.println("x: " + x);
                }
            }
            long end = System.nanoTime();
            System.out.println("Time: " + MILLISECONDS.convert(end - start, NANOSECONDS));
        }
    }

It takes 3,993 ms with a JVM heap size of 283 MB.

My question is: why is the first example so fast while the second is so badly affected? I tried creating views, but could not reproduce the performance benefits of the range.
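For reference, a view-based variant looks roughly like this (my sketch, not necessarily the exact code that was tried; the view defers the filter so no intermediate 10,000,000-element collection is built):

```scala
object MillionsViaView extends App {
  // The view makes filter lazy: elements are tested one at a time
  // and only the handful of matches survive into the final List.
  val millions: List[Int] =
    (0 to 10000000).view.filter(_ % 1000000 == 0).toList

  millions.foreach(x => println("x: " + x))
}
```

Note that elements still pass through the boxed Seq[Int] interface, so a view removes the intermediate collection but not the boxing.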

All tests were performed on Mac OS X Snow Leopard, with the 64-bit Java 6u26 Server VM and Scala 2.9.1.final.

EDIT:

To conclude, here is an actual implementation using a LinkedList (a fairer comparison in terms of space than ArrayList since, as rightly pointed out, Scala's List is a linked list):

    import java.util.List;
    import java.util.LinkedList;
    import static java.util.concurrent.TimeUnit.*;

    public class LargeLinkedList {

        public static void main(String[] args) {
            LargeLinkedList test = new LargeLinkedList();
            long start = System.nanoTime();
            List<Integer> elements = test.createElements();
            test.findElementsToPrint(elements);
            long end = System.nanoTime();
            System.out.println("Time: " + MILLISECONDS.convert(end - start, NANOSECONDS));
        }

        private List<Integer> createElements() {
            List<Integer> elements = new LinkedList<Integer>();
            for (int i = 0; i < 10000000; i++) {
                elements.add(i);
            }
            return elements;
        }

        private void findElementsToPrint(List<Integer> elements) {
            for (Integer x : elements) {
                if (x % 1000000 == 0) {
                    System.out.println("x: " + x);
                }
            }
        }
    }

It takes 3,621 to 6,749 ms with a JVM heap size of 460 to 480 MB. That is much more consistent with the performance of the second Scala example.

And finally, the LargeArrayBuffer version:

    import collection.mutable.ArrayBuffer
    import java.util.concurrent.TimeUnit
    import java.util.concurrent.TimeUnit._

    object LargeArrayBuffer extends App {

      def chrono[A](f: => A, timeUnit: TimeUnit = MILLISECONDS): (A, Long) = {
        val start = System.nanoTime()
        val result: A = f
        val end = System.nanoTime()
        (result, timeUnit.convert(end - start, NANOSECONDS))
      }

      def millions(): List[Int] = {
        val size = 10000000
        val items = new ArrayBuffer[Int](size)
        (0 to size).foreach(items += _)
        items.filter(_ % 1000000 == 0).toList
      }

      val results = chrono(millions())
      results._1.foreach(x => println("x: " + x))
      println("Time: " + results._2)
    }

It takes about 2,145 ms with a JVM heap size of 375 MB.

Thanks so much for the answers.

+4
4 answers

Oh, so many things here!

Let's start with Java's int[]. Arrays are the only collection in Java that is not type-erased. The runtime representation of an int[] differs from the runtime representation of Object[] in that it actually uses the primitive int directly. Because of that, no boxing is involved in using it.
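This is easy to verify from Scala itself; a small sketch (class names as reported by the JVM):

```scala
object ErasureSketch extends App {
  // Array[Int] compiles down to a real int[]: primitives stored directly.
  val primitives = Array(1, 2, 3)
  println(primitives.getClass.getSimpleName)  // int[]

  // A generic collection erases its type parameter to Object and must
  // box each element into a java.lang.Integer behind the scenes.
  val generic = List(1, 2, 3)
  println(generic.head.asInstanceOf[AnyRef].getClass.getSimpleName)  // Integer
}
```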

In memory terms, you have 40,000,000 consecutive bytes, read and written 4 bytes at a time whenever an element is read or written.

An ArrayList<Integer>, by contrast, like almost any other generic collection, consists of 40,000,000 or 80,000,000 consecutive bytes (on 32-bit and 64-bit JVMs respectively) for the references, PLUS 80,000,000 bytes spread all around memory in groups of 8 bytes for the Integer objects. Every read or write of an element must go through two memory locations, and the sheer time spent handling all that memory is significant when the actual task you are performing is so fast.

So, back to Scala, and the second example, where you convert the range into a List. Now, Scala's List is much more like Java's LinkedList than the mistakenly named ArrayList. Each element of a List is composed of an object called Cons, which takes 16 bytes and holds a pointer to the element plus a pointer to the rest of the list. A List of 10,000,000 elements is therefore composed of 160,000,000 bytes spread all around memory in groups of 16 bytes, plus 80,000,000 bytes spread all around memory in groups of 8 bytes. So whatever was true for the ArrayList is even more so for the List.

Finally, Range. A Range is a sequence of integers with a lower and an upper boundary, plus a step. A Range of 10,000,000 elements takes 40 bytes: three ints (not shared) for the lower boundary, the upper boundary, and the step, plus a few pre-computed values (last and numRangeElements) and two other fields used for lazy val thread safety. Just to make it clear: that is NOT 40 times 10,000,000, it is 40 bytes TOTAL. The size of the range is irrelevant, because IT DOES NOT STORE THE INDIVIDUAL ELEMENTS. It stores only the lower boundary, the upper boundary, and the step.
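The principle can be illustrated with a toy range (a hypothetical class, not the real scala.collection.immutable.Range, assuming a positive step that evenly divides the interval):

```scala
// Toy illustration: constant memory no matter how "long" the range is.
final case class ToyRange(start: Int, end: Int, step: Int = 1) {
  // Number of elements, inclusive of both ends (like `0 to n`).
  def length: Int = (end - start) / step + 1

  // Element i is computed from start and step; it is never stored.
  def apply(i: Int): Int = start + i * step

  def foreach(f: Int => Unit): Unit = {
    var x = start
    while (x <= end) { f(x); x += step }  // assumes step > 0
  }
}

object ToyRangeDemo extends App {
  val r = ToyRange(0, 10000000)  // three ints, regardless of length
  r.foreach(x => if (x % 1000000 == 0) println("x: " + x))
}
```

Iterating it is just a counted loop; the "elements" exist only for the duration of each callback.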

Now, since a Range is a Seq[Int], for most uses it still has to go through boxing: each int is converted into an Integer and then back into an int, which is sadly wasteful.

Cons size calculation

So, here is a first approximation of the Cons size. First of all, read this article for some general guidelines on how much memory an object takes. The important points are:

  • Java uses 8 bytes for normal objects and 12 for object arrays, for "housekeeping" information (what the class of this object is, etc.).
  • Objects are allocated in 8-byte chunks. If your object is smaller than that, it will be padded to fill them.

I actually thought it was 16 bytes, not 8. Anyway, Cons is also smaller than I thought. Its fields are:

    public static final long serialVersionUID; // static, doesn't count
    private java.lang.Object scala$collection$immutable$$colon$colon$$hd;
    private scala.collection.immutable.List tl;

References are at least 4 bytes each (and may be larger on a 64-bit JVM). So we have:

    8 bytes  Java header
    4 bytes  hd
    4 bytes  tl

That makes it only 16 bytes. Pretty good, actually. In the example, hd will point to an Integer object, which I assume is 8 bytes long. As for tl, it points to another Cons, which we are already counting.
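Putting those estimates together for the 10,000,000-element List (my arithmetic, using the approximate figures above):

```scala
object ListMemoryEstimate extends App {
  val n = 10000000L
  val consBytes    = 16L * n  // 8-byte header + hd + tl per Cons cell
  val integerBytes =  8L * n  // one boxed Integer per element (rough estimate)
  val totalBytes   = consBytes + integerBytes
  // ~228 MB for the list structure alone, before any GC or allocator overhead,
  // which is in the right ballpark for the 455-460 MB heaps observed above.
  println(totalBytes / (1024 * 1024) + " MB")
}
```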

I will revise these estimates with actual data where possible.

+12

In the first example, you create a linked list with 11 elements by computing the steps of the range.

In the second example, you create a linked list with 10 million elements and filter it down to a new linked list with 11 elements.

In the third example, you create an array-backed buffer with 10 million elements, which you then traverse and print; no new array-backed buffer is created.

Conclusion:

Each piece of code does something different, so performance varies greatly.

+4

This is a reasonable assumption ...

I think this is because, in the fast version, the Scala compiler is able to translate the key statement into something like the following (in Java):

    List<Integer> millions = new ArrayList<Integer>();
    for (int i = 0; i <= 10000000; i++) {
        if (i % 1000000 == 0) {
            millions.add(i);
        }
    }

As you can see, (0 to 10000000) does not create an intermediate list of 10,000,000 Integer objects.
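Whether or not the compiler literally produces that loop, the library behavior matches the guess: filtering a Range walks it arithmetically and stores only the matches (a quick check):

```scala
object RangeFilterSketch extends App {
  // Only the 11 multiples of 1,000,000 are ever stored; the other
  // 10,000,000 candidate values exist only transiently during the scan.
  val hits = (0 to 10000000).filter(_ % 1000000 == 0)
  println(hits.length)  // 11
  println(hits.last)    // 10000000
}
```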

In contrast, in the slow version, the Scala compiler cannot perform this optimization and generates this list.

(The intermediate data structure may be int[] , but the observed size of the JVM suggests that it is not.)

+1

It's hard to read the Scala source on my iPad, but it looks like the Range constructor doesn't actually create a list; it just remembers the start, increment, and end, and uses them to produce values on demand. Iterating over a range is therefore much closer to a simple loop than to traversing the elements of an array.

Once you say range.toList, you force Scala to create a linked list of the values in the range (allocating memory for both the values and the links), and then you iterate over that. Being a linked list, its performance will be worse than in your Java ArrayList example.
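The difference is easy to observe with the question's chrono helper; a sketch scaled down to 1,000,000 elements so it runs quickly (absolute timings vary by machine, so only the computed results are checked, not the times):

```scala
import java.util.concurrent.TimeUnit
import java.util.concurrent.TimeUnit._

object RangeVsListIteration extends App {
  def chrono[A](f: => A, timeUnit: TimeUnit = MILLISECONDS): (A, Long) = {
    val start = System.nanoTime()
    val result: A = f
    val end = System.nanoTime()
    (result, timeUnit.convert(end - start, NANOSECONDS))
  }

  val range = 0 to 1000000

  // Iterating the range directly: a counted loop, no per-element storage.
  val (direct, tDirect) = chrono(range.filter(_ % 100000 == 0).sum)

  // Forcing toList first: one Cons cell plus one boxed Integer per element.
  val (viaList, tList) = chrono(range.toList.filter(_ % 100000 == 0).sum)

  println("direct: " + direct + " in " + tDirect + " ms")
  println("via toList: " + viaList + " in " + tList + " ms")
}
```

Both produce the same sum; only the allocation behavior differs.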

+1

Source: https://habr.com/ru/post/1379874/

