Scala prime algorithm performance

Question

Scala prime algorithm performance

I am new to Scala, and therefore, to start writing code, I implemented this simple program:

package org.primes.sim object Primes { def is_prime(a: Int): Boolean = { val l = Stream.range(3, a, 2) filter { e => a % e == 0} l.size == 0 } def gen_primes(m: Int) = 2 #:: Stream.from(3, 2) filter { e => is_prime(e) } take m def primes(m : Int) = { gen_primes(m) foreach println } def main(args: Array[String]) { if (args.size == 0) primes(10) else primes(args(0).toInt) } }

It generates n primes starting with 2. Then I implemented the same algorithm in C ++ 11 using the range-v3 Eric Nibler library. This is the code:

 #include <iostream> #include <vector> #include <string> #include <range/v3/all.hpp> using namespace std; using namespace ranges; inline bool is_even(unsigned int n) { return n % 2 == 0; } inline bool is_prime(unsigned int n) { if (n == 2) return true; else if (n == 1 || is_even(n)) return false; else return ranges::any_of( view::iota(3, n) | view::remove_if(is_even), [n](unsigned int e) { return n % e == 0; } ) == false; } void primes(unsigned int n) { auto rng = view::ints(2) | view::filter(is_prime); ranges::for_each(view::take(rng, n), [](unsigned int e){ cout << e << '\n'; }); } int main(int argc, char* argv[]) { if (argc == 1) primes(100); else if (argc > 1) { primes(std::stoi(argv[1])); } }

As you can see, the code looks very similar, but the performance is very different:

For n = 5000, C ++ terminates at 0.265 s instead of Scala ends at 24,314 s !!! So from this test, Scala seems to be 100 times slower than C ++ 11.

What is the problem in Scala code? Could you give me some tips for making better use of scalac?

Note. I compiled the code in C ++ using gcc 4.9.2 and -O3 opt.

thanks

+6

performance scala c ++ 11

Gian lorenzo meocci May 03 '15 at 12:00

source share

1 answer

Kolmar · Accepted Answer · 2015-05-03T12:34:21+0000

The main speed issue is with the implementation of is_prime .

First of all, you filter the stream to find all the dividers, and then check to see if there are ( l.size == 0 ). But it's faster to return false as soon as the first divisor is found:

 def is_prime(a: Int): Boolean = Stream.range(3, a, 2).find(a % _ == 0).isEmpty

This reduced the runtime from 22 seconds to 5 seconds for primes(5000) on my machine.

The second problem is Stream itself. Scala Streams are slow, and using them for simple number calculations is a huge bust. Replacing Stream with Range reduced execution time to 1.2 seconds:

 def is_prime(a: Int): Boolean = 3.until(a, 2).find(a % _ == 0).isEmpty

This is decent: 5 times slower than C ++. I usually stop here, but we can reduce the execution time a little more if we remove the higher order function find .

While nice and functional, find also causes some overhead. The loop implementation (basically replacing find with foreach ) further reduced the execution time to 0.45 seconds, which is less than 2x slower than C ++ (which already has the JVM overhead order):

 def is_prime(a: Int): Boolean = { for (e <- 3.until(a, 2)) if (a % e == 0) return false true }

There is another thread in gen_primes , so something with it can improve the runtime more, but, in my opinion, this is not necessary. At this point in performance improvement, I think it would be better to switch to some other prime number generation algorithm: for example, using only prime numbers instead of all the odd numbers to search for divisors or using Sieve of Eratosthenes.

In general, functional abstractions in Scala are implemented using real objects on the heap that have some overhead, and the JIT compiler cannot fix everything. But the C ++ selling point is zero-cost abstractions: everything that is possible is expanded at compile time through template s, constexpr and is even more aggressively optimized by the compiler.

Scala prime algorithm performance

More articles: