The main speed issue is with the implementation of is_prime .
First of all, you filter the stream to find all the dividers, and then check to see if there are ( l.size == 0 ). But it's faster to return false as soon as the first divisor is found:
def is_prime(a: Int): Boolean = Stream.range(3, a, 2).find(a % _ == 0).isEmpty
This reduced the runtime from 22 seconds to 5 seconds for primes(5000) on my machine.
The second problem is Stream itself. Scala Streams are slow, and using them for simple number calculations is a huge bust. Replacing Stream with Range reduced execution time to 1.2 seconds:
def is_prime(a: Int): Boolean = 3.until(a, 2).find(a % _ == 0).isEmpty
This is decent: 5 times slower than C ++. I usually stop here, but we can reduce the execution time a little more if we remove the higher order function find .
While nice and functional, find also causes some overhead. The loop implementation (basically replacing find with foreach ) further reduced the execution time to 0.45 seconds, which is less than 2x slower than C ++ (which already has the JVM overhead order):
def is_prime(a: Int): Boolean = { for (e <- 3.until(a, 2)) if (a % e == 0) return false true }
There is another thread in gen_primes , so something with it can improve the runtime more, but, in my opinion, this is not necessary. At this point in performance improvement, I think it would be better to switch to some other prime number generation algorithm: for example, using only prime numbers instead of all the odd numbers to search for divisors or using Sieve of Eratosthenes.
In general, functional abstractions in Scala are implemented using real objects on the heap that have some overhead, and the JIT compiler cannot fix everything. But the C ++ selling point is zero-cost abstractions: everything that is possible is expanded at compile time through template s, constexpr and is even more aggressively optimized by the compiler.