Segmentation errors and heap errors when using Remove to generate prime numbers

Question

Segmentation errors and heap errors when using Remove to generate prime numbers

I am trying to write a sieve for generating prime numbers for a Project Euler task.

The code is as follows:

(defn sieve [n] (reduce (fn [memo f] (remove #(and (= 0 (rem % f)) (not= % f)) memo)) (range 2 (+ 1 n)) (range 2 (+ 1 n))))

Up to 500,000 it works very quickly, less than 1 second, from 600,000 and before it begins segregation and failure with memory errors.

I suppose this has something to do with deletion and laziness, I searched a bit, tried (doall (delete ...)) instead of (delete), but it gets incredibly slow ...

I don't understand a bit, thanks for any help ...

+4

clojure

stehem Mar 20 '12 at 14:13

source share

3 answers

raek · Answer 1 · 2012-03-20T17:39:00+0000

Segfault? That sounds scary! I assume you mean a mistake.

When I run the function, I start getting errors of about 1000. So why does the stack overflow? This is due to laziness. The fix is to wrap the call for deletion in doall.

Reduce will go through each element in the sequence specified as the third argument, and save the state along this path. This state is initialized as the rage of integers from 2 to n + 1. The state is updated at each iteration using deletion. However, since remove is lazy, it will actually do nothing. Delete returns an object that can generate a sequence on demand, depending on the sequence it gave. I will try to explain this example:

 (reduce (fn [xs x] (remove #{x} xs)) coll (range 4))

The above expression will return a sequence of coll elements, but with numbers from 0 to 3 it is filtered out. To explain what happens at runtime, I will come up with a new notation: I will write the runtime (remove #{x} xs) as «IOU a seq like xs but with x removed» . Then the meaning of the abbreviation expression is:

 «IOU a seq like «IOU a seq like «IOU a seq like «IOU a seq like 'coll' but with 0 removed» but with 1 removed» but with 2 removed» but with 3 removed»

Each call to delete adds an "IOU" wrapper. This causes problems when you finally try to access the first value of the received seq (the outermost value is "IOU"). When the lazy-seq object passes, it is first checked to see if its value has been calculated. If the value has already been completed, the value is simply returned. If it is not, then it is calculated, stored, and returned.

The problem arises when one lazy-seq value ("IOU" value) forces another lazy-seq to do its job, because one stack frame is needed to implement lazy-seq. (In order to remember where to return when the second lazy seq is executed, a stack frame is needed.) If you have 1000 nested "IOUs", values, you need 1000 stack frames to implement them (provided that they were all unrealized at the beginning) .

The solution is to use the doall function from the Clojure standard library. It takes seq and makes it fully implement. If you end the call for deletion in doall, the reduction state will always contain a fully implemented seq between each iteration and without the "IOU" cascade, the values will increase. Instead of saving all the calculations for later use, you do them gradually.

Adrian mouat · Answer 2 · 2012-03-20T16:49:38+0000

I am not surprised - you are working with a huge list.

I think you might need a fundamentally different algorithm. For instance; to check if the number n is prime, you need to check if it can be separated by any simple <= square root of n. Using this knowledge, we can begin to build a list of primes, checking the numbers in sequential order, adding each new prime to the list.

This is a slow algorithm, but it can be accelerated using a “wheel” that skips obvious non-prime numbers (for example, numbers that are divisible by 2 or 3).

It's all in my head, so I apologize for any inaccuracies.

Arthur ulfeldt · Answer 3 · 2012-03-20T17:31:53+0000

In principle, a simple calculation is better suited for associative structures such as collections than iterative structures such as lists. Several other StackOverflowers contributed useful clojure answers:

The answer to this question ,
this question: why is this simple sieve implementation slower ,
this question is about fast primary generation in lisp

in general, I like to break this problem down into separate steps, so here is my answer using sets:

 (defn next-not-in-sieve [sieve start] (first (drop-while #(sieve %) (range 1 (inc start))))) (defn add-multiples [sieve max n] (into sieve (range n (inc max) n))) (defn primes [max sieve answer] (let [next (next-not-in-sieve sieve max)] (if (nil? next) answer (recur max (add-multiples sieve max next) (conj answer next)))))

it works fast enough for basic use, here you need to study Clojure, and not quickly find primes :)

 user> (time (def foo (sieve 60000))) "Elapsed time: 63167.82765 msecs" user> (time (def foo (primes 60000 #{1} []))) "Elapsed time: 33272.157235 msecs"

And what would be the learning point of clojure if we hadn't turned it into a lazy seq of primes :

 (defn primes [max sieve] (if-let [next (next-not-in-sieve sieve max)] (lazy-seq (cons next (primes max (add-multiples sieve max next))))))

and check the time:

 (time (def foo (doall (primes 60000 #{1})))) "Elapsed time: 33714.880871 msecs"

and, of course, for thousands who want some back ground to check out the wikipedia page on the main grids

Segmentation errors and heap errors when using Remove to generate prime numbers

More articles: