Why is my Eratosthenes sieve so slow?

I've solved some problems on Project Euler and had to generate 2 million primes for one of them. My implementation of the Sieve of Eratosthenes turned out to be EXTREMELY slow, but I don't quite understand why. Can someone explain the main problems with this implementation? I thought it was so beautiful, and then I realized it was terrible :( I found another implementation online, and it was much faster than mine.

    def generatePrimes(upperBound):
        numbers = range(2, upperBound + 1)
        primes = []
        while numbers:
            prime = numbers[0]
            primes.append(prime)
            numbers = filter(lambda x: x % prime, numbers)
        return primes

EDIT: Thanks for all the answers! The conclusion is that the filter is the problem, both because it visits every element (not just those that should be marked as composite) and because it creates a new list every time. I rewrote it with a good old for loop and one round of filtering, and it works much faster. New code:

    def generatePrimes(upperBound):
        numbers = range(2, upperBound + 1)
        for i in xrange(len(numbers)):
            if numbers[i] != 0:
                for j in xrange(i + numbers[i], len(numbers), numbers[i]):
                    numbers[j] = 0
        primes = filter(lambda x: x, numbers)
        return primes
2 answers

The sieve of Eratosthenes looks like this:

    def sieve(n):
        primality_flags = [True] * (n + 1)
        primality_flags[0] = primality_flags[1] = False
        primes = []
        for i, flag in enumerate(primality_flags):
            if flag:
                primes.append(i)
                for j in xrange(2 * i, n + 1, i):
                    primality_flags[j] = False
        return primes

It processes each number once when the outer loop reaches it, and once more for each prime that divides it. About half the numbers are divisible by 2, about a third by 3, and so on; asymptotically, the average number of times each number is processed is 1 plus the sum of the reciprocals of the primes up to n. That sum is about log(log(n)), so the sieve has asymptotic time complexity O(n*log(log(n))), assuming constant-time arithmetic. That is really good.
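
To make that concrete, here is a minimal Python 3 sketch (my own illustration, not part of the original answer) that counts how many times the sieve's inner loop runs and compares it against n*log(log(n)):

    import math

    def sieve_op_count(n):
        # count the flag-clearing operations performed by the inner loop
        flags = [True] * (n + 1)
        ops = 0
        for i in range(2, n + 1):
            if flags[i]:
                for j in range(2 * i, n + 1, i):
                    flags[j] = False
                    ops += 1
        return ops

    for n in (10**4, 10**5, 10**6):
        print(n, sieve_op_count(n), round(n * math.log(math.log(n))))

The two columns grow at the same rate; the exact count sits a bit above the estimate because the sum of reciprocals of primes also carries a constant (Mertens) term.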


Your function does not do this. Your filter looks at every element of numbers, regardless of whether it is divisible by prime. Each element is processed once for every prime up to the first one that divides it, and filtering by a prime p removes about 1/p of the remaining elements of numbers. Let the sequence of primes be p[0], p[1], p[2], ..., and let the sequence of sizes of numbers be n[0], n[1], n[2], ...; then we have the following approximate recurrence:

    n[0]   = upperBound - 1
    n[1]   = n[0] * (p[0]-1)/p[0]
    n[2]   = n[1] * (p[1]-1)/p[1]
    ...
    n[k+1] = n[k] * (p[k]-1)/p[k]

and your algorithm takes time roughly proportional to the sum of the n[k] values until numbers is empty. I haven't analyzed the behavior of this series, but a quick calculation shows that its growth is much worse than O(n*log(log(n))).
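
Here is a small Python 3 sketch (my own, under the answer's approximation) that sums the n[k] from the recurrence above and compares the total against n*log(log(n)):

    import math

    def trial_filter_cost(upper_bound):
        # primes up to upper_bound, via an ordinary sieve
        flags = [True] * (upper_bound + 1)
        primes = []
        for i in range(2, upper_bound + 1):
            if flags[i]:
                primes.append(i)
                for j in range(2 * i, upper_bound + 1, i):
                    flags[j] = False
        # total work ~ sum of list sizes n[k]; each filtering step by a
        # prime p keeps a fraction (p-1)/p of the remaining elements
        n, total = upper_bound - 1.0, 0.0
        for p in primes:
            total += n
            n *= (p - 1) / p
        return total

    for ub in (10**3, 10**4, 10**5):
        print(ub, int(trial_filter_cost(ub)), round(ub * math.log(math.log(ub))))

Already at upper_bound = 10**5 the estimated work is hundreds of times larger than the sieve's bound, and the gap keeps widening.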


Running cProfile shows that most of the time is spent in filter. Replacing the filter with a list comprehension speeds it up by about a factor of 2:

    numbers = [n for n in numbers if n % prime != 0]

But this still doesn't fix the main problem, which is that you recreate the list of numbers on every iteration, and that is slow. Faster implementations, like this one: http://groups.google.com/group/comp.lang.python/msg/f1f10ced88c68c2d, just mark off composite numbers in place, replacing them with 0 or similar.
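
For reference, the profiling mentioned above can be reproduced with something like this (a sketch; the upper bound is kept deliberately small because the filter version is slow):

    import cProfile

    # assumes generatePrimes (the original filter-based version) is defined above
    cProfile.run('generatePrimes(20000)')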

