Oversizing lists in Python

I am trying to implement a sieve of Eratosthenes in Python. However, when finding all primes up to the square root of, for example, 779695003923747564589111193840021, I get an error because the result of range() has too many items. My question is: how can I avoid this problem? If I build the list with a while loop instead, I get an error saying that I use too much memory (before it even starts using the page file). Both versions are listed below:

Using range ()

    maxnum = 39312312323123123
    primes = []
    seq = []
    i = 0
    seq = range(2, maxnum)
    for i in seq:
        mul = i * seq
        for j in mul:
            try:
                seq.remove(j)
            except:
                pass
        primes.append(i)
    print primes

Using while:

    maxnum = 39312312323123123
    primes = []
    seq = []
    i = 0
    while i < maxnum:
        seq.append(i)
        i += 1
    for i in seq:
        mul = i * seq
        for j in mul:
            try:
                seq.remove(j)
            except:
                pass
        primes.append(i)
    print primes
+4
7 answers

This is a more complex algorithm, and perhaps technically it doesn't count as a sieve, but one approach is, rather than removing all multiples of a given prime at once, to queue the next multiple (along with the prime). This lends itself to a generator implementation. The queue will still end up holding a lot of (multiples of) primes, but not as many as by building and then filtering a list.

Working the first few steps by hand shows the principle...

  • 2 is prime - yield it and queue (4, 2)
  • 3 is prime - yield it and queue (6, 3)
  • 4 is composite - replace (4, 2) with (6, 2) in the queue
  • 5 is prime - yield it and queue (10, 5)
  • 6 is composite - replace (6, 2) with (8, 2) and (6, 3) with (9, 3)

Note: the queue is not a FIFO. You always extract the tuple with the smallest first element, but new/replacement tuples do not (usually) have the largest first element, and (as with 6) there will be duplicates.

To handle the queue efficiently in Python, I suggest a dictionary (i.e. a hash table) keyed by the first element of the tuple. The data is a set of second-element values (the original primes).

As suggested elsewhere, test with small targets before attempting the big one. And don't be surprised if you fail: it may still be that you need too many large integers allocated on the heap at one time (in the queue) to complete the solution.
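A minimal sketch of that queue-in-a-dictionary idea, written as a Python generator (the name prime_generator and the exact layout are mine, not from the original answer):

    from collections import defaultdict

    def prime_generator():
        # queue maps: next multiple -> set of primes currently "waiting" there
        queue = defaultdict(set)
        n = 2
        while True:
            if n in queue:
                # n is composite: advance every prime waiting at n to its next multiple
                for p in queue.pop(n):
                    queue[n + p].add(p)
            else:
                # n is prime: yield it and queue its first multiple, (2n, n)
                queue[2 * n].add(n)
                yield n
            n += 1

Keying the dictionary by the multiple and storing a set of primes merges the duplicates noted above (at 6, both 2 and 3 arrive at the same key). Memory grows with the number of primes found so far rather than with maxnum; for example, itertools.islice(prime_generator(), 10) produces the first ten primes without building any big list.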

+2

I would say "use xrange() instead", but you are actually using the list of ints as the result of the sieve... so a generator of integers is not the right solution.

I think it will be difficult to get a list with 39312312323123123 elements, no matter what function you use to build it. That is 279 petabytes of 64-bit integers.

Try this:

    import itertools

    # assumes MAXNUM is defined above, e.g. the limit from the question
    class FoundComposite(Exception):
        pass

    primes = [2]

    seq = itertools.takewhile(    # Take integers from the sequence
        lambda x: x < MAXNUM,     # until we reach MAXNUM
        itertools.count(2)        # the integers starting from 2
    )
    #seq = xrange(2, MAXNUM)      # alternatively

    for i in seq:
        try:
            for divisor in primes:
                if not (i % divisor):
                    # no remainder - thus an even divisor;
                    # continue to next i in seq
                    raise FoundComposite
            # if this is reached, we have tried all divisors
            primes.append(i)
        except FoundComposite:
            pass
+6

Your algorithm is broken. Get it working for maxnum = 100 first.

Once it works, you will find that maxnum = 100000000 takes a very long time.

Plot the time it takes to run for maxnum in (10, 100, 1000, 10000, 100000, 1000000, ...), and you can extrapolate how long 39312312323123123 would take :)
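A sketch of that measurement, assuming a plain list-based sieve as the function being timed (this sieve is a stand-in of mine, not the asker's code):

    import time

    def sieve(maxnum):
        # straightforward Sieve of Eratosthenes over a list of flags
        is_prime = [True] * maxnum
        is_prime[0] = is_prime[1] = False
        for i in xrange(2, int(maxnum ** 0.5) + 1):
            if is_prime[i]:
                for j in xrange(i * i, maxnum, i):
                    is_prime[j] = False
        return [i for i, flag in enumerate(is_prime) if flag]

    for maxnum in (10, 100, 1000, 10000, 100000, 1000000):
        start = time.time()
        sieve(maxnum)
        print maxnum, time.time() - start

Plot the timings on a log-log scale and extend the line; it makes the cost of 39312312323123123 painfully clear.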

+2

There is a third-party module for Python called gmpy.

It has a couple of functions that may be useful to you, and they are very fast. The probabilistic stuff kicks in around the 4 billion mark.

    next_prime(...)
        next_prime(x): returns the smallest prime number > x. Note that
        GMP may use a probabilistic definition of 'prime', and also that
        if x < 0 GMP considers x 'prime' iff -x is prime; gmpy reflects
        these GMP design choices. x must be an mpz, or else gets coerced
        to one.

    is_prime(...)
        is_prime(x, n=25): returns 2 if x is _certainly_ prime, 1 if x is
        _probably_ prime (probability > 1 - 1/2**n), 0 if x is composite.
        If x < 0, GMP considers x 'prime' iff -x is prime; gmpy reflects
        this GMP design choice. x must be an mpz, or else gets coerced to
        one.
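A usage sketch based on the docstrings above, assuming gmpy 1.x is installed (gmpy2 exposes the same two functions):

    import gmpy

    n = 779695003923747564589111193840021
    print gmpy.is_prime(n)    # 2 = certainly prime, 1 = probably prime, 0 = composite
    print gmpy.next_prime(n)  # smallest (probable) prime greater than n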
+1

range() returns a list containing all the numbers in the requested range, while xrange() produces the numbers one by one, with memory consumption close to zero.
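A quick way to see the difference (Python 2; the variable names are mine):

    import sys

    lazy = xrange(2, 10 ** 8)    # constant-size object; numbers are produced on demand
    print sys.getsizeof(lazy)    # a few dozen bytes, no matter how big the range is
    # eager = range(2, 10 ** 8)  # would materialize ~10**8 ints in memory at once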

0

Regarding the memory limit: how about creating a custom list class that is internally a linked list of lists or arrays? It would move from one chunk to the next internally, allocating more as needed, while the caller uses it through the external interface you provide, which exposes whichever members (.append, .remove, etc.) your problem needs.

Note: I am not a Python programmer, and I don't know how to implement what I described in Python. I may be missing context here, so I will understand if this gets voted down.

Perhaps use "generators", as they are called in Python, to yield the results of your internal lists as if they were one huge single list, maybe backed by a linked list.
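A minimal sketch of that idea (ChunkedList and chunk_size are invented names for illustration; it uses a Python list of bounded chunks rather than a literal linked list):

    class ChunkedList(object):
        def __init__(self, chunk_size=1000000):
            self.chunk_size = chunk_size
            self.chunks = [[]]   # many bounded-size lists instead of one huge one

        def append(self, value):
            if len(self.chunks[-1]) >= self.chunk_size:
                self.chunks.append([])   # grow by adding another chunk
            self.chunks[-1].append(value)

        def __len__(self):
            return sum(len(chunk) for chunk in self.chunks)

        def __iter__(self):
            # a generator that walks the chunks as if they were one big list
            for chunk in self.chunks:
                for value in chunk:
                    yield value

This avoids one giant contiguous allocation, although the total memory consumed is of course unchanged, so it only helps against allocation failures, not overall capacity.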

0

Try the following:

    def getPrimes(maxnum):
        primes = []
        for i in xrange(2, maxnum):
            is_mul = False
            for j in primes:       # try dividing by all previous primes
                if i % j == 0:
                    is_mul = True  # once we find a prime that i is divisible by,
                    break          # short circuit so we don't have to try the rest
            if not is_mul:         # i is not a multiple of any prime seen so far,
                primes.append(i)   # so it must be prime
        return primes

You should not run out of memory until you reach a very large number of primes. This way you don't have to worry about creating a list of multiples. Not sure if this is still considered a sieve.

Actually, this will not work for maxnum = 39312312323123123. Using the prime number theorem, we can estimate that there will be approximately 1,028,840,332,567,181 primes in that range.
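That estimate is just pi(n) ~ n / ln(n) from the prime number theorem; a quick check:

    import math

    n = 39312312323123123
    print int(n / math.log(n))   # ~1.03e15, matching the figure above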

As pointed out in this question, the maximum size of a Python list on a 32-bit system is 536,870,912. Therefore, even if you had enough memory, you still could not complete the calculation.

You should not have this problem with a 64-bit system.

2 ** 64 => 18446744073709551616

Using the math from the question above, (2 ** 64) / 8, the maximum number of items in a list would be 2,305,843,009,213,693,951, which is more than the estimated number of primes you will encounter.

Edit:

To avoid memory problems, you can keep the list of primes in a file on your hard drive. Store one prime per line and read the file back each time you check a new number.

Maybe something like this:

    primes_path = r'C:\temp\primes.txt'

    def genPrimes():
        for line in open(primes_path, 'r'):
            yield int(line.strip())

    def addPrime(prime):
        primes_file = open(primes_path, 'a')
        primes_file.write('%s\n' % prime)
        primes_file.close()

    def findPrimes(maxnum):
        for i in xrange(2, maxnum):
            is_mul = False
            for prime in genPrimes():   # generate the primes from a file on disk
                if i % prime == 0:
                    is_mul = True
                    break
            if not is_mul:
                addPrime(i)             # append the new prime to the end of the primes file

At the end, you will have a file on your hard drive containing all your primes.

Granted, it will be pretty slow, but you will not run out of memory. You could make it faster by improving the speed at which you can read/write files (e.g. a RAID array).

0

Source: https://habr.com/ru/post/1302419/

