Saving variables as separate bits

over the past few weeks, I have been working on a program that can make the most efficient spirals possible. I was looking for multithreading to increase the speed of the program, and now I have a new problem. My list of prime numbers has a length of 64 million zeros and ones, and this list takes up 240 MB of RAM. because I use multiprocessing (only 5 processes), my script uses a total of 1.1 GB of bar, and if it reaches this point, it returns a memory error.

Some background information on how I store primes: primes are stored in a list, and every time I find a stroke, I set the value to 1 (example: Primes [13] = 1 (because it's prime), and Primes [14] = 0). For me, this seemed like the best solution since the list would not use a lot of memory

After some basic math, I came to the conclusion that every zero or one in the list of my primes occupies 4 bytes (32 bits) of information. This seems logical, but I was wondering if there is a way to store zero and ones as separate bits, so it won't take up as much memory.

Thanks in advance for any answers, Regards, harm

+5
source share
2 answers

If every 0 or 1 takes 32 bits, that means its array of characters (maybe an integer?). Instead, you should use a boolean type. The easiest way to do this is to use bitarray. One of the implementations:

https://pypi.python.org/pypi/bitarray/0.8.1

+3
source

Here's a Python 2 code that creates a file of primes packed in bytes, with each byte encoding a primitive block of 30 numbers, using the fact that all primes> 5 are coprime to 30 and therefore correspond to one of (1, 7, 11, 13, 17, 19, 23, 29) mod 30.

sieve_bin.py

 #! /usr/bin/env python ''' Prime sieve. Save primes to a file with the primes in each block of 30 numbers packed into a byte, utilising the fact that all primes > 5 are coprime to 30 and hence are congruent to one of (1, 7, 11, 13, 17, 19, 23, 29) mod 30 Written by PM 2Ring 2016.02.06 Prime sieve by Robert William Hanks See http://stackoverflow.com/q/35222244/4014959 ''' import sys from array import array def rwh_primes(n): ''' Returns a boolean list of odd primes < n Adapted from code by Robert William Hanks See http://stackoverflow.com/a/3035188/4014959 ''' #Each `sieve` item represents an odd number, starting at 1 sieve = [True] * (n//2) for i in xrange(3, int(n**0.5) + 1, 2): if sieve[i//2]: sieve[i*i//2::i] = [False] * ((n - i*i - 1) // (2*i) + 1) return sieve def main(): if len(sys.argv) != 2: print '''Generate a file of primes packed into bits. Usage: %s hi to generate a file of primes < `hi` If `hi` isn't a multiple of 30 it will be rounded down to the nearest multiple of 30 ''' % sys.argv[0] exit() hi = int(sys.argv[1]) // 30 * 30 fname = 'primebits' print 'Generating primes less than %d...' % hi odd_primes = rwh_primes(hi) print 'Packing primes into bytes...' prime_residues = (1, 7, 11, 13, 17, 19, 23, 29) bitmasks = [(1<<j, (u - 1) // 2) for j, u in enumerate(prime_residues)] packer = (sum(mask for mask, r in bitmasks if odd_primes[i + r]) for i in xrange(0, hi//2, 15)) primebytes = array('B', packer) with open(fname, 'wb') as f: primebytes.tofile(f) print 'Saved to', fname if __name__ == '__main__': main() 

And here is the program that reads the primebits file created by sieve_bin.py . The file is read in an array of unsigned bytes, so it is efficient in using RAM: one byte for each data byte read from the file, plus a little overhead for the array object itself (28 bytes on a 32-bit machine).

The main function of this program creates a list of primes using the isprime function. This is about 4 times slower than using a sieve, but it has much less memory overhead. You can probably speed it up a bit, but such optimizations will be left as an exercise for the reader. :)

isprime_bin.py

 #! /usr/bin/env python ''' Test if a number is prime, using a table read from disk The table contains the primes packed into bytes, with each byte encoding the primality of a block of 30 numbers, utilising the fact that all primes > 5 are coprime to 30 and hence are congruent to one of (1, 7, 11, 13, 17, 19, 23, 29) mod 30 See http://stackoverflow.com/q/35222244/4014959 ''' import sys import os.path from array import array def load_primes(fname): primebytes = array('B') filesize = os.path.getsize(fname) with open(fname, 'rb') as f: primebytes.fromfile(f, filesize) return primebytes prime_residues = (1, 7, 11, 13, 17, 19, 23, 29) #Build a dict for fast conversion of residue to bitmask bitmasks = dict((v, 1<<i) for i, v in enumerate(prime_residues)) def isprime(n, primebytes): if n < 2: return False if n in (2, 3, 5): return True r = n % 30 if r not in bitmasks: return False b = primebytes[n // 30] return b & bitmasks[r] #Test def main(): m = int(sys.argv[1]) if len(sys.argv) > 1 else 300 fname = 'primebits' primebytes = load_primes(fname) primes = [i for i in xrange(m) if isprime(i, primebytes)] print primes if __name__ == '__main__': main() 

On my old single-core 2 GHz computer with 2 GB of RAM, it takes about 26 seconds for sieve_bin.py to create a primebits file for numbers <64000020 (file size = 2133334 bytes); approximately half of this time is spent sifting prime numbers. isprime_bin.py takes about 124 seconds to create a list of primes from this file (sending the output to / dev / null).

The result was verified by comparing it with the output generated by the regular Eratosthenes sieve program.


The code in isprime_bin.py designed to test arbitrary positive integers for primitiveness. To simply generate a list of primes, it can be significantly accelerated, because the first two if tests are applicable only to numbers <= 5. Here is a modified version that takes about 47 seconds to create a list of all primes from 2133334 bytes primebits .

 #! /usr/bin/env python import sys import os.path from array import array def load_primes(fname): primebytes = array('B') filesize = os.path.getsize(fname) with open(fname, 'rb') as f: primebytes.fromfile(f, filesize) return primebytes prime_residues = (1, 7, 11, 13, 17, 19, 23, 29) #Build a dict for fast conversion of residue to bitmask bitmasks = dict((v, 1<<i) for i, v in enumerate(prime_residues)) def isprime(n, primebytes): r = n % 30 if r not in bitmasks: return False b = primebytes[n // 30] return b & bitmasks[r] def primes(primebytes): s = (2, 3, 5) + prime_residues[1:] for i in s: yield i s = prime_residues j = 30 while True: try: for i in s: p = j + i if isprime(p, primebytes): yield p j += 30 except IndexError: return #Test def main(): m = int(sys.argv[1]) if len(sys.argv) > 1 else 300 fname = 'primebits' primebytes = load_primes(fname) primelist = [] for p in primes(primebytes): if p > m: break primelist.append(p) print primelist if __name__ == '__main__': main() 
0
source

Source: https://habr.com/ru/post/1242366/


All Articles