Python space + efficient temporary data structure for storing two-dimensional bitmaps

I want to create a two-dimensional binary (bit) array in Python in effective space and time, since my 2D bitarray will be about 1 million (rows) * 50,000 (columns 0 or 1), and I will also perform bitwise operations on these huge elements . My array will look something like this:

0 1 0 1 1 1 1 0 1 0 0 0 ... 

In C ++, the most efficient way (space) for me would be to create a kind of array of integers, where each element is 32 bits, and then I can use shift operators in combination with bitwise operators to transfer operations.

Now I know that in python there is a bitarray module. But I cannot create a 2D structure using a list of bit arrays. How can i do this?

Another way I know in C ++ would be to create a map similar to map<id, vector<int> > , where I can manipulate a vector, as I mentioned above. Should I use the dictionary equivalent in python?

Even if you offer me some way to use a bitmap for this task, it will be great. If I can find out if I can have multiple streams for splicing a bitarray so that I can make it multithreaded. Thanks for the help!

EDIT:

I can even start creating my own data structure for this, if necessary. However, I just wanted to check before inventing the wheel.

+6
source share
3 answers

According to my comment you can use sets

 0 1 0 1 1 1 1 0 1 0 0 0 

can be represented as

 set([(1,0), (3,0), (0,1), (1,1), (2, 1), (0,2)]) 

or

 {(1,0), (3,0), (0,1), (1,1), (2, 1), (0,2)} 

a AND is equivalent to the intersection of 2 sets OR is the union of two sets

+5
source

How about the following:

 In [11]: from bitarray import bitarray In [12]: arr = [bitarray(50) for i in xrange(10)] 

This creates an array of 10x50 bits in size, which can be accessed as follows:

 In [15]: arr[0][1] = True In [16]: arr[0][1] Out[16]: True 

Keep in mind that a 1Mx50K array will require about 6 GB of memory (and a 64-bit version of Python for a 64-bit OS).

can i have multiple threads on a bitarray splice so i can make it multithreaded

This should not be a problem, with the usual reservations. Keep in mind that because of the GIL, you are unlikely to achieve performance improvements due to multithreading.

+4
source

Can you use numpy?

 >>> import numpy >>> A = numpy.zeros((50000, 1000000), dtype=bool) 

EDIT: This doesn't seem to be the most efficient space. Uses 50 GB (1 byte per bool). Does anyone know if numpy can use packaged bools?

+2
source

Source: https://habr.com/ru/post/914615/


All Articles