Search for superfast data storage for intersection operations

I used Redis for a while as a backend for Resque, and now that I am looking for a quick way to perform the intersection operation on large datasets, I decided to give Redis a shot.

I did the following testing:

- x , y and z are a set of Redis, they all contain approx. 1 million members (random integers taken from an array of seeds containing 3M + members).

- I want to cross xy and z , so I use sintersectstore (to avoid overheating caused by retrieving data from the server to the client)

sinterstore rxyz 

- the resulting set ( r ) contains about half a million members, Redis calculates this set in about half a second.

Half a second is not bad, but I will need to perform such calculations on sets that can contain more than a billion members.

I have not tested how Redis will react with such huge sets, but I believe that it will take a lot more time to process the data.

Am I doing it right? Is there a faster way to do this?

Notes:

- Own arrays are not an option, as I am looking for a distributed data warehouse, to which several employees will be available.

- I get these results on 8 3.4 GHz Mac cores with 16 GB of RAM, disk saving was disabled in the Redis configuration.

+4
source share
1 answer

I suspect bitmaps are your best hope.

In my experience, redis is a great server for bitmaps; you must use the string data structure (one of the five data structures available in redis)

many, or possibly all of the operations you need to perform, are available out of the box in redis, like atomic operations

operation redis setbit has a time complexity of O (1)

In a typical implementation, you need to hash the values ​​of your array to offset the values ​​in the bit string, then set each bit to its corresponding offset (or index); So:

 >>> r1.setbit('k1', 20, 1) 

the first argument is the key, the second is the offset (index value), and the third is the value in that index on the bitmap.

to find out if a bit is set at this offset (20), a getbit call passing in the key for the bit string.

 >>> r1.getbit('k1', 20) 

then on these bitmap images you can, of course, perform the usual bitwise operations, for example, logical AND, OR, XOR.

+4
source

Source: https://habr.com/ru/post/1398677/


All Articles