Fast Bloom filters in C: 64-bit ints, high-frequency initialize/query/destroy cycle

I need a Bloom filter implementation for part of a large project. The whole project is in C (and C only, no C++), and unfortunately I could not find any suitable C-based Bloom filter implementations (other than proofs of concept).

My requirements for the Bloom filter:
   1. The module containing the Bloom filter runs every 50 ms.
      The entire module should finish executing within 5-6 ms,
      which means the Bloom filter code itself must complete in less than 3 ms.
   2. The elements are 64-bit integers.
   3. I have only 8 thousand elements in total (inserts and queries combined).
      The common case is a few hundred inserts into the filter and 1000-1500 queries.

Every 50 ms, I get two sets (W, R) of 64-bit ints. I need to find the intersection between the W and R received in this epoch (in other words, the Bloom filter should start fresh for each epoch). The code below shows the overall control flow:

sleep(50ms)
...module code..
clear(bloomfilter) /* basically a memset(0) on bloomfilter bitmap */
W = getListW()
for each entry in W
  insert(bloomfilter, entry)
R = getListR()
for each entry in R
   if (present(bloomfilter, entry))
      ..do something with entry..
..rest of module code..

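I need fast seeding (inserting W) and fast querying. The hash function is a further concern: something as heavy as SHA1 is out of the question given the time budget.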
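For the clear/insert/present loop and the hashing concern above, here is a minimal sketch of a small bitmap filter with a cheap integer mixer in place of a cryptographic hash. The names (`bloom_t`, `mix64`, `bloom_insert`, `bloom_present`), the 64 Kbit size and the choice of two probes per element are assumptions made for illustration, not part of the question. The mixer uses the widely published splitmix64 finalizer constants.

```c
#include <stdint.h>
#include <string.h>

/* Assumed sizing: 64 Kbit bitmap (8 KB), two probes per element. */
#define BLOOM_BITS  (1u << 16)
#define BLOOM_WORDS (BLOOM_BITS / 64)

typedef struct { uint64_t bits[BLOOM_WORDS]; } bloom_t;

/* splitmix64-style finalizer: a few shifts and multiplies, no tables. */
uint64_t mix64(uint64_t x)
{
    x ^= x >> 30; x *= 0xbf58476d1ce4e5b9ULL;
    x ^= x >> 27; x *= 0x94d049bb133111ebULL;
    x ^= x >> 31;
    return x;
}

/* Start a fresh epoch: one memset over 8 KB. */
void bloom_clear(bloom_t *bf)
{
    memset(bf->bits, 0, sizeof bf->bits);
}

/* Set two bits chosen from different halves of the mixed value. */
void bloom_insert(bloom_t *bf, uint64_t e)
{
    uint64_t h = mix64(e);
    uint32_t a = (uint32_t)h & (BLOOM_BITS - 1);
    uint32_t b = (uint32_t)(h >> 32) & (BLOOM_BITS - 1);
    bf->bits[a >> 6] |= 1ULL << (a & 63);
    bf->bits[b >> 6] |= 1ULL << (b & 63);
}

/* Returns 1 if e may have been inserted, 0 if it definitely was not. */
int bloom_present(const bloom_t *bf, uint64_t e)
{
    uint64_t h = mix64(e);
    uint32_t a = (uint32_t)h & (BLOOM_BITS - 1);
    uint32_t b = (uint32_t)(h >> 32) & (BLOOM_BITS - 1);
    return (int)((bf->bits[a >> 6] >> (a & 63)) & 1)
        && (int)((bf->bits[b >> 6] >> (b & 63)) & 1);
}
```

Each epoch would call `bloom_clear` once, `bloom_insert` for every entry in W, and `bloom_present` for every entry in R. A hit still needs an exact check if false positives matter.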


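Keep it simple. You have a small number of elements, and they are 64-bit ints (quick to compare on a 32-bit machine, quicker still on a 64-bit one). As a first attempt, use a hash table with 64K slots. On insert, make a 16-bit "hash" of the 64-bit int by XORing its 16-bit pieces together, and use that as the table index.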
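The XOR-folding idea mentioned above can be sketched as follows. The function name `fold16` is an assumption for illustration. Note that the code later in this answer mixes the pieces with a multiply-and-add instead of a plain XOR.

```c
#include <stdint.h>

/* Fold a 64-bit value to 16 bits by XORing its four 16-bit pieces.
   The cast keeps only the low 16 bits of the combined value. */
uint16_t fold16(uint64_t v)
{
    return (uint16_t)(v ^ (v >> 16) ^ (v >> 32) ^ (v >> 48));
}
```

A plain XOR fold is cheap, but values that contain the same 16-bit pieces in a different order always collide. That is one reason to prefer an order-sensitive mix if the inputs have that kind of structure.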

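With at most 8K elements, the 64K-slot table stays sparse. The code below assumes that an element is never 0, so that 0 can mark an empty slot.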

#include <assert.h>
#include <stdint.h>
#include <string.h>

uint64_t table[65536] = {0};

void clear()
{
    memset(table, 0, sizeof(table));
}

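uint16_t hash(uint64_t ele)
{
    assert(ele != 0);   /* 0 marks an empty slot, so it cannot be stored */
    uint16_t h = 0x5AA5;
    /* Mix in the four 16-bit pieces, lowest first. Using shifts avoids
       the pointer cast and gives the same result on any byte order. */
    h = h * 131 + (uint16_t)(ele);
    h = h * 131 + (uint16_t)(ele >> 16);
    h = h * 131 + (uint16_t)(ele >> 32);
    h = h * 131 + (uint16_t)(ele >> 48);
    return h;
}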

void insert(uint64_t ele)
{
    uint16_t h = hash(ele);
    while (table[h])
        ++h;
    table[h] = ele;
}

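int find(uint64_t ele)
{
    uint16_t h = hash(ele);
    /* Linear probing; h wraps around at 65536 because it is a uint16_t. */
    while (table[h] != ele)
    {
        if (!table[h])
            return 0;   /* reached an empty slot: not present */
        ++h;
    }
    return 1;
}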

, . , -.


:

  • N.
  • -, .

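With ~1000 elements inserted into a filter of this size, roughly 1 bit in 8 is set. False positives are possible: for example, with set1 = { e1 } and set2 = { e2 }, e1 != e2, we have set1 intersect set2 = { }, yet bf(set1) intersect bf(set2) <> {} whenever e1 and e2 hash to the same bit. The converse is reliable: if bf(set1) intersect bf(set2) = {}, then set1 intersect set2 = {}.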
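To put a number on the fill level, here is a small sketch that computes the expected fraction of set bits, 1 - (1 - 1/m)^n, for n inserts into m bits. It assumes one hash function per element and uniformly distributed hash values. The function name `expected_fill` is hypothetical.

```c
/* Expected fraction of bits set after n single-hash inserts into an
   m-bit filter, assuming uniformly distributed hash values. */
double expected_fill(unsigned n, unsigned m)
{
    double unset = 1.0;   /* probability that a given bit is still 0 */
    for (unsigned i = 0; i < n; i++)
        unset *= 1.0 - 1.0 / (double)m;
    return 1.0 - unset;
}
```

For n = 1000 and m = 8192 this comes out a little under 1 in 8. With a single hash function, that fraction is also the chance that a query for an absent element reports a hit.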

, BF R W, , 2 .

, C:

#include <stdbool.h>
#include <string.h>

typedef unsigned long ulong;          /* 64 bits wide on LP64 platforms */
enum { N = 1024 * 8 };                /* filter size in bits */
enum { BPW = 8 * sizeof(ulong) };     /* bits per word */
typedef struct BF { ulong bits[N/BPW]; } BF;

unsigned foo(ulong e);                /* hash placeholder from the original; not defined here */

unsigned hash(ulong e) { return foo(e) % N; }
void clear(BF* pbf) { memset(pbf->bits, 0, sizeof(pbf->bits)); }
void add(BF* pbf, ulong e) { unsigned h = hash(e); pbf->bits[h/BPW] |= (ulong)1 << (h%BPW); }
bool hit(BF* pbf, ulong e) { unsigned h = hash(e); return (pbf->bits[h/BPW]>>(h%BPW)) & 1; }

/* ANDs the two filters into *pbfResult; returns true if any bit survives. */
bool intersect(BF* pbfResult, BF* pbf1, BF* pbf2) {
    bool empty = true;
    for (unsigned i = 0; i < N/BPW; i++)
        if ((pbfResult->bits[i] = pbf1->bits[i] & pbf2->bits[i]) != 0)
            empty = false;
    return !empty;
}
void intersectRW(unsigned nr, ulong* r, unsigned nw, ulong* w) {
    BF bfR, bfW, bfIntersection;
    unsigned i;

    clear(&bfR);
    for (i = 0; i < nr; i++)
         add(&bfR, r[i]);

    // variant 1: enumerate elements of W that hit in BF(R)
    for (i = 0; i < nw; i++)
         if (hit(&bfR, w[i])) {
             /* ... w[i] ... */
         }

    // variant 2: determine if intersection of BFs is empty and get intersection BF
    clear(&bfW);
    for (i = 0; i < nw; i++)
         add(&bfW, w[i]);
    bool any = intersect(&bfIntersection, &bfR, &bfW);
    /* ... */
}

?

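  • the 3 BFs are 1 KB each, i.e. 128 ulongs, small enough to sit at the top of the stack in the L1 cache;
  • 100-1000 elements are added to bfR, i.e. up to ~1000 calls to add;
  • 100-1000 elements are tested against bfR, i.e. up to ~1000 calls to hit;
  • for variant 2, the intersection ANDs ~128 ulongs.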

( , , /% .)

- L1 L2; 2 , , .

-, 64- . , 64- 16 , xors .

* - MS V++ 4.0 " " (http://msdn.microsoft.com/en-us/library/kfz8ad09(VS.80).aspx) - . , , ... *

?

!

, :

  • Overkill, SIMD (, SSE).
  • , . , - R W, , , , .
  • . , , , ( add() s intersect().)
  • , , R W , , BF R W, (OR) BF (R) s BF (W) s .

3 .

Is your processor fast enough to keep this simple and just sort both lists? The sort should be fast, since everything will fit comfortably in the cache. Walking the two sorted lists to find the intersection is quick, and you never have to deal with false positives as you do with a Bloom filter.
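A minimal sketch of the sort-and-merge approach suggested above, using the standard `qsort`. The names `cmp_u64` and `intersect_sorted` are assumptions for illustration. The sketch sorts both arrays in place and reports each common value once.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Three-way comparison for qsort; avoids overflow from subtraction. */
static int cmp_u64(const void *a, const void *b)
{
    uint64_t x = *(const uint64_t *)a;
    uint64_t y = *(const uint64_t *)b;
    return (x > y) - (x < y);
}

/* Sorts w and r in place, then walks both lists in step.
   Writes each common value once to out, which must have room for
   min(nw, nr) entries. Returns the number of values written. */
size_t intersect_sorted(uint64_t *w, size_t nw,
                        uint64_t *r, size_t nr,
                        uint64_t *out)
{
    size_t i = 0, j = 0, n = 0;

    qsort(w, nw, sizeof *w, cmp_u64);
    qsort(r, nr, sizeof *r, cmp_u64);

    while (i < nw && j < nr) {
        if (w[i] < r[j]) {
            i++;
        } else if (w[i] > r[j]) {
            j++;
        } else {
            if (n == 0 || out[n - 1] != w[i])   /* skip repeated matches */
                out[n++] = w[i];
            i++;
            j++;
        }
    }
    return n;
}
```

With a few hundred entries in W and 1000-1500 in R, both sorts and the merge touch only a few tens of kilobytes. Whether that fits the 3 ms budget would need to be measured on the target hardware.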


Source: https://habr.com/ru/post/1778427/

