I need to count a bunch of "things" in Cassandra, incrementing the counts by roughly 100-200 every few seconds.
However, I need to count distinct "things" — each one should be counted only once.
To avoid counting anything twice, I set a marker key in a column family, which the program reads before incrementing the counter. Something like:
result = get cf[key];
if (result == NULL) {
    set cf[key][x] = 1;       // mark as seen
    incr counter_cf[key][x];  // increment the counter
}
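To make the pattern concrete, here is a minimal sketch of the read-before-increment logic, with in-memory dicts standing in for the Cassandra column families (all names here are made up for illustration):

```python
# In-memory stand-ins for the two column families (hypothetical names).
seen_cf = {}     # marks which (key, x) pairs were already counted
counter_cf = {}  # the actual counters

def count_once(key, x):
    """Increment counter_cf[key][x] only the first time (key, x) is seen."""
    row = seen_cf.setdefault(key, {})
    if x not in row:                # this check is the costly read in Cassandra
        row[x] = 1                  # mark as seen
        counters = counter_cf.setdefault(key, {})
        counters[x] = counters.get(x, 0) + 1

count_once("page_views", "user42")
count_once("page_views", "user42")  # duplicate: counter stays at 1
print(counter_cf["page_views"]["user42"])  # → 1
```

The problem, as described, is that every call pays for one read before the write.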
However, these reads slow the cluster down significantly. I tried to cut the number of reads by fetching multiple columns in a single read, something like:
result = get cf[key];  // one read returns many columns
if (result[key1] == NULL) { set cf[key][key1] = 1; incr counter_cf[key][key1]; }
if (result[key2] == NULL) { set cf[key][key2] = 1; incr counter_cf[key][key2]; }
// etc.
That cut the number of reads from 200+ down to about 5-6 per batch, but it still slows the cluster down.
I don't need exact counts, but I can't use bitmaps or Bloom filters, because there will be 1M+ counters, and some of them can grow past 4,000,000,000.
I am aware of HyperLogLog counting, but I see no easy way to use it with this many counters (1M+).
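To make the scale concern concrete, here is a rough back-of-the-envelope memory estimate, assuming each counter gets its own standard dense HyperLogLog with 2^p registers of ~6 bits each (p=14 is a common choice giving roughly 0.8% relative error; keys and storage overhead are ignored):

```python
def hll_memory_bytes(num_counters, p=14):
    """Rough RAM needed if every counter gets its own dense HyperLogLog.

    Assumes 2**p registers of 6 bits each per HLL; ignores keys and
    per-structure overhead, so this is a lower bound.
    """
    registers = 2 ** p            # 16384 registers at p=14
    bits_per_hll = registers * 6  # ~12 KB per HLL
    return num_counters * bits_per_hll // 8

total = hll_memory_bytes(1_000_000, p=14)
print(total)  # → 12288000000, i.e. ~12 GB of registers alone
```

So a naive one-HLL-per-counter layout costs on the order of 12 GB for 1M counters, which is why it does not look easy to apply here.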
I'm currently considering Tokyo Cabinet as an external key/value store, but that solution, even if it works, won't scale the way Cassandra does.