Number of rows in a column family in Cassandra

Question


Is there a way to get the row count (number of keys) of a single column family in Cassandra? get_count can only be used to count columns.

For example, if I have a column family containing users and want to get the number of users, how can I do this? Each user is its own row.

+44

database count cassandra

Henri Liljeroos Dec 23 '09 at 10:06


6 answers

Justin DeMaris · Answer 1 · 2013-01-21 21:04

If you are working with a large dataset and a good approximation is enough, I highly recommend using the command:

nodetool --host <hostname> cfstats

This prints statistics for each column family, which look like this:

    Column Family: widgets
    SSTable count: 11
    Space used (live): 4295810363
    Space used (total): 4295810363
    Number of Keys (estimate): 9709824
    Memtable Columns Count: 99008
    Memtable Data Size: 150297312
    Memtable Switch Count: 434
    Read Count: 9716802
    Read Latency: 0.036 ms.
    Write Count: 9716806
    Write Latency: 0.024 ms.
    Pending Tasks: 0
    Bloom Filter False Postives: 10428
    Bloom Filter False Ratio: 1.00000
    Bloom Filter Space Used: 18216448
    Compacted row minimum size: 771
    Compacted row maximum size: 263210
    Compacted row mean size: 1634

The "Number of Keys (estimate)" line is a good guess across the cluster, and it is much faster than explicitly counting the rows.
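The estimate can also be pulled out of the command output with a short script. This is only a minimal sketch, assuming nodetool prints one `Label: value` pair per line with a `Column Family:` header before each block; the function name `parse_key_estimates` is hypothetical, and the output layout may differ between Cassandra versions.

```python
import re


def parse_key_estimates(cfstats_output):
    """Map each column family name to its 'Number of Keys (estimate)' value.

    Assumes one 'Label: value' pair per line (an assumption about the
    nodetool output layout, which can vary by version).
    """
    estimates = {}
    current_cf = None
    for line in cfstats_output.splitlines():
        line = line.strip()
        cf_match = re.match(r"Column Family:\s*(\S+)", line)
        if cf_match:
            # Remember which column family the following lines belong to.
            current_cf = cf_match.group(1)
            continue
        key_match = re.match(r"Number of Keys \(estimate\):\s*(\d+)", line)
        if key_match and current_cf is not None:
            estimates[current_cf] = int(key_match.group(1))
    return estimates


# Abbreviated sample in the same shape as the cfstats output.
sample = """\
Column Family: widgets
SSTable count: 11
Number of Keys (estimate): 9709824
"""
print(parse_key_estimates(sample))
```

In practice the text would come from running the nodetool command itself, for example by capturing its standard output with Python's `subprocess` module.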

jbellis · Answer 2 · 2009-12-23 15:05

If you are using an order-preserving partitioner, you can do this with get_range_slice or get_key_range.

If you are not, you will need to store the user IDs in a special row.
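Counting with range queries usually means paging through the keys in order and restarting each page at the last key seen. The following is a rough sketch of that loop, not the actual client API: `fetch_keys(start_key, limit)` is a hypothetical wrapper around a range call such as get_range_slice, assumed to return up to `limit` keys in partitioner order starting at `start_key` (inclusive).

```python
def count_rows(fetch_keys, page_size=1000):
    """Count row keys by paging through the key range in order.

    fetch_keys(start_key, limit) is a hypothetical callable; an empty
    start_key means "from the beginning of the range".
    """
    if page_size < 2:
        raise ValueError("page_size must be at least 2")
    total = 0
    start_key = ""
    first_page = True
    while True:
        keys = fetch_keys(start_key, page_size)
        if not first_page:
            # The start key is inclusive, so it was counted on the last page.
            keys = keys[1:]
        total += len(keys)
        # A short page means the end of the range has been reached.
        expected = page_size if first_page else page_size - 1
        if len(keys) < expected:
            return total
        start_key = keys[-1]
        first_page = False


# In-memory stand-in for the cluster, used only to exercise the loop.
all_keys = sorted("user%03d" % i for i in range(25))


def fake_fetch(start_key, limit):
    return [k for k in all_keys if k >= start_key][:limit]


print(count_rows(fake_fetch, page_size=10))
```

This walks every key, so it is exact but slow on large column families.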

ajjain · Answer 3 · 2013-05-28 11:41

I found a great article about this here. http://www.planetcassandra.org/blog/post/counting-keys-in-cassandra

select count(*) from cf limit 1000000

The above statement can be used if an approximate upper bound is known in advance. I found this useful for my case.
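One caveat: if the returned count equals the limit, the real count may be higher. Here is a minimal sketch of retrying with a larger limit, assuming the limit caps the number of rows counted (as this answer implies; newer versions may treat it differently). `run_count(limit)` is a hypothetical callable that executes the statement with the given limit and returns the integer result.

```python
def count_with_growing_limit(run_count, initial_limit=1000000, max_limit=10**9):
    """Re-run a limited count with a doubled limit until it is not truncated.

    run_count(limit) is a hypothetical callable, not a real driver function.
    """
    limit = initial_limit
    while True:
        count = run_count(limit)
        if count < limit:
            # The count stayed below the cap, so it was not cut off.
            return count
        if limit >= max_limit:
            raise RuntimeError("count reached max_limit and may be truncated")
        limit = min(limit * 2, max_limit)


# Stand-in for the database: pretend the column family has 2,500,000 rows.
true_rows = 2500000


def fake_run_count(limit):
    return min(true_rows, limit)


print(count_with_growing_limit(fake_run_count))
```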

Ben Burns · Answer 4 · 2010-08-29 21:36

[Edit: this answer is out of date as of Cassandra 0.8.1 - see Counters in the Cassandra Wiki for the correct way to handle counter columns in Cassandra.]

I am new to Cassandra, but I have messed around a lot with Google App Engine. If no other solution presents itself, you might consider keeping a separate counter on a platform that supports atomic increment operations, such as memcached. I know that Cassandra is working on atomic increment/decrement functionality, but it is not yet ready for prime time.

I can only post one hyperlink because I am a new user, so for progress on counter support see the link in my comment below.

Note that this thread suggests ZooKeeper, memcached, and redis as possible solutions. My personal preference would be memcached.

http://www.mail-archive.com/user@cassandra.apache.org/msg03965.html
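The separate-counter idea can be sketched roughly as follows. Everything here is an in-memory stand-in: `RowCounter` is a hypothetical class playing the role of the external counter store, and a plain dict plays the role of the column family. A real setup would use the store's own atomic increment, and the existence check before the write would not be atomic across clients.

```python
import threading


class RowCounter:
    """In-memory stand-in for an external atomic counter (hypothetical)."""

    def __init__(self):
        self._lock = threading.Lock()
        self._value = 0

    def incr(self, delta=1):
        # The lock makes the read-modify-write atomic within this process.
        with self._lock:
            self._value += delta
            return self._value

    def decr(self, delta=1):
        return self.incr(-delta)

    def value(self):
        with self._lock:
            return self._value


def add_user(store, counter, user_id, data):
    # Count only new keys so that overwrites do not inflate the total.
    is_new = user_id not in store
    store[user_id] = data
    if is_new:
        counter.incr()


def remove_user(store, counter, user_id):
    # Decrement only if the row actually existed.
    if store.pop(user_id, None) is not None:
        counter.decr()


users = {}
user_count = RowCounter()
add_user(users, user_count, "alice", {"name": "Alice"})
add_user(users, user_count, "bob", {"name": "Bob"})
add_user(users, user_count, "alice", {"name": "Alice B."})  # overwrite
print(user_count.value())
```

The trade-off is that the counter can drift from the real row count if a write succeeds but the increment fails, or the other way around.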

Dean Hiller · Answer 5 · 2011-11-14 23:51

There is always map/reduce, but that probably goes without saying. If you have it set up with Hive or Pig, you can do this for any table in the cluster. I am not sure, though, whether the tasktrackers are aware of Cassandra data locality, so the entire table may have to be streamed over the network: you can run tasktrackers on the Cassandra nodes, but the data they receive may come from another Cassandra node :(. I would really like to hear from anyone who knows for sure.

NOTE: We are setting up map/reduce on Cassandra mainly because if we want an index later, we can map/reduce one into Cassandra.

Philip Schlump · Answer 6 · 2009-12-23 14:41

I get counts like this after converting the data into a hash in PHP.
