We use Hadoop/HBase, and I've looked at Cassandra; both rely on the row key as the primary means of fast data access, although (at least in HBase) you can also filter on column data or on the client side. For example, in HBase you can say "give me all rows starting at key1, up to but not including key2."
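To illustrate those scan semantics without a running cluster, here is a sketch using a plain `TreeMap` as a stand-in for HBase's sorted row-key space (the class and data are hypothetical; a real scan would use the HBase client's `Scan` with start and stop rows):

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class ScanDemo {
    // Stand-in for HBase's lexicographically sorted row keys (illustration only).
    static TreeMap<String, String> table = new TreeMap<>();

    // Mimics an HBase scan from startRow (inclusive) to stopRow (exclusive).
    static SortedMap<String, String> scan(String startRow, String stopRow) {
        return table.subMap(startRow, stopRow);
    }

    public static void main(String[] args) {
        table.put("key0", "...");
        table.put("key1", "...");
        table.put("key15", "...");
        table.put("key2", "...");
        // Returns key1 and key15, but not key0 or key2 (stop row is excluded).
        System.out.println(scan("key1", "key2").keySet());
    }
}
```

Note that comparison is lexicographic on the whole key, which is why "key15" falls between "key1" and "key2" — exactly the behavior that makes careful key design matter.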
So if you design your keys correctly, you can fetch everything for one user, or one host, or one user on one host, and so on. But this requires a well-designed key. For example, if most of your queries need to filter by timestamp, include the timestamp as part of the key.
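A common way to build such a composite key is to concatenate the fields in the order you query them, with the timestamp reversed so the newest rows sort first. The field names and layout below are hypothetical, just to show the idea:

```java
public class RowKeys {
    // Hypothetical composite key: user, host, then a reversed timestamp.
    // Reversing (Long.MAX_VALUE - ts) makes newer rows sort before older ones.
    static String rowKey(String user, String host, long epochMillis) {
        long reversed = Long.MAX_VALUE - epochMillis;
        // Zero-pad to 19 digits so lexicographic order matches numeric order.
        return String.format("%s|%s|%019d", user, host, reversed);
    }

    public static void main(String[] args) {
        String k = rowKey("alice", "web01", 1_700_000_000_000L);
        System.out.println(k);
        // A prefix scan on "alice|web01|" now returns all of alice's
        // rows on web01, newest first.
    }
}
```

In real HBase keys are raw bytes, so you would typically use fixed-width binary encodings rather than strings, but the ordering principle is the same.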
How often do you need to read and write data? If your reports run as batch jobs and it's acceptable for them to take 10, 15, or more minutes, but you do lots of small writes, then HBase with Hadoop MapReduce (or with Hive or Pig as higher-level query languages) will work very well.