I think CouchDB is a great database, but its ability to handle big data is dubious. CouchDB focuses on ease of development and offline replication, not necessarily on performance or scalability. CouchDB itself does not support sharding, so you will be limited to the capacity of a single node unless you use BigCouch or invent your own sharding scheme.
Redis is, indeed, an in-memory database. That makes it very fast and efficient at reading and writing data to and from RAM. It can use disk for persistence, but it is not great at that. It is great for a bounded amount of data that changes frequently. Redis has replication, but no built-in support for partitioning, so you would be on your own there as well.
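For example, "on your own" usually means client-side partitioning: the application hashes each key and picks a node itself. A minimal sketch with the redis-py client (hosts and keys here are just placeholders, and the naive modulo scheme means adding a node reshuffles most keys):

```python
import hashlib
import redis

# Hypothetical pool of independent Redis instances.
nodes = [
    redis.Redis(host="10.0.0.1", port=6379),
    redis.Redis(host="10.0.0.2", port=6379),
    redis.Redis(host="10.0.0.3", port=6379),
]

def node_for(key: str) -> redis.Redis:
    """Pick a node by hashing the key (naive modulo scheme)."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]

# Reads and writes for the same key always land on the same node.
node_for("sensor:42:latest").set("sensor:42:latest", "21.7")
value = node_for("sensor:42:latest").get("sensor:42:latest")
```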
You also mentioned Cassandra, which I think is a better fit for your use case. Cassandra is well suited for datasets that grow without bound; in fact, that was its original use case. Partitioning and availability are baked in, so you don't have to worry about them. The data model is also a bit more flexible than the average key/value store: it adds a second dimension of columns, and a single row can practically hold millions of columns. This allows time-series data, for example, to be "bucketed" into rows that each span a time range. Data is distributed across the cluster (partitioned) at the row level, so operations within a single row only need to touch one node.
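To make the bucketing idea concrete, here is a sketch assuming the DataStax Python driver and CQL, with an existing "metrics" keyspace; the table and column names are made up. Each sensor gets one row (partition) per day, so no single row grows forever, and a time-range query only touches the partitions for the days it covers:

```python
from datetime import datetime, timezone
from cassandra.cluster import Cluster

session = Cluster(["127.0.0.1"]).connect("metrics")

session.execute("""
    CREATE TABLE IF NOT EXISTS readings (
        sensor_id text,
        day       text,        -- bucket: one partition per sensor per day
        ts        timestamp,
        value     double,
        PRIMARY KEY ((sensor_id, day), ts)
    )
""")

now = datetime.now(timezone.utc)
session.execute(
    "INSERT INTO readings (sensor_id, day, ts, value) VALUES (%s, %s, %s, %s)",
    ("sensor-42", now.strftime("%Y-%m-%d"), now, 21.7),
)

# Read everything for one sensor since midnight: a single-partition query.
midnight = now.replace(hour=0, minute=0, second=0, microsecond=0)
rows = session.execute(
    "SELECT ts, value FROM readings WHERE sensor_id = %s AND day = %s AND ts >= %s",
    ("sensor-42", now.strftime("%Y-%m-%d"), midnight),
)
```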
Hadoop plugs directly into Cassandra with native drivers for MapReduce, Pig, and Hive, so it could potentially be used to aggregate the collected data and materialize running averages. Best practice is to shape the data around your queries, so you will probably want to store multiple copies of the data in a "denormalized" form, one for each type of query.
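A minimal sketch of that "one copy per query" idea, again assuming the DataStax Python driver and made-up table names: the same reading is written to two tables, one keyed by sensor and one keyed by day, so each query reads a single partition instead of joining at read time.

```python
from datetime import datetime, timezone
from cassandra.cluster import Cluster
from cassandra.query import BatchStatement

session = Cluster(["127.0.0.1"]).connect("metrics")

session.execute("""
    CREATE TABLE IF NOT EXISTS readings_by_sensor (
        sensor_id text, ts timestamp, value double,
        PRIMARY KEY (sensor_id, ts)
    )
""")
session.execute("""
    CREATE TABLE IF NOT EXISTS readings_by_day (
        day text, ts timestamp, sensor_id text, value double,
        PRIMARY KEY (day, ts, sensor_id)
    )
""")

# Write the same event to both query-shaped tables in one batch.
now = datetime.now(timezone.utc)
batch = BatchStatement()
batch.add(
    "INSERT INTO readings_by_sensor (sensor_id, ts, value) VALUES (%s, %s, %s)",
    ("sensor-42", now, 21.7),
)
batch.add(
    "INSERT INTO readings_by_day (day, ts, sensor_id, value) VALUES (%s, %s, %s, %s)",
    (now.strftime("%Y-%m-%d"), now, "sensor-42", 21.7),
)
session.execute(batch)
```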
See this post on doing time series in Cassandra:
http://rubyscale.com/2011/basic-time-series-with-cassandra/