You have several options for what you are trying to achieve.
The most powerful one is to use Lucene indexes, integrated through Stratio Cassandra, which let you search on any indexed field on the server side. Your write time will increase, but in exchange you can query any time range. You can find more information about Lucene indexes in Cassandra here. This enhanced version of Cassandra is fully integrated into the deep-spark project, so you can take full advantage of the Lucene indexes in Cassandra from Spark. I would recommend Lucene indexes when you are running a restrictive query that retrieves a small result set; if you are going to extract most of your data set, you should use the third option below.
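Regardless of how you then read the data from Spark, the index itself is created and queried with plain CQL. Below is a minimal, illustrative sketch using the DataStax Java driver from Scala; the keyspace/table/column names, the index class string and the schema JSON are assumptions that vary with the Stratio Lucene plugin version, so check the Stratio documentation for your release:

    import com.datastax.driver.core.Cluster
    import scala.collection.JavaConverters._

    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect()

    // One-off: create the Lucene index on a dummy "lucene" text column
    // (ALTER TABLE datastore.data ADD lucene text; on pre-3.0 plugin versions).
    // Writes get slower, but time-range searches become server-side.
    session.execute(
      """CREATE CUSTOM INDEX IF NOT EXISTS data_lucene_idx
        |ON datastore.data (lucene)
        |USING 'com.stratio.cassandra.lucene.Index'
        |WITH OPTIONS = {
        |  'refresh_seconds': '10',
        |  'schema': '{ fields: { timestamp: { type: "date", pattern: "yyyy-MM-dd" } } }'
        |}""".stripMargin)

    // Range query resolved by the Lucene index on the Cassandra side.
    val rs = session.execute(
      """SELECT * FROM datastore.data WHERE lucene =
        |'{ filter: { type: "range", field: "timestamp",
        |             lower: "2013-01-01", upper: "2013-12-31" } }'""".stripMargin)
    rs.iterator().asScala.foreach(println)

    session.close()
    cluster.close()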
Another approach, depending on how your application works, may be to truncate the timestamp field so that you can look rows up with an IN clause. The problem is that, as far as I know, you cannot do this through the spark-cassandra-connector; you have to use the plain Cassandra driver, which is not integrated with Spark, or watch the deep-spark project, where a new feature allowing this is going to be released soon. Your query would look something like this:
select * from datastore.data where timestamp IN ('2013-01-01', '2013-01-02', '2013-01-03', '2013-01-04', ... , '2013-12-31')
but, as I said, I don’t know whether this fits your needs, since it only works if you can truncate your data and group it by date/time.
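If day-level bucketing does fit your data model, a minimal sketch with the plain DataStax Java driver (2.x/3.x API) could look like the following; the remodelled table and all names are assumptions:

    import com.datastax.driver.core.Cluster
    import scala.collection.JavaConverters._

    // Assumes the table was remodelled with a day bucket as partition key, e.g.:
    //   CREATE TABLE datastore.data (date text, timestamp timestamp, value double,
    //                                PRIMARY KEY (date, timestamp));
    val cluster = Cluster.builder().addContactPoint("127.0.0.1").build()
    val session = cluster.connect()

    // Build the list of day buckets for the requested range (January here).
    // Note that very large IN lists put pressure on the coordinator node.
    val days = (1 to 31).map(d => f"'2013-01-$d%02d'").mkString(", ")

    val rs = session.execute(s"SELECT * FROM datastore.data WHERE date IN ($days)")
    rs.iterator().asScala.foreach(println)

    session.close()
    cluster.close()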
The last, and least efficient, option is to pull the entire data set into your Spark cluster and apply the filter on the RDD.
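A rough sketch of that with the spark-cassandra-connector is shown below; the column names are assumed, and the exact CassandraRow getter depends on the connector version (getDate returns a java.util.Date in the 1.x API):

    import com.datastax.spark.connector._
    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("filter-on-rdd")
      .set("spark.cassandra.connection.host", "127.0.0.1")
    val sc = new SparkContext(conf)

    val start = java.sql.Timestamp.valueOf("2013-01-01 00:00:00")
    val end   = java.sql.Timestamp.valueOf("2013-12-31 23:59:59")

    // Full table scan: every row is shipped to Spark, then filtered there.
    val inRange = sc.cassandraTable("datastore", "data")
      .filter { row =>
        val ts = row.getDate("timestamp") // java.util.Date in the 1.x connector API
        !ts.before(start) && !ts.after(end)
      }

    println(inRange.count())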
Disclaimer: I work at Stratio :-) Feel free to contact us if you need help.
Hope this helps!