I am working on a Cassandra data model for storing time series (I am new to Cassandra). I have two applications: intraday stock data and sensor data.
Stock data will be saved with a temporal resolution of one minute. Seven data fields make up one time frame: Symbol, Datetime, Open, High, Low, Close, Volume.
I will query the data mainly by symbol and date, for example: give me all the data for AAPL between 2013-01-01 and 2013-01-31, ordered by Datetime. The recommendation for Cassandra queries is to query entire columns. So I could create five rows with the keys Open, High, Low, Close, Volume, and give each symbol and minute its own column, for instance "AAPL: 2013-01-04T130400Z". This would result in a table of five rows and n * nT columns, where n = number of symbols and nT = number of minutes.
In most cases, I will query for date ranges, that is, all the minutes of a day. So I could rearrange the data to have columns named "AAPL: 2013-01-04" and the rows OpenT130400Z, HighT130400Z, LowT130400Z, CloseT130400Z, VolumeT130400Z. This would result in a table with n * nD columns (n: number of symbols, nD: number of days) and 5 * nM rows (nM: number of minutes/records per day).
To summarize: each column stores a full day of data for one symbol.
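To make the per-day layout concrete, here is a minimal Python sketch of how one one-minute record would map to cells in that layout. The function names (`day_layout_cells`, `layout_shape`) and the price values are made up for illustration; this only models the naming scheme described above and does not talk to Cassandra.

```python
from datetime import datetime, timezone

FIELDS = ("Open", "High", "Low", "Close", "Volume")


def day_layout_cells(symbol, ts, values):
    """Map one one-minute record to (row_name, column_name, value) cells.

    Column name: "<symbol>: <date>", row name: "<field>T<HHMMSS>Z".
    """
    column = f"{symbol}: {ts:%Y-%m-%d}"
    suffix = ts.strftime("T%H%M%SZ")
    return [(f"{field}{suffix}", column, values[field]) for field in FIELDS]


def layout_shape(n_symbols, n_days, records_per_day):
    """Return (rows, columns) of the per-day layout: 5 * nM rows, n * nD columns."""
    return (len(FIELDS) * records_per_day, n_symbols * n_days)


# Placeholder values, not real market data.
bar = {"Open": 1.0, "High": 2.0, "Low": 0.5, "Close": 1.5, "Volume": 100}
cells = day_layout_cells(
    "AAPL", datetime(2013, 1, 4, 13, 4, 0, tzinfo=timezone.utc), bar
)
for cell in cells:
    print(cell)
```

Each record produces five cells that all land in the same column, so reading one column returns every minute of that day for one symbol.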
I found a description of how to handle time series data in Cassandra here: http://www.datastax.com/dev/blog/advanced-time-series-with-cassandra
But I really don't understand whether they use the hour (1332960000) as the column name or as the row key. As far as I can tell, they use the hour as the row key and store the finer-grained timestamps as columns. This way they have a bounded number of columns per row. But this would have a drawback when reading, because I would have to do a range query over the keys. Am I right?
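My reading of the hour-bucket idea can be sketched in Python as follows. This is an assumption about the scheme, not code from the blog post: the row key is taken to be the start of the hour in epoch seconds, and the column is the offset within that hour. The helper names are hypothetical.

```python
HOUR = 3600  # seconds per bucket


def hour_bucket(epoch_seconds):
    """Row key: the timestamp rounded down to the start of its hour."""
    return epoch_seconds - epoch_seconds % HOUR


def column_offset(epoch_seconds):
    """Column name: seconds elapsed since the start of the hour."""
    return epoch_seconds % HOUR


def buckets_for_range(start, end):
    """All hour row keys that cover the closed interval [start, end]."""
    return list(range(hour_bucket(start), hour_bucket(end) + 1, HOUR))


ts = 1357304640  # 2013-01-04T13:04:00Z
print(hour_bucket(ts), column_offset(ts))
print(buckets_for_range(ts, ts + 2 * HOUR))
```

Under this assumption the row keys for any time range can be computed up front, so a read could ask for a known list of keys instead of scanning a range of keys.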
:
, , 1 (, ), ?
, , 3 600 000 000 n * nH (n: , nH: ).
, 3,6 , 2 .
?
? ?
!
,
Malte