I am working on a big data solution for sensor data and predictive analytics. I am new to big data and have been reading about the lambda architecture. I was thinking about using Cassandra together with Hadoop. Cassandra is a highly available, partition-tolerant database, and Hadoop HDFS is a file system for large analytics jobs.
If I get data from Internet of Things devices, should the data be saved in Hadoop first and then in Cassandra? The lambda architecture puts Hadoop in the batch layer, where it receives the data and sends results to the serving layer, which is a NoSQL database.
Why should the data go into Hadoop first? And what data is stored in Cassandra if Hadoop holds the raw data?
The streaming (speed) layer is out of scope for now. I just want to understand how Cassandra and Hadoop are used together.
As I understand it, the data in Hadoop is there for big analytics, and Cassandra should hold the results of my Hadoop jobs.
Does this mean that I can store my raw data in both? Can I keep the raw data in both Cassandra and Hadoop if my application needs more than just large analytical jobs?
Example
INSERT INTO temperature (weatherstation_id, event_time, temperature)
VALUES ('1234ABCD', '2013-04-03 07:02:00', '73F');
Suppose this is my insert and I have thousands of them per minute. If I want to run some large jobs over that data, do I use Hadoop for that?
But I also need every single row of data in my application, without any analytics. Does Cassandra store those rows too?
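To make the question concrete, here is a minimal Python sketch of the flow as I currently picture it. It is only an illustration under my own assumptions: a plain list stands in for the raw files in HDFS, dicts stand in for Cassandra tables, all names (`raw_store`, `serving_rows`, `serving_daily_avg`, `ingest`, `run_batch_job`) are hypothetical, and the temperature is stored as a number rather than the string '73F'.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical in-memory stand-ins (assumptions, not real Hadoop/Cassandra APIs).
raw_store = []           # batch layer: append-only raw data, as it would sit in HDFS
serving_rows = {}        # serving layer: every raw reading, keyed for direct lookup
serving_daily_avg = {}   # serving layer: results written by the batch job


def ingest(station_id, event_time, temperature_f):
    """Write one reading to both stores (the 'raw data in both' option)."""
    raw_store.append((station_id, event_time, temperature_f))
    serving_rows[(station_id, event_time)] = temperature_f


def run_batch_job():
    """Recompute the daily average per station from all raw data."""
    by_day = defaultdict(list)
    for station_id, event_time, temperature_f in raw_store:
        day = event_time[:10]  # 'YYYY-MM-DD' prefix of the timestamp string
        by_day[(station_id, day)].append(temperature_f)
    serving_daily_avg.clear()
    for key, temps in by_day.items():
        serving_daily_avg[key] = mean(temps)


# Two readings from the same station on the same day.
ingest("1234ABCD", "2013-04-03 07:02:00", 73.0)
ingest("1234ABCD", "2013-04-03 07:03:00", 75.0)
run_batch_job()
```

In this picture the application would read single rows from `serving_rows` and aggregated results from `serving_daily_avg`, while the large jobs only ever scan `raw_store`. Whether this double write is a sensible design is exactly what I am asking.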