What is the best way to store incoming data streams?

What is the best choice for long-term storage of data processed with Spark Streaming (many records written, relatively few reads): Parquet, HBase or Cassandra? Or something else entirely? What are the tradeoffs?

1 answer

In my experience, we used HBase as the data store for our Spark Streaming data (we had the same scenario: many records written, few reads). Since we already run Hadoop, and HBase has built-in integration with Hadoop, everything went smoothly.

  • We use HBase to store the high-speed messages coming in from the stream source (a minimal write sketch is shown right after this list).

  • HBase suits this access pattern well (a heavy, fast write load with only occasional reads).

  • Besides HBase, we also store the raw data in HDFS (Parquet + Avro), appending each micro-batch with SaveMode.Append; the raw data is partitioned by business attributes (example layout and a write sketch below).
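
A minimal sketch of what such an HBase write can look like from Spark Streaming. The table name `messages`, the column family `d`, and the DStream of (rowKey, payload) string pairs are assumptions made for illustration only; the answer does not describe the actual schema.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.streaming.dstream.DStream

// Write each micro-batch of (rowKey, payload) pairs into HBase.
// Table "messages" and column family "d" are hypothetical names.
def saveToHBase(stream: DStream[(String, String)]): Unit =
  stream.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      // One connection per partition, created on the executor side.
      val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("messages"))
      try {
        records.foreach { case (rowKey, payload) =>
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                        Bytes.toBytes(payload))
          table.put(put)
        }
      } finally {
        table.close()
        conn.close()
      }
    }
  }
```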

An example HDFS directory layout for the raw data: businessdate/environment/businesssubtype/messagetype/...
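
A sketch of the Parquet side of that HDFS write, assuming each micro-batch has already been turned into a DataFrame (`rawDf`) whose columns match the directory segments above; the column names and base path are illustrative, and the Avro copy would be an analogous second write.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Append one micro-batch of raw data, partitioned the same way as the
// example directory layout above (hypothetical column names).
def appendRawData(rawDf: DataFrame, basePath: String): Unit =
  rawDf.write
    .mode(SaveMode.Append)
    .partitionBy("businessdate", "environment", "businesssubtype", "messagetype")
    .parquet(basePath)
```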

Also, when writing these files from a streaming job, do a repartition(1) or coalesce before the write, or merge the output afterwards with Hadoop's FileUtil.copyMerge; otherwise every micro-batch leaves behind a pile of small files.
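
Two hedged ways to do that are sketched below: collapsing the batch to a single partition before writing, or merging the part files afterwards. Note that the merge helper lives in Hadoop's FileUtil class, and copyMerge exists only up to Hadoop 2.x (it was removed in Hadoop 3).

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
import org.apache.spark.sql.{DataFrame, SaveMode}

// Option 1: produce a single part file per batch by collapsing to one partition.
def writeSingleFile(df: DataFrame, path: String): Unit =
  df.coalesce(1).write.mode(SaveMode.Append).parquet(path)

// Option 2: write normally, then merge all part files in srcDir into dstFile.
// deleteSource = false keeps the original part files. Hadoop 2.x only.
def mergeParts(srcDir: String, dstFile: String): Boolean = {
  val conf = new Configuration()
  val fs   = FileSystem.get(conf)
  FileUtil.copyMerge(fs, new Path(srcDir), fs, new Path(dstFile),
                     false, conf, null)
}
```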

Besides that, pick the store with the CAP theorem in mind:

  • Consistency: every read sees the most recent write (all nodes return the same data).

  • Availability: every request receives a response, though not necessarily the most recent data.

  • Partition tolerance: the system keeps operating even when the network between nodes fails.

Cassandra leans towards AP (availability and partition tolerance).

HBase leans towards CP (consistency and partition tolerance).


Source: https://habr.com/ru/post/1660556/

