What is the best way to store incoming data streams?

What is the best choice for long-term storage of data processed with Spark Streaming (many records written, relatively few reads): Parquet, HBase or Cassandra? Or something else entirely? What are the tradeoffs?

1 answer

In my experience, we used HBase as the data store for our Spark Streaming data (we had the same scenario: many records written, few reads). Since we already run Hadoop, and HBase has built-in integration with Hadoop, everything went smoothly.

  • We use HBase to store the high-speed messages coming in from the stream source (a minimal write sketch is shown right after this list).

  • HBase suits this access pattern well (a heavy, fast write load with only occasional reads).

  • Besides HBase, we also store the raw data in HDFS (Parquet + Avro), appending each micro-batch with SaveMode.Append; the raw data is partitioned by business attributes (example layout and a write sketch below).
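
A minimal sketch of what such an HBase write can look like from Spark Streaming. The table name `messages`, the column family `d`, and the DStream of (rowKey, payload) string pairs are assumptions made for illustration only; the answer does not describe the actual schema.

```scala
import org.apache.hadoop.hbase.{HBaseConfiguration, TableName}
import org.apache.hadoop.hbase.client.{ConnectionFactory, Put}
import org.apache.hadoop.hbase.util.Bytes
import org.apache.spark.streaming.dstream.DStream

// Write each micro-batch of (rowKey, payload) pairs into HBase.
// Table "messages" and column family "d" are hypothetical names.
def saveToHBase(stream: DStream[(String, String)]): Unit =
  stream.foreachRDD { rdd =>
    rdd.foreachPartition { records =>
      // One connection per partition, created on the executor side.
      val conn  = ConnectionFactory.createConnection(HBaseConfiguration.create())
      val table = conn.getTable(TableName.valueOf("messages"))
      try {
        records.foreach { case (rowKey, payload) =>
          val put = new Put(Bytes.toBytes(rowKey))
          put.addColumn(Bytes.toBytes("d"), Bytes.toBytes("payload"),
                        Bytes.toBytes(payload))
          table.put(put)
        }
      } finally {
        table.close()
        conn.close()
      }
    }
  }
```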

An example HDFS directory layout for the raw data: businessdate/environment/businesssubtype/messagetype/...
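
A sketch of the Parquet side of that HDFS write, assuming each micro-batch has already been turned into a DataFrame (`rawDf`) whose columns match the directory segments above; the column names and base path are illustrative, and the Avro copy would be an analogous second write.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}

// Append one micro-batch of raw data, partitioned the same way as the
// example directory layout above (hypothetical column names).
def appendRawData(rawDf: DataFrame, basePath: String): Unit =
  rawDf.write
    .mode(SaveMode.Append)
    .partitionBy("businessdate", "environment", "businesssubtype", "messagetype")
    .parquet(basePath)
```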

Also, when writing these files from a streaming job, do a repartition(1) or coalesce before the write, or merge the output afterwards with Hadoop's FileUtil.copyMerge; otherwise every micro-batch leaves behind a pile of small files.
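
Two hedged ways to do that are sketched below: collapsing the batch to a single partition before writing, or merging the part files afterwards. Note that the merge helper lives in Hadoop's FileUtil class, and copyMerge exists only up to Hadoop 2.x (it was removed in Hadoop 3).

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, FileUtil, Path}
import org.apache.spark.sql.{DataFrame, SaveMode}

// Option 1: produce a single part file per batch by collapsing to one partition.
def writeSingleFile(df: DataFrame, path: String): Unit =
  df.coalesce(1).write.mode(SaveMode.Append).parquet(path)

// Option 2: write normally, then merge all part files in srcDir into dstFile.
// deleteSource = false keeps the original part files. Hadoop 2.x only.
def mergeParts(srcDir: String, dstFile: String): Boolean = {
  val conf = new Configuration()
  val fs   = FileSystem.get(conf)
  FileUtil.copyMerge(fs, new Path(srcDir), fs, new Path(dstFile),
                     false, conf, null)
}
```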

Besides that, pick the store with the CAP theorem in mind:

  • Consistency: every read sees the most recent write (all nodes return the same data).

  • Availability: every request receives a response, though not necessarily the most recent data.

  • Partition tolerance: the system keeps operating even when the network between nodes fails.

Cassandra leans towards AP (availability and partition tolerance).

HBase leans towards CP (consistency and partition tolerance).


Source: https://habr.com/ru/post/1660556/

