HDInsight: HBase or Azure Table Storage?

My team is currently building a solution that will use HDInsight. We will receive 5 TB of data daily and will need to run some map/reduce jobs on this data. Will there be any performance or cost difference if our data is stored in Azure Table Storage instead of Azure HBase?

2 answers

The main differences are in functionality and cost.

Azure Table Storage has no built-in MapReduce mechanism, although you can of course write your own processing that follows the map/reduce approach.
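As a rough illustration of writing your own map/reduce over table data, here is a hypothetical, single-process sketch that runs the map, shuffle and reduce phases in memory over a list of entity dictionaries. The dicts stand in for whatever your table query returns: only `PartitionKey` and `RowKey` are real Azure Table property names, while `device` and `bytes` are made-up sample properties. A real job would also need to distribute the map phase across partitions and machines.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Iterable, List, Tuple

# Hypothetical stand-in for entities returned by an Azure Table query.
Entity = Dict[str, Any]


def map_reduce(
    entities: Iterable[Entity],
    mapper: Callable[[Entity], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    """Run map, shuffle and reduce in a single process."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    # Map phase: each entity emits zero or more (key, value) pairs,
    # which are grouped by key (the "shuffle").
    for entity in entities:
        for key, value in mapper(entity):
            grouped[key].append(value)
    # Reduce phase: collapse each key's list of values into one result.
    return {key: reducer(key, values) for key, values in grouped.items()}


# Made-up sample data: total bytes per device across two daily partitions.
entities = [
    {"PartitionKey": "2014-04-04", "RowKey": "1", "device": "a", "bytes": 100},
    {"PartitionKey": "2014-04-04", "RowKey": "2", "device": "b", "bytes": 50},
    {"PartitionKey": "2014-04-05", "RowKey": "1", "device": "a", "bytes": 25},
]
totals = map_reduce(
    entities,
    mapper=lambda e: [(e["device"], e["bytes"])],
    reducer=lambda key, values: sum(values),
)
print(totals)  # {'a': 125, 'b': 50}
```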

You can also use HDInsight to run MapReduce against table storage. There are several connectors around, including one I wrote, which is Hive-focused, needs some configuration, and may not match your partition layout (http://www.simonellistonball.com/technology/hadoop-hive-inputformat-azure-tables/), and a less performance-oriented but more complete version by someone at Microsoft (http://blogs.msdn.com/b/mostlytrue/archive/2014/04/04/analyzing-azure-table-storage-data-with-hdinsight.aspx).
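A connector of this kind has to decide how to divide a table into input splits for map tasks, which is where the partition layout matters. The following is a hypothetical sketch of one such strategy, not the code of either connector: it groups distinct `PartitionKey` values into contiguous ranges and turns each range into an OData filter string (`ge`, `le` and `and` are standard Table service filter operators). It assumes keys compare correctly as plain strings and contain no single quotes. If a few partitions hold most of the data, splits built this way will be unbalanced.

```python
from typing import List


def make_splits(partition_keys: List[str], num_splits: int) -> List[List[str]]:
    """Group sorted, distinct partition keys into contiguous ranges."""
    if num_splits <= 0:
        raise ValueError("num_splits must be positive")
    keys = sorted(set(partition_keys))
    size, extra = divmod(len(keys), num_splits)
    splits: List[List[str]] = []
    start = 0
    for i in range(num_splits):
        # The first `extra` splits take one additional key each.
        end = start + size + (1 if i < extra else 0)
        if start < end:  # skip empty splits when there are fewer keys than splits
            splits.append(keys[start:end])
        start = end
    return splits


def split_filter(split: List[str]) -> str:
    """Build an OData range filter a map task could send to the Table service.

    Assumes keys contain no single quotes (those would need escaping).
    """
    return f"PartitionKey ge '{split[0]}' and PartitionKey le '{split[-1]}'"


days = ["2014-04-01", "2014-04-02", "2014-04-03", "2014-04-04", "2014-04-05"]
splits = make_splits(days, 2)
for split in splits:
    print(split_filter(split))
```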

The main advantage of table storage is that you do not pay for processing capacity all the time.

If you use HBase, you will need to keep a full cluster running all the time, so there is a cost disadvantage. However, you gain some functionality and performance, plus a more portable solution if you want to move to other Hadoop platforms. The HBase option also gives you access to a much wider range of analytic functions.
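The cost trade-off can be written as a back-of-the-envelope model: storage is billed all month in both cases (at whichever per-TB rate applies to the storage you choose), but an always-on HBase cluster adds node-hours for every hour of the month, while a batch cluster adds only the hours it actually runs. This is a simplified sketch: every price and the "4 hours a day" batch window are placeholders, not real Azure rates, and it ignores transaction, egress and head-node charges.

```python
HOURS_PER_MONTH = 730  # average hours in a month (8760 / 12)


def monthly_cost(
    stored_tb: float,
    storage_price_per_tb: float,
    nodes: int,
    node_price_per_hour: float,
    cluster_hours: float,
) -> float:
    """Storage is billed continuously; compute only while the cluster exists."""
    storage = stored_tb * storage_price_per_tb
    compute = nodes * node_price_per_hour * cluster_hours
    return storage + compute


# PLACEHOLDER inputs: substitute current Azure rates and your own sizing.
STORAGE_PRICE_PER_TB = 20.0
NODE_PRICE_PER_HOUR = 1.0
NODES = 4
STORED_TB = 150.0  # e.g. 30 days x 5 TB/day, before compression or expiry

always_on = monthly_cost(
    STORED_TB, STORAGE_PRICE_PER_TB, NODES, NODE_PRICE_PER_HOUR, HOURS_PER_MONTH
)
batch_only = monthly_cost(
    STORED_TB, STORAGE_PRICE_PER_TB, NODES, NODE_PRICE_PER_HOUR, 4 * 30
)
print(always_on, batch_only)
```

With identical storage, the whole difference comes from `cluster_hours`, so the comparison mostly depends on how many hours per day your jobs actually need.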


HDInsight (HBase / Hadoop) uses Azure Blob storage, not ATS (Azure Table Storage). For storing your data, you are charged only the corresponding storage cost based on your subscription.

PS: Remember to delete the cluster when you are finished with it to avoid charges. Your data stays in Blob storage and can be used by the next cluster you create.
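The point that data outlives the cluster can be modeled as compute holding only a reference to storage it does not own. In this hypothetical sketch, `BlobContainer` and `Cluster` are made-up classes, not an Azure SDK: deleting a cluster only marks its compute as gone, and a second cluster attached to the same container can still read what the first one wrote.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class BlobContainer:
    """Hypothetical stand-in for a Blob storage container; outlives clusters."""
    blobs: Dict[str, bytes] = field(default_factory=dict)


@dataclass
class Cluster:
    """Hypothetical stand-in for a cluster attached to an existing container."""
    name: str
    storage: BlobContainer
    running: bool = True

    def _check_running(self) -> None:
        if not self.running:
            raise RuntimeError(f"cluster {self.name} has been deleted")

    def write(self, path: str, data: bytes) -> None:
        self._check_running()
        self.storage.blobs[path] = data

    def read(self, path: str) -> bytes:
        self._check_running()
        return self.storage.blobs[path]

    def delete(self) -> None:
        # Only the compute goes away; the container and its blobs are untouched.
        self.running = False


container = BlobContainer()
first = Cluster("batch-day-1", container)
first.write("output/part-00000", b"results")
first.delete()

second = Cluster("batch-day-2", container)
print(second.read("output/part-00000"))  # b'results'
```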


Source: https://habr.com/ru/post/1205690/

