Local emulation for Azure + HDInsight

The task is to implement the T (transformation) part of an ETL project in the Azure cloud. I believe HDInsight is the right service to use, but I am not sure. Please confirm or refute this choice.

I am new to this area and would appreciate it if someone could point me in the right direction.

I would like to be able to develop a transformation service (job) and test it locally using the Azure Storage / Compute emulators and Visual Studio 2012 (ideally in C#). I am not very sure how HDInsight fits into this picture (if at all). The transformation job will read text files from Blob storage and produce (via MapReduce) data in Azure Table storage.

2 answers

You can run a local HDInsight instance. It is separate from the Azure Storage and Compute emulators and is installed through the Web Platform Installer (just search for HDInsight).

There are some subtle differences between the local and Azure versions: the local version works with data stored in HDFS, while in the cloud you can use Azure Blob containers. For developing and testing the transformation logic (in MapReduce / Hive / Pig) this does not matter. The only difference is how you get data in and out.
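
One way to keep that difference contained is to write the transformation against plain iterables of lines, so that only a thin reader and writer change between local HDFS and Azure Blob storage. Below is a minimal sketch in Python, chosen only for brevity. The names `transform`, `run`, and `fake_source`, and the comma-separated record layout, are hypothetical and not part of HDInsight or any Azure SDK:

```python
import io
from typing import Callable, Iterable, Iterator, TextIO


def transform(lines: Iterable[str]) -> Iterator[str]:
    """Storage-agnostic step: trim, skip blanks, upper-case the first field."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        key, _, rest = line.partition(",")
        yield f"{key.upper()},{rest}"


def run(open_source: Callable[[], TextIO], write: Callable[[str], None]) -> int:
    """Wire a reader and a writer around the transform; returns rows written."""
    count = 0
    with open_source() as src:
        for row in transform(src):
            write(row)
            count += 1
    return count


# Local test double (an assumption for this sketch): an in-memory "file"
# stands in for an HDFS file or a blob stream.
def fake_source() -> TextIO:
    return io.StringIO("a,1\n\nb,2\n")


output = []
rows_written = run(fake_source, output.append)
```

With this shape, moving from the local install to the cloud would mean replacing only `fake_source` and the writer, while `transform` stays untouched.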

Note that while you can certainly write MapReduce jobs in C# for HDInsight, for basic data transformations it may be much easier to use a higher-level language such as Pig or possibly HiveQL on HDInsight.
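
To make that trade-off concrete, here is roughly what a hand-written MapReduce job involves even for a simple aggregation, the kind of thing a single GROUP BY expresses in HiveQL or Pig. This is an in-process sketch in Python, not HDInsight code. The `mapper`, `reducer`, and `run_job` names and the `region,amount` record layout are assumptions for illustration only:

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def mapper(line: str) -> Iterator[Tuple[str, int]]:
    """Map phase: parse one 'region,amount' record into a (key, value) pair."""
    region, amount = line.strip().split(",")
    yield region, int(amount)


def reducer(key: str, values: Iterable[int]) -> Tuple[str, int]:
    """Reduce phase: collapse all values for one key into a single total."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Simulate the framework: map every line, group by key, then reduce."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            groups[key].append(value)  # stands in for the shuffle step
    return dict(reducer(k, v) for k, v in groups.items())


totals = run_job(["east,10", "west,5", "east,7"])
```

A real job would additionally need input/output formats, job submission, and packaging, which is the overhead a higher-level language hides.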


You need to decide what level of transformation and automation you expect from this.

I suggest writing a plain console application that pulls data from Blob storage and performs the transformation.

Reasons for suggesting the console application approach:

  • Simple, direct, and uses the same skill set.
  • There is a good SDK for blobs and tables that does what you want.
  • MapReduce (HDInsight) is completely new territory compared with Azure Storage and the C# family. I have heard that HDInsight is good, but I am not sure it is a good fit for you here.
  • A console application is easy to schedule, or to leave running on a pub-sub model.
  • A native C# application (console or .exe) can easily be turned into an Azure worker role.
  • Taking the native application approach removes the overhead of installing and configuring HDInsight.
  • Cost-wise, a worker role is cheaper than HDInsight.
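
In outline, the console-application route might look like the following: read text lines, shape each into a table entity, and hand batches to a writer. The sketch uses Python with in-memory data rather than the real Azure SDK. `to_entity`, `batches`, and the `partition,id,value` line layout are hypothetical. Only the `PartitionKey` / `RowKey` property names, and the rule that a Table storage batch holds at most 100 entities from a single partition, come from Azure Table storage itself:

```python
from itertools import groupby
from typing import Dict, Iterable, Iterator, List

Entity = Dict[str, str]
MAX_BATCH = 100  # Azure Table batch operations accept at most 100 entities


def to_entity(line: str) -> Entity:
    """Shape one 'partition,id,value' text line into a table entity."""
    partition, record_id, value = line.strip().split(",")
    return {"PartitionKey": partition, "RowKey": record_id, "Value": value}


def batches(entities: Iterable[Entity], size: int = MAX_BATCH) -> Iterator[List[Entity]]:
    """Group entities by partition, then cut each group into batches of at most `size`."""
    ordered = sorted(entities, key=lambda e: e["PartitionKey"])
    for _, group in groupby(ordered, key=lambda e: e["PartitionKey"]):
        chunk: List[Entity] = []
        for entity in group:
            chunk.append(entity)
            if len(chunk) == size:
                yield chunk
                chunk = []
        if chunk:
            yield chunk


# Sample input standing in for lines read from a blob (made-up data).
lines = ["p1,1,a", "p2,2,b", "p1,3,c"]
result = list(batches((to_entity(l) for l in lines), size=2))
```

In a C# console application or worker role, the same two steps would sit between the blob reader and the table client provided by the storage SDK.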

Source: https://habr.com/ru/post/1498457/

