You can install Spark on an HDInsight cluster. To do this, create a custom cluster and add a script action that installs Spark on the cluster while its VMs are being provisioned.
Adding a script action when creating a cluster is fairly easy. You can do it in C# or PowerShell by adding a few lines of code to a standard custom-cluster script or program.
PowerShell:
# ADD SCRIPT ACTION TO CLUSTER CONFIGURATION
$config = Add-AzureHDInsightScriptAction -Config $config -Name "Install Spark" -ClusterRoleCollection HeadNode -Uri https://hdiconfigactions.blob.core.windows.net/sparkconfigactionv02/spark-installer-v02.ps1
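For context, here is a rough sketch of where that line might sit in a complete custom-cluster script. It assumes the legacy Azure Service Management cmdlets (New-AzureHDInsightClusterConfig, Set-AzureHDInsightDefaultStorage, New-AzureHDInsightCluster) and an already selected subscription. All names and values in the first block are hypothetical placeholders, not values from the original answer.

```powershell
# Hypothetical placeholder values; replace with your own.
$clusterName   = "my-spark-cluster"
$location      = "East US"
$storageName   = "mystorageaccount"
$containerName = "mycontainer"
$storageKey    = "<storage-account-key>"
$creds         = Get-Credential   # cluster admin user name and password

# Standard custom-cluster configuration (assumed legacy Azure HDInsight cmdlets).
$config = New-AzureHDInsightClusterConfig -ClusterSizeInNodes 4
$config = Set-AzureHDInsightDefaultStorage -Config $config `
    -StorageAccountName "$storageName.blob.core.windows.net" `
    -StorageAccountKey $storageKey `
    -StorageContainerName $containerName

# The script action from the answer, added before the cluster is created.
$config = Add-AzureHDInsightScriptAction -Config $config -Name "Install Spark" `
    -ClusterRoleCollection HeadNode `
    -Uri https://hdiconfigactions.blob.core.windows.net/sparkconfigactionv02/spark-installer-v02.ps1

# Provision the cluster; the script action runs while the nodes are being set up.
New-AzureHDInsightCluster -Config $config -Name $clusterName -Location $location -Credential $creds
```

The script action has to be added to the configuration object before New-AzureHDInsightCluster is called, since it only runs during provisioning.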
C#:
// ADD THE SCRIPT ACTION TO INSTALL SPARK
clusterInfo.ConfigActions.Add(new ScriptAction(
    "Install Spark", // Name of the config action
    new ClusterNodeType[] { ClusterNodeType.HeadNode }, // List of nodes to install Spark on
    new Uri("https://hdiconfigactions.blob.core.windows.net/sparkconfigactionv02/spark-installer-v02.ps1"), // Location of the script to install Spark
    null // because the script used does not require any parameters
));
You can then RDP into the head node and run the Spark shell, or use spark-submit to run jobs. I'm not sure how to launch a Spark job without RDPing into the head node, but that is another question.