Salt-Based Spark Cluster Quick Start Guide

I tried asking on the official Salt user forum, but for some reason got no help there. I hope to have better luck here.

I am a new Salt user. I am still evaluating the framework as a candidate for our SCM tool (as opposed to Ansible).

I went through the tutorial and can successfully manage the master/minion relationship, as described in the first half of the tutorial.

The tutorial then branches out into many different, complex areas.

What I need is relatively straightforward, so I hope someone here can show me how to go about it.

I am looking to install Spark and HDFS on 20 RHEL 7 machines (say, in the range 192.168.10.0-20, where .0 is the name node).

I see:

https://github.com/saltstack-formulas/hadoop-formula

and I found a third-party spark formula:

https://github.com/beauzeaux/spark-formula

Could someone be kind enough to offer a set of instructions on how to go about this installation in the easiest way?

1 answer

Disclaimer: This answer only describes the rough outline of what you need to do. I distilled it from the relevant chapters of the documentation and added links for reference. I assume that you are familiar with the basic workings of Salt (states, pillars, and so on), as well as with Hadoop (which I am not).

1. Configure GitFS

A typical way to install Salt formulas is to use GitFS. See the respective chapter of the Salt manual for in-depth documentation.

This must be done on your Salt master node.

  • Enable GitFS in the master configuration file (usually /etc/salt/master or a separate file in /etc/salt/master.d ):

     fileserver_backend:
       - git
  • Add the two Salt formulas that you need as remotes (same file). This is also described in the documentation :

     gitfs_remotes:
       - https://github.com/saltstack-formulas/hadoop-formula.git
       - https://github.com/beauzeaux/spark-formula
  • (optional): Note the following warning from the formula documentation :

    We strongly recommend forking a formula repository into your own GitHub account to avoid unexpected changes to your infrastructure.

    Many Salt formulas are highly active repositories, so pull new changes with care. Plus, any additions you make to your fork can be easily sent back upstream with a quick pull request!

    Fork the formulas into your own Git repository (on GitHub or elsewhere) and use your private Git URLs as remotes to prevent unexpected changes to your configuration.

  • Restart the salt master.
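Putting the steps above together, the master-side configuration might look like the sketch below. The drop-in file name gitfs.conf is my choice; any .conf file under /etc/salt/master.d is picked up.

```yaml
# /etc/salt/master.d/gitfs.conf -- illustrative; file name is arbitrary
fileserver_backend:
  - git

gitfs_remotes:
  - https://github.com/saltstack-formulas/hadoop-formula.git
  - https://github.com/beauzeaux/spark-formula.git
```

After restarting the master, you can check that the formula files are actually being served with `salt-run fileserver.file_list` on the master node.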

2. Install Hadoop

This is described in detail in the formula's README file . From a cursory read, the formula can set up both Hadoop masters and slaves; the role is determined via Salt grains.

  • Configure the Hadoop role in the /etc/salt/grains file. This needs to be done on each Salt minion node (use hadoop_master and hadoop_slave , respectively):

     roles:
       - hadoop_master
  • Set up the Salt mine, take note of the additional configuration grains, and set them as you see fit.

  • Add the necessary pillar data to configure your Hadoop installation. We are back on the Salt master node for this (again, I assume you are familiar with states and pillars; see the walkthrough otherwise). Take a look at the sample pillar for the possible configuration options.

  • Use the hadoop and hadoop.hdfs states in your top.sls :

     'your-hadoop-hostname*':
       - hadoop
       - hadoop.hdfs
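As a sketch, the grains files for the two roles from the steps above could look like this. The grain names come from the formula's README; verify them against the formula version you actually pin.

```yaml
# /etc/salt/grains on the designated Hadoop master node
roles:
  - hadoop_master

# /etc/salt/grains on every worker node would instead contain:
# roles:
#   - hadoop_slave
```

Instead of editing each minion's file by hand, grains can also be set from the master, e.g. `salt 'node1*' grains.append roles hadoop_slave`, followed by a highstate.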

3. Install Spark

  • According to the formula's README, nothing needs to be configured via grains or pillars, so all that remains is to add the spark state to top.sls :

     'your-hadoop-hostname*':
       - hadoop
       - hadoop.hdfs
       - spark
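Combined, the complete top file from steps 2 and 3 would look like this (the hostname pattern is a placeholder; adjust it to match your minion IDs):

```yaml
# /srv/salt/top.sls -- illustrative
base:
  'your-hadoop-hostname*':
    - hadoop
    - hadoop.hdfs
    - spark
```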

4. Fire!

Apply all states:

 salt 'your-hadoop-hostname*' state.highstate 
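Before applying for real, it can be worth doing a dry run; passing `test=True` makes Salt report what would change without changing anything. This assumes the same hostname glob as above:

```shell
# Dry run: show pending changes without applying them
salt 'your-hadoop-hostname*' state.highstate test=True

# Then apply for real
salt 'your-hadoop-hostname*' state.highstate
```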

Source: https://habr.com/ru/post/1239281/