Apache Spark Workflow

How do you organize your Spark development workflow?

My way:

  • Local Hadoop/YARN service.
  • Local Spark service.
  • IntelliJ on one screen.
  • Terminal with a running sbt console.
  • After changing the Spark application code, I switch to the terminal and run "package" to compile the jar, then "submitSpark", an sbt task that runs spark-submit (a sketch of such a task follows this list).
  • Wait for an exception in the sbt console :)
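
For reference, the "submitSpark" task is roughly along these lines; this is a simplified, illustrative sketch rather than the exact build definition, and the main class and master are placeholders:

```scala
// build.sbt -- illustrative sketch of a "submitSpark" task (names and paths are placeholders)
import scala.sys.process._

lazy val submitSpark = taskKey[Unit]("Package the app and run spark-submit against the local cluster")

submitSpark := {
  // reuse the jar produced by `package`
  val jar = (Compile / packageBin).value
  val cmd = Seq(
    "spark-submit",
    "--master", "yarn",                // or "local[*]" for a purely local run
    "--class", "com.example.MainJob",  // placeholder main class
    jar.getAbsolutePath
  )
  // stream spark-submit output into the sbt console and fail the task on a non-zero exit
  val exit = Process(cmd).!
  if (exit != 0) sys.error(s"spark-submit exited with code $exit")
}
```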

I also tried working with spark-shell (a sketch of this loop follows the list):

  • Launch the shell and load the previously written application code.
  • Write a line of code in the shell.
  • Evaluate it.
  • If it looks good, copy it into the IDE.
  • After a few rounds of steps 2-4, paste the accumulated code into the IDE, compile the Spark application, and start over.
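
To make that concrete, the kind of snippet I keep in a file and pull into the shell with `:load` looks something like this (the file name, input path, and column are made up for illustration; spark-shell already provides `spark`):

```scala
// explore.scala -- evaluated inside spark-shell via `:load explore.scala`
// (illustrative snippet; spark-shell provides the `spark` session and `sc` out of the box)
val events = spark.read.json("data/events.json")   // placeholder input path
events.printSchema()
events.groupBy("eventType").count().show()         // placeholder column
```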

Is there a way to develop Spark apps faster?

+6
3 answers

I develop the core logic of our Spark jobs in an interactive environment for rapid prototyping. We use Spark Notebook, running against the development cluster, for this purpose.

Once the prototyped logic works as expected, I "industrialize" the code into a Scala project with the classic build life cycle: write tests, then build, package, and create the artifacts with Jenkins.
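
The "industrialized" project is just a regular sbt build; a minimal sketch might look like this (names and versions are illustrative, not taken from our actual setup):

```scala
// build.sbt -- minimal layout for the industrialized Spark project (illustrative)
name         := "spark-jobs"
version      := "0.1.0"
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  // Spark is marked Provided so the cluster's own distribution is used at runtime
  "org.apache.spark" %% "spark-sql" % "3.5.1" % Provided,
  "org.scalatest"    %% "scalatest" % "3.2.18" % Test
)
```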

+4

I found that writing scripts and using `:load` / `:paste` in the shell worked well, since I didn't need to package anything. If you use sbt, I suggest you start it and use `~package` so that it automatically rebuilds the jar whenever the code changes (see the sketch after the list below). Eventually, of course, it will all end up in an application jar; this workflow is for prototyping and learning.

  • Local Spark
  • Vim
  • spark-shell
  • API
  • Console
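
Concretely, the loop looks something like this; the jar name, package, and class below are placeholders, and since spark-shell does not hot-reload jars, the shell is restarted after a rebuild:

```scala
// Two-terminal loop (shell commands shown as comments; all names are placeholders):
//   terminal 1:  sbt ~package
//                rebuilds target/scala-2.12/spark-jobs_2.12-0.1.0.jar on every save
//   terminal 2:  spark-shell --master local[*] --jars target/scala-2.12/spark-jobs_2.12-0.1.0.jar
//                restart the shell after a rebuild to pick up the new jar
// Inside the shell, the packaged classes are then directly usable:
import com.example.Cleaner                                  // placeholder class from the jar
val cleaned = Cleaner.clean(spark.read.parquet("data/raw")) // placeholder API and path
cleaned.show(5)
```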
+1

We develop our applications in an IDE (IntelliJ, since we write our Spark applications in Scala), using ScalaTest for testing.

In these tests, we use local[*] as the Spark master so that jobs run inside the test JVM and can be debugged.
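
A minimal sketch of such a test, assuming ScalaTest 3.x and the Spark SQL API (the suite and its logic are illustrative, not our actual tests):

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class WordCountSpec extends AnyFunSuite {

  // local[*] as master runs the whole job inside the test JVM, so breakpoints work
  private val spark = SparkSession.builder()
    .master("local[*]")
    .appName("WordCountSpec")
    .getOrCreate()

  import spark.implicits._

  test("counts words across lines") {
    val counts = Seq("a b", "a").toDF("line")
      .selectExpr("explode(split(line, ' ')) AS word")
      .groupBy("word")
      .count()
      .collect()
      .map(r => r.getString(0) -> r.getLong(1))
      .toMap

    assert(counts == Map("a" -> 2L, "b" -> 1L))
  }
}
```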

For integration testing we use Jenkins, and we run the end-to-end scenario as a Scala application.

I hope this will be helpful

+1

Source: https://habr.com/ru/post/988448/

