Apache Spark Workflow

How do you organize your Spark development workflow?

My way:

  • Local Hadoop/YARN service.
  • Local Spark service.
  • IntelliJ on one screen.
  • Terminal with a running sbt console.
  • After changing the Spark application code, I switch to the terminal and run "package" to compile the jar, then "submitSpark", an sbt task that runs spark-submit (a sketch of such a task follows this list).
  • Wait for an exception in the sbt console :)
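
For reference, the "submitSpark" task is roughly along these lines; this is a simplified, illustrative sketch rather than the exact build definition, and the main class and master are placeholders:

```scala
// build.sbt -- illustrative sketch of a "submitSpark" task (names and paths are placeholders)
import scala.sys.process._

lazy val submitSpark = taskKey[Unit]("Package the app and run spark-submit against the local cluster")

submitSpark := {
  // reuse the jar produced by `package`
  val jar = (Compile / packageBin).value
  val cmd = Seq(
    "spark-submit",
    "--master", "yarn",                // or "local[*]" for a purely local run
    "--class", "com.example.MainJob",  // placeholder main class
    jar.getAbsolutePath
  )
  // stream spark-submit output into the sbt console and fail the task on a non-zero exit
  val exit = Process(cmd).!
  if (exit != 0) sys.error(s"spark-submit exited with code $exit")
}
```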

I also tried working with spark-shell (a sketch of this loop follows the list):

  • Launch the shell and load the previously written application code.
  • Write a line of code in the shell.
  • Evaluate it.
  • If it looks good, copy it into the IDE.
  • After a few rounds of steps 2-4, paste the accumulated code into the IDE, compile the Spark application, and start over.
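
To make that concrete, the kind of snippet I keep in a file and pull into the shell with `:load` looks something like this (the file name, input path, and column are made up for illustration; spark-shell already provides `spark`):

```scala
// explore.scala -- evaluated inside spark-shell via `:load explore.scala`
// (illustrative snippet; spark-shell provides the `spark` session and `sc` out of the box)
val events = spark.read.json("data/events.json")   // placeholder input path
events.printSchema()
events.groupBy("eventType").count().show()         // placeholder column
```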

Is there a way to develop Spark apps faster?

+6
3 answers

I develop the core logic of our Spark jobs in an interactive environment for rapid prototyping. We use Spark Notebook, running against the development cluster, for this purpose.

Once the prototyped logic works as expected, I "industrialize" the code into a Scala project with the classic build life cycle: write tests, then build, package, and create the artifacts with Jenkins.
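
The "industrialized" project is just a regular sbt build; a minimal sketch might look like this (names and versions are illustrative, not taken from our actual setup):

```scala
// build.sbt -- minimal layout for the industrialized Spark project (illustrative)
name         := "spark-jobs"
version      := "0.1.0"
scalaVersion := "2.12.18"

libraryDependencies ++= Seq(
  // Spark is marked Provided so the cluster's own distribution is used at runtime
  "org.apache.spark" %% "spark-sql" % "3.5.1" % Provided,
  "org.scalatest"    %% "scalatest" % "3.2.18" % Test
)
```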

+4

I found that writing scripts and using `:load` / `:paste` in the shell worked well, since I didn't need to package anything. If you use sbt, I suggest you start it and use `~package` so that it automatically rebuilds the jar whenever the code changes (see the sketch after the list below). Eventually, of course, it will all end up in an application jar; this workflow is for prototyping and learning.

  • Local Spark
  • Vim
  • spark-shell
  • API
  • Console
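
Concretely, the loop looks something like this; the jar name, package, and class below are placeholders, and since spark-shell does not hot-reload jars, the shell is restarted after a rebuild:

```scala
// Two-terminal loop (shell commands shown as comments; all names are placeholders):
//   terminal 1:  sbt ~package
//                rebuilds target/scala-2.12/spark-jobs_2.12-0.1.0.jar on every save
//   terminal 2:  spark-shell --master local[*] --jars target/scala-2.12/spark-jobs_2.12-0.1.0.jar
//                restart the shell after a rebuild to pick up the new jar
// Inside the shell, the packaged classes are then directly usable:
import com.example.Cleaner                                  // placeholder class from the jar
val cleaned = Cleaner.clean(spark.read.parquet("data/raw")) // placeholder API and path
cleaned.show(5)
```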
+1

We develop our applications in an IDE (IntelliJ, since we write our Spark applications in Scala), using ScalaTest for testing.

In these tests, we use local[*] as the Spark master so that jobs run inside the test JVM and can be debugged.
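
A minimal sketch of such a test, assuming ScalaTest 3.x and the Spark SQL API (the suite and its logic are illustrative, not our actual tests):

```scala
import org.apache.spark.sql.SparkSession
import org.scalatest.funsuite.AnyFunSuite

class WordCountSpec extends AnyFunSuite {

  // local[*] as master runs the whole job inside the test JVM, so breakpoints work
  private val spark = SparkSession.builder()
    .master("local[*]")
    .appName("WordCountSpec")
    .getOrCreate()

  import spark.implicits._

  test("counts words across lines") {
    val counts = Seq("a b", "a").toDF("line")
      .selectExpr("explode(split(line, ' ')) AS word")
      .groupBy("word")
      .count()
      .collect()
      .map(r => r.getString(0) -> r.getLong(1))
      .toMap

    assert(counts == Map("a" -> 2L, "b" -> 1L))
  }
}
```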

For integration testing we use Jenkins, and we run the end-to-end scenario as a Scala application.

I hope this will be helpful

+1

Source: https://habr.com/ru/post/988448/

