How to debug a Scala-based Spark program in IntelliJ IDEA

I am currently setting up my development environment in IntelliJ IDEA. I followed the quick start guide at http://spark.apache.org/docs/latest/quick-start.html exactly.

build.sbt file

    name := "Simple Project"

    version := "1.0"

    scalaVersion := "2.11.7"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.0"

Program File Example

    import org.apache.spark.SparkContext
    import org.apache.spark.SparkContext._
    import org.apache.spark.SparkConf

    object MySpark {
      def main(args: Array[String]): Unit = {
        val logFile = "/IdeaProjects/hello/testfile.txt"
        val conf = new SparkConf().setAppName("Simple Application")
        val sc = new SparkContext(conf)
        val logData = sc.textFile(logFile, 2).cache()
        val numAs = logData.filter(line => line.contains("a")).count()
        val numBs = logData.filter(line => line.contains("b")).count()
        println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
      }
    }

If I use the command line:

 sbt package 

and then

 spark-submit --class "MySpark" --master local[4] target/scala-2.11/myspark_2.11-1.0.jar 

This builds the jar, and the Spark job runs fine.

However, I want to debug the program from within IntelliJ IDEA. How do I set up a run configuration so that pressing "Debug" automatically builds the jar and launches the job by running "spark-submit"?

I just want everything to be as simple as "one click" on the Debug button in IntelliJ IDEA.

Thanks.

+5
4 answers

You can pass the JVM debug options to spark-submit through an environment variable:

 export SPARK_SUBMIT_OPTS=-agentlib:jdwp=transport=dt_socket,server=y,suspend=y,address=7777 

Then create a debug configuration as follows:

Run -> Edit Configurations -> click "+" in the top-left corner -> Remote -> set the name and the port (7777, to match the address above)

Once this is set up, launch the Spark application with spark-submit or sbt. Because of suspend=y, the JVM waits for a debugger to attach on port 7777 before it runs the program. Then start the debug configuration you just created and set breakpoints as needed.

+10

I came across this when switching between 2.10 and 2.11. SBT expects the main object to be located in src -> main -> scala-2.10 or src -> main -> scala-2.11, depending on your version.

+1

If you use the Scala plugin and set up your project as an sbt project, it should basically work out of the box.

Go to Run -> Edit Configurations... and add your run configuration as usual.

Since you have a main class, you probably want to add a new Application configuration.

You can also just click the blue square icon to the left of your main code.

Once your launch configuration is configured, you can use the Debug feature.
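One thing to watch when running straight from the IDE: the program in the question does not set a master URL, because spark-submit supplies it with --master. Started as a plain Application configuration, SparkContext would fail with a "master URL must be set" error. Below is a minimal sketch of one way around that, assuming local mode is acceptable for debugging. The object name MySparkLocal and the buildConf helper are hypothetical; setIfMissing only provides a fallback, so a master passed by spark-submit still wins.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Hypothetical variant of the question's MySpark object that can be
// started directly from an IDE run/debug configuration.
object MySparkLocal {

  // "local[4]" is only a fallback: if spark.master is already set
  // (for example by spark-submit --master), that value is kept.
  def buildConf(): SparkConf =
    new SparkConf()
      .setAppName("Simple Application")
      .setIfMissing("spark.master", "local[4]")

  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(buildConf())
    try {
      // Same input path as in the question.
      val logData = sc.textFile("/IdeaProjects/hello/testfile.txt", 2).cache()
      // In local mode the tasks run inside this JVM, so a breakpoint
      // inside the filter closure can be hit by the IDE debugger.
      val numAs = logData.filter(line => line.contains("a")).count()
      println(s"Lines with a: $numAs")
    } finally {
      sc.stop() // release the local Spark resources
    }
  }
}
```

With this in place, pressing Debug on the Application configuration should start the job in-process, with no jar packaging or spark-submit step involved.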

0

This is similar to the solution proposed here: Debugging Spark applications. You create a remote debugging configuration in IDEA and pass the Java debugging options to the spark-submit command. The only catch is that you have to start the remote debugging configuration in IDEA after running the spark-submit command. I read somewhere that putting a Thread.sleep before the code you want to debug gives you time to attach the debugger, and this suggestion worked for me.
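The Thread.sleep idea can be sketched roughly as follows. This is only an illustration: the DebugPause object and the DEBUG_PAUSE_MS environment variable are hypothetical names, not part of Spark or IDEA. The pause mainly matters when the debug agent is started with suspend=n; with suspend=y the JVM already waits for the debugger on its own.

```scala
import scala.util.Try

// Hypothetical helper: pauses at startup so there is time to attach
// a remote debugger before the interesting code runs.
object DebugPause {

  // Reads the pause length in milliseconds from DEBUG_PAUSE_MS (an
  // assumed variable name). Missing, invalid, or non-positive values
  // mean "do not pause".
  def pauseMillis(env: Map[String, String]): Long =
    env.get("DEBUG_PAUSE_MS")
      .flatMap(s => Try(s.trim.toLong).toOption)
      .filter(_ > 0)
      .getOrElse(0L)

  // Sleeps for the configured time, if any.
  def waitForDebugger(env: Map[String, String] = sys.env): Unit = {
    val ms = pauseMillis(env)
    if (ms > 0) {
      println(s"Pausing $ms ms so a debugger can attach...")
      Thread.sleep(ms)
    }
  }
}
```

Calling DebugPause.waitForDebugger() as the first statement of main, and exporting DEBUG_PAUSE_MS before spark-submit, leaves a window in which to start the remote configuration in IDEA. When the variable is unset the call returns immediately.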

0

Source: https://habr.com/ru/post/1257752/