How to send code to a remote Spark cluster from IntelliJ IDEA

I have two clusters: one in a local virtual machine and the other in a remote cloud. Both clusters run in standalone mode.

My environment:

  • Scala: 2.10.4
  • Spark: 1.5.1
  • JDK: 1.8.40
  • OS: CentOS Linux release 7.1.1503 (Core)

Local cluster:

Spark Master: spark://local1:7077

Remote Cluster:

Spark Master: spark://remote1:7077

What I want to do:

Write the code (a simple word count) in IntelliJ IDEA locally (on my laptop), set the Spark Master URL to spark://local1:7077 or spark://remote1:7077, and then run the code in IntelliJ IDEA. That is, I do not want to use spark-submit to submit the job.
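
For reference, this is roughly the kind of code I mean: a minimal word-count sketch, where the app name and the input path are just placeholders, and the master URL is switched between the two clusters.

    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("WordCount")
          .setMaster("spark://remote1:7077") // or spark://local1:7077

        val sc = new SparkContext(conf)

        // Placeholder input path; any small text file works for the test.
        val counts = sc.textFile("/tmp/input.txt")
          .flatMap(_.split("\\s+"))
          .map(word => (word, 1))
          .reduceByKey(_ + _)

        counts.take(10).foreach(println)
        sc.stop()
      }
    }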

But I had a problem:

When I use the local cluster, everything works well. Whether I run the code in IntelliJ IDEA or use spark-submit, the job is sent to the cluster and completes.

But when I use the remote cluster, I got a warning log:

TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

But there are sufficient resources; it is not a lack of memory!

And this log keeps printing, with no further progress. spark-submit and running the code in IntelliJ IDEA behave the same way.

I want to know:

  • Can I send code from IntelliJ IDEA to a remote cluster?
  • If so, what configuration is needed?
  • What are the possible causes of my problem?
  • How can I deal with it?

Thanks a lot!

Update

There is a similar question here, but I think my situation is different. When I run my code in IntelliJ IDEA with the Spark Master set to the local virtual machine cluster, it works. Against the remote cluster, however, I get the warning Initial job has not accepted any resources;...

I would like to know whether this could be caused by a security policy or a firewall.

1 answer

Submitting code programmatically (for example, via SparkSubmit ) is quite complicated. At the very least, there are many settings and environment considerations handled by the spark-submit script that are quite difficult to replicate within a Scala program. I still do not know how to do it. There have been many long threads about this in the Spark developer community.

My answer here is about part of your post: specifically

TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient resources

The usual cause is a mismatch between the memory and/or number of cores requested by your job and what is available on the cluster. Perhaps, when submitting from IJ,

$SPARK_HOME/conf/spark-defaults.conf

does not correctly match the parameters your job needs on the existing cluster. You may need to update:

 spark.driver.memory    4g
 spark.executor.memory  8g
 spark.executor.cores   8
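
If you launch directly from IntelliJ instead of going through spark-submit, spark-defaults.conf may not be picked up at all, so the same settings can be placed on the SparkConf in code. This is only a rough sketch with the same illustrative values; adjust them to what your cluster actually offers.

    import org.apache.spark.{SparkConf, SparkContext}

    object RemoteSubmitSketch {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("WordCount")
          .setMaster("spark://remote1:7077")
          // Executor resources take effect when set before the context is created.
          .set("spark.executor.memory", "8g")
          .set("spark.executor.cores", "8")
          // Driver memory usually cannot be changed from inside an already-running JVM;
          // when launching from the IDE, raise -Xmx in the run configuration instead.
          .set("spark.driver.memory", "4g")

        val sc = new SparkContext(conf)
        // ... job code goes here ...
        sc.stop()
      }
    }

Whatever values you choose, a single executor's memory and core request has to fit on a single worker; asking for more than any worker can offer keeps the scheduler printing exactly this warning.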

You can check the Spark UI on port 8080 to make sure that the parameters you requested are actually available on the cluster.


Source: https://habr.com/ru/post/1235529/

