Unable to connect from an application to a standalone cluster

I am trying to connect from an application to a standalone Spark cluster, with everything running on a single machine. I start the standalone master with the command:

 bash start-master.sh 

Then I start a single worker with the command:

 bash spark-class org.apache.spark.deploy.worker.Worker spark://PC:7077 -m 512m 

(I allocated 512 MB for it).

In the master's web interface:

 http://localhost:8080 

I can see that both the master and the worker are running.

Then I try to connect from the application to the cluster with the following command:

 JavaSparkContext sc = new JavaSparkContext("spark://PC:7077", "myapplication"); 

When I run the application, it crashes with the following error message:

 14/11/01 22:53:26 INFO client.AppClient$ClientActor: Connecting to master spark://PC:7077...
 14/11/01 22:53:26 INFO spark.SparkContext: Starting job: collect at App.java:115
 14/11/01 22:53:26 INFO scheduler.DAGScheduler: Got job 0 (collect at App.java:115) with 2 output partitions (allowLocal=false)
 14/11/01 22:53:26 INFO scheduler.DAGScheduler: Final stage: Stage 0(collect at App.java:115)
 14/11/01 22:53:26 INFO scheduler.DAGScheduler: Parents of final stage: List()
 14/11/01 22:53:26 INFO scheduler.DAGScheduler: Missing parents: List()
 14/11/01 22:53:26 INFO scheduler.DAGScheduler: Submitting Stage 0 (ParallelCollectionRDD[0] at parallelize at App.java:109), which has no missing parents
 14/11/01 22:53:27 INFO scheduler.DAGScheduler: Submitting 2 missing tasks from Stage 0 (ParallelCollectionRDD[0] at parallelize at App.java:109)
 14/11/01 22:53:27 INFO scheduler.TaskSchedulerImpl: Adding task set 0.0 with 2 tasks
 14/11/01 22:53:42 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
 14/11/01 22:53:46 INFO client.AppClient$ClientActor: Connecting to master spark://PC:7077...
 14/11/01 22:53:57 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
 14/11/01 22:54:06 INFO client.AppClient$ClientActor: Connecting to master spark://PC:7077...
 14/11/01 22:54:12 WARN scheduler.TaskSchedulerImpl: Initial job has not accepted any resources; check your cluster UI to ensure that workers are registered and have sufficient memory
 14/11/01 22:54:26 ERROR cluster.SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
 14/11/01 22:54:26 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0, whose tasks have all completed, from pool
 14/11/01 22:54:26 INFO scheduler.DAGScheduler: Failed to run collect at App.java:115
 Exception in thread "main" 14/11/01 22:54:26 INFO scheduler.TaskSchedulerImpl: Cancelling stage 0
 org.apache.spark.SparkException: Job aborted due to stage failure: All masters are unresponsive! Giving up.
     at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$failJobAndIndependentStages(DAGScheduler.scala:1033)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1017)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$abortStage$1.apply(DAGScheduler.scala:1015)
     at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
     at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
     at org.apache.spark.scheduler.DAGScheduler.abortStage(DAGScheduler.scala:1015)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
     at org.apache.spark.scheduler.DAGScheduler$$anonfun$handleTaskSetFailed$1.apply(DAGScheduler.scala:633)
     at scala.Option.foreach(Option.scala:236)
     at org.apache.spark.scheduler.DAGScheduler.handleTaskSetFailed(DAGScheduler.scala:633)
     at org.apache.spark.scheduler.DAGSchedulerEventProcessActor$$anonfun$receive$2.applyOrElse(DAGScheduler.scala:1207)
     at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
     at akka.actor.ActorCell.invoke(ActorCell.scala:456)
     at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
     at akka.dispatch.Mailbox.run(Mailbox.scala:219)
     at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
     at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
     at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
     at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
     at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
 14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/metrics/json,null}
 14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/stages/stage/kill,null}
 14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/,null}
 14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/static,null}
 14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors/json,null}
 14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/executors,null}
 14/11/01 22:54:26 INFO handler.ContextHandler: stopped o.e.j.s.ServletContextHandler{/environment/json,null}

Any ideas what is going on?

P.S. I am using a pre-built version of Spark: spark-1.1.0-bin-hadoop2.4.

Thanks.

1 answer

Make sure that both the standalone workers and the Spark driver connect to the Spark master at the exact address shown in its web UI and printed in its startup log message. Spark uses Akka for some of its communications, and Akka can be very picky about hostnames, so the addresses must match exactly.
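For example, if the web UI at http://localhost:8080 shows the URL spark://PC:7077, the driver has to pass exactly that string. Here is a minimal sketch of the driver setup (Spark 1.x Java API; the hostname PC is carried over from the question):

 import org.apache.spark.SparkConf;
 import org.apache.spark.api.java.JavaSparkContext;

 // The master URL must match the one the master advertises character for
 // character: spark://localhost:7077 or spark://127.0.0.1:7077 would be
 // rejected even though they resolve to the same machine.
 SparkConf conf = new SparkConf()
         .setMaster("spark://PC:7077")
         .setAppName("myapplication");
 JavaSparkContext sc = new JavaSparkContext(conf);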

There are several settings for controlling which hosts / network interfaces the driver and master bind to. Probably the easiest option is to set the SPARK_LOCAL_IP environment variable to control the address that the master / driver binds to. See http://databricks.gitbooks.io/databricks-spark-knowledge-base/content/troubleshooting/connectivity_issues.html for an overview of the other settings that affect network address binding.
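As a sketch of one such option: instead of (or in addition to) the environment variable, the driver's address can be pinned programmatically through the standard spark.driver.host configuration property. The hostname below is an assumption carried over from the question:

 // Imports as in the sketch above.
 // spark.driver.host sets the address that the executors use to connect
 // back to the driver; it must be resolvable and reachable from the workers.
 SparkConf conf = new SparkConf()
         .setMaster("spark://PC:7077")
         .setAppName("myapplication")
         .set("spark.driver.host", "PC");
 JavaSparkContext sc = new JavaSparkContext(conf);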

