I am running a Spark cluster in standalone mode and submitting the application with spark-submit. In the Stages section of the Spark UI, I found a running stage with a very long run time (> 10h, when the normal time is ~30 seconds). The stage has many failed tasks with the error Resubmitted (resubmitted due to lost executor). On the stage page, the Aggregated Metrics by Executor table lists an executor with the address CANNOT FIND ADDRESS. Spark keeps retrying this task endlessly. If I kill the stage (my application reruns unfinished Spark jobs automatically), everything continues to work well.
I also found some strange entries in the Spark logs (from about the same time the stage started).
Master:
16/11/19 19:04:32 INFO Master: Application app-20161109161724-0045 requests to kill executors: 0
16/11/19 19:04:36 INFO Master: Launching executor app-20161109161724-0045/1 on worker worker-20161108150133
16/11/19 19:05:03 WARN Master: Got status update for unknown executor app-20161109161724-0045/0
16/11/25 10:05:46 INFO Master: Application app-20161109161724-0045 requests to kill executors: 1
16/11/25 10:05:48 INFO Master: Launching executor app-20161109161724-0045/2 on worker worker-20161108150133
16/11/25 10:06:14 WARN Master: Got status update for unknown executor app-20161109161724-0045/1
Worker:
16/11/25 10:06:05 INFO Worker: Asked to kill executor app-20161109161724-0045/1
16/11/25 10:06:08 INFO ExecutorRunner: Runner thread for executor app-20161109161724-0045/1 interrupted
16/11/25 10:06:08 INFO ExecutorRunner: Killing process!
16/11/25 10:06:13 INFO Worker: Executor app-20161109161724-0045/1 finished with state KILLED exitStatus 137
16/11/25 10:06:14 INFO Worker: Asked to launch executor app-20161109161724-0045/2 for app.jar
16/11/25 10:06:17 INFO SecurityManager: Changing view acls to: spark
16/11/25 10:06:17 INFO SecurityManager: Changing modify acls to: spark
16/11/25 10:06:17 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(spark); users with modify permissions: Set(spark)
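To line these events up per executor, a small script along the following lines may help. It is only a sketch in Python: the regular expressions and the helper names `split_entries` and `group_by_executor` are my own assumptions based on the log lines above, not anything provided by Spark.

```python
import re
from collections import defaultdict

# Start of a log entry, e.g. "16/11/25 10:06:14 WARN Master: ..."
ENTRY_START = re.compile(r"\d{2}/\d{2}/\d{2} \d{2}:\d{2}:\d{2} (?:INFO|WARN|ERROR) ")
# Executor reference, e.g. "app-20161109161724-0045/1"
EXECUTOR_ID = re.compile(r"(app-\d+-\d+)/(\d+)")
# Kill request, e.g. "Application app-... requests to kill executors: 1"
# (assumed format, taken from the Master log lines above)
KILL_REQUEST = re.compile(
    r"Application (app-\d+-\d+) requests to kill executors: (\d+(?:, \d+)*)"
)


def split_entries(text):
    """Split raw log text into entries, even if several share one line."""
    starts = [m.start() for m in ENTRY_START.finditer(text)]
    bounds = starts + [len(text)]
    return [text[a:b].strip() for a, b in zip(bounds, bounds[1:])]


def group_by_executor(text):
    """Map (app id, executor number) to the log entries that mention it."""
    events = defaultdict(list)
    for entry in split_entries(text):
        kill = KILL_REQUEST.search(entry)
        if kill:
            # A kill request names the executors by number only.
            for num in kill.group(2).split(", "):
                events[(kill.group(1), int(num))].append(entry)
            continue
        ref = EXECUTOR_ID.search(entry)
        if ref:
            events[(ref.group(1), int(ref.group(2)))].append(entry)
    return dict(events)


if __name__ == "__main__":
    sample = (
        "16/11/25 10:05:46 INFO Master: Application app-20161109161724-0045 "
        "requests to kill executors: 1 "
        "16/11/25 10:05:48 INFO Master: Launching executor "
        "app-20161109161724-0045/2 on worker worker-20161108150133 "
        "16/11/25 10:06:14 WARN Master: Got status update for unknown "
        "executor app-20161109161724-0045/1"
    )
    for key, entries in sorted(group_by_executor(sample).items()):
        print(key)
        for entry in entries:
            print("   ", entry)
```

Grouped this way, the Master entries for executor 1 show the kill request at 10:05:46 followed by the "unknown executor" warning at 10:06:14, which is one second after the Worker reports that executor as KILLED.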
There are no network connection problems, because the worker, the master (logs above), and the driver all run on the same machine.
Spark version 1.6.1