Spark step on EMR just hangs as "Running" after writing to S3

Running PySpark 2.0 on EMR 5.1.0 as a step. Even after the script finishes, with the _SUCCESS file written to S3 and the Spark UI showing the job as complete, EMR still shows the step as "Running". I waited more than an hour to make sure Spark was just trying to clean up after itself, but the step never shows as "Completed". The last thing written to the logs is:

INFO MultipartUploadOutputStream: close closed:false s3://mybucket/some/path/_SUCCESS
INFO DefaultWriterContainer: Job job_201611181653_0000 committed.
INFO ContextCleaner: Cleaned accumulator 0

I did not have this problem with Spark 1.6. I've tried a bunch of different hadoop-aws and aws-java-sdk jars, to no avail.

I'm using the default Spark 2.0 configurations, so I don't think anything else, such as metadata, is being written. The size of the data also doesn't affect this problem.
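For context, a minimal sketch of the kind of step script involved (not the asker's actual code; the bucket and path are placeholders taken from the log line above):

from pyspark.sql import SparkSession

# Minimal PySpark 2.0 sketch: build a small DataFrame and write it to S3.
# With the default output committer, the _SUCCESS marker is written on commit.
spark = SparkSession.builder.appName("emr-step-example").getOrCreate()

df = spark.range(1000)
df.write.mode("overwrite").parquet("s3://mybucket/some/path/")
# No explicit stop here -- see the first answer below.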

+4
2 answers

If you aren't already doing so, stop your Spark context at the end of your script:

sc.stop()

Also, if you have the Spark web UI open in a browser, close it, because it sometimes keeps the Spark context alive. I remember seeing this on the Spark dev mailing list, but I can't find the JIRA for it.
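In a PySpark script that would look roughly like this (a sketch, not the asker's code; calling spark.stop() on the SparkSession does the same thing):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("emr-step-example").getOrCreate()
sc = spark.sparkContext

try:
    spark.range(1000).write.mode("overwrite").parquet("s3://mybucket/some/path/")
finally:
    # Stop the context even if the write fails, so the YARN application
    # (and therefore the EMR step) can actually finish.
    sc.stop()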

+3

We ran into this issue and resolved it by running the job in cluster deploy mode, using the following spark-submit option:

spark-submit --deploy-mode cluster 
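For example, if the step is added through boto3 rather than the console, the flag goes into the step arguments roughly like this (a sketch; the cluster id, region, and script location are placeholders):

import boto3

# Submit the job as an EMR step in cluster deploy mode via command-runner.jar.
emr = boto3.client("emr", region_name="us-east-1")

emr.add_job_flow_steps(
    JobFlowId="j-XXXXXXXXXXXXX",
    Steps=[{
        "Name": "pyspark-job",
        "ActionOnFailure": "CONTINUE",
        "HadoopJarStep": {
            "Jar": "command-runner.jar",
            "Args": [
                "spark-submit",
                "--deploy-mode", "cluster",
                "s3://mybucket/scripts/job.py",
            ],
        },
    }],
)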


0

Source: https://habr.com/ru/post/1661237/

