I'm running PySpark 2 on EMR 5.1.0 as a step. Even after the script finishes, with the _SUCCESS file written to S3 and the Spark UI showing the job as complete, EMR still shows the step as "Running". I waited more than an hour to make sure Spark wasn't simply trying to clean up after itself, but the step never shows as "Completed". The last thing written to the logs is:
INFO MultipartUploadOutputStream: close closed:false s3://mybucket/some/path/_SUCCESS
INFO DefaultWriterContainer: Job job_201611181653_0000 committed.
INFO ContextCleaner: Cleaned accumulator 0
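For context, the script itself is nothing exotic. Here is a minimal sketch of the shape of the job (the input path, column name, and app name below are placeholders, not my actual values):

    from pyspark.sql import SparkSession

    # Default EMR / Spark 2.0 configuration, nothing custom.
    spark = SparkSession.builder.appName("my-emr-step").getOrCreate()

    # Read, transform trivially, and write back to S3.
    # Paths and column name are placeholders.
    df = spark.read.json("s3://mybucket/some/input/")
    df.select("some_column").write.parquet("s3://mybucket/some/path/")

    # Stop the session explicitly at the end of the script.
    spark.stop()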
I did not have this problem with Spark 1.6. I've tried a bunch of different hadoop-aws and aws-java-sdk jars, to no avail.
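For completeness, this is roughly how I submit the step and pass candidate jars; a sketch using boto3, where the cluster ID, region, jar version, and script path are all placeholders:

    import boto3

    emr = boto3.client("emr", region_name="us-east-1")  # region is a placeholder

    # Submit the PySpark script as a Spark step via command-runner.jar,
    # supplying a candidate aws-java-sdk jar explicitly with --jars.
    # Cluster ID, jar path/version, and script location are placeholders.
    emr.add_job_flow_steps(
        JobFlowId="j-XXXXXXXXXXXXX",
        Steps=[{
            "Name": "my-pyspark-step",
            "ActionOnFailure": "CONTINUE",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": [
                    "spark-submit",
                    "--jars", "s3://mybucket/jars/aws-java-sdk-1.10.75.jar",
                    "s3://mybucket/scripts/my_job.py",
                ],
            },
        }],
    )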
I'm using the default Spark 2.0 configuration, so I don't think anything else, like metadata, is being written. Also, the size of the data does not affect this problem.