After running a Spark job on an Amazon EMR cluster, I deleted the output files directly from S3 and tried to run the job again. I then got the following error when writing Parquet output to S3 with sqlContext.write:
    'bucket/folder' present in the metadata but not s3
        at com.amazon.ws.emr.hadoop.fs.consistency.ConsistencyCheckerS3FileSystem.getFileStatus(ConsistencyCheckerS3FileSystem.java:455)
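For context, the write itself is unremarkable. Here is a minimal sketch of what the job does, assuming a spark-shell session where sc already exists; the bucket and folder names are placeholders, not my real paths:

    import org.apache.spark.sql.{SQLContext, SaveMode}

    // `sc` is the SparkContext that spark-shell provides on the EMR cluster.
    val sqlContext = new SQLContext(sc)

    // Hypothetical input; the real DataFrame is built differently.
    val df = sqlContext.read.parquet("s3://my-bucket/input/")

    // This is the write that trips the EMRFS consistency check.
    df.write
      .mode(SaveMode.Overwrite)
      .parquet("s3://my-bucket/folder/")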
I tried to run

    emrfs sync s3:

which did not appear to resolve the error, even though it did delete some records from the DynamoDB table that EMRFS uses to track the metadata. I'm not sure what else to try. How can I fix this error?
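For reference, the full shape of the sync invocation is below; the bucket and folder are placeholders standing in for my real names, and emrfs diff is only shown as the companion subcommand for inspecting where the metadata and S3 disagree:

    # Re-sync the EMRFS metadata with the actual contents of the S3 path
    # (placeholder bucket/folder, not my real ones).
    emrfs sync s3://my-bucket/folder

    # Companion command: list entries where the metadata and S3 disagree.
    emrfs diff s3://my-bucket/folder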