FileNotFoundException when trying to save a DataFrame to parquet with overwrite mode

I have run into a weird error. I have a program that reads a DataFrame if it exists (or creates it otherwise), modifies it, and then saves it back to the same target path in parquet format with overwrite mode.
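For reference, here is a minimal sketch of that read-modify-overwrite pattern; the path and columns are made up for illustration, but the shape is the same:

    from pyspark.sql import SparkSession
    import pyspark.sql.functions as F

    spark = SparkSession.builder.appName("overwrite-repro").getOrCreate()
    path = "/data/output"  # hypothetical target path

    try:
        df = spark.read.parquet(path)   # later runs: read the existing data
    except Exception:                   # first run: nothing to read yet
        df = spark.range(100).withColumn("value", F.lit(0))

    df = df.withColumn("value", F.col("value") + 1)  # the "modify" step

    # On the second run this line throws the FileNotFoundException
    df.write.parquet(path, mode="overwrite")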

In the first run, when there is no DataFrame yet, I create it and save it. This generates four files in the output folder:

  • _SUCCESS.crc
  • part-r-<......>.snappy.parquet.crc
  • _SUCCESS
  • part-r-<......>.snappy.parquet

Then in the second run, I read the data, modify it, and when I try to overwrite it, it throws an exception saying that *part-r-<.....>.snappy.parquet does not exist*.

The output folder is empty when the exception occurs, but the folder does contain this file right before df.write.parquet(path, 'overwrite') runs.

I tried setting spark.sql.parquet.cacheMetadata to false, but that didn't help. spark.catalog.listTables() returns an empty list, so there is no table metadata to refresh anyway.
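For completeness, that check looked roughly like this (spark.catalog is the public catalog API):

    spark.conf.set("spark.sql.parquet.cacheMetadata", "false")  # did not help
    print(spark.catalog.listTables())  # [] - no registered tables to refresh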

Currently I just delete the contents of the output folder before writing the DataFrame, and that works. But why does the built-in overwrite mode fail?

Thanks.

+4
2 answers

An RDD (and therefore a DataFrame) holds only the logic for computing its data (getPartitions), not the data itself, and it is evaluated lazily.

Your two runs look like this:

run 1 => ... => save to A
run 2 => read A => modify => save to A

Both end with a save to A. Because the Spark DAG is evaluated lazily, nothing is actually read until the final action (the save to A) triggers execution, and by that point overwrite mode has already deleted/truncated the files under A. Spark only materializes the data when an action runs, so the job ends up reading files it has just removed, which is exactly the FileNotFoundException you see.

Given this execution model, deleting the target yourself, as you already do, or writing to a tmp folder first and then moving it into place are the sensible workarounds.
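A rough sketch of the tmp-folder variant, assuming a PySpark job; the staging path is made up, and spark._jvm / spark._jsc are internal handles used here to reach the Hadoop FileSystem API:

    tmp_path = path + "_tmp"  # hypothetical staging location

    # 1. Materialize the result somewhere the plan does not read from.
    df.write.parquet(tmp_path, mode="overwrite")

    # 2. Swap it into place via the Hadoop FileSystem API.
    jvm = spark._jvm
    fs = jvm.org.apache.hadoop.fs.FileSystem.get(spark._jsc.hadoopConfiguration())
    fs.delete(jvm.org.apache.hadoop.fs.Path(path), True)  # recursive delete
    fs.rename(jvm.org.apache.hadoop.fs.Path(tmp_path),
              jvm.org.apache.hadoop.fs.Path(path))        # move tmp into place

The input is only deleted after the new output has been fully written, so there is no window in which the job can lose data.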

+2

As a workaround, you can force the DataFrame to be materialized before the write, e.g. with

df.cache()

followed by an action, and only then save it back to hdfs.
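A sketch of that idea; note that cache() is lazy by itself, so an action (count() here) is needed to actually pull the data into memory before the overwrite removes the source files:

    df = spark.read.parquet(path)
    df = df.withColumn("value", F.col("value") + 1)  # hypothetical modification

    df.cache()   # mark the DataFrame for caching (lazy on its own)
    df.count()   # action: materializes every partition into the cache

    df.write.parquet(path, mode="overwrite")  # no longer needs the source files

One caveat: if cached partitions are evicted or an executor is lost, Spark recomputes them from the original files, which are already gone at that point, so the tmp-folder approach above is the safer of the two.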

+1

Source: https://habr.com/ru/post/1671506/

