How do I delete or override files added to PySpark?

I added an egg file to the PySpark context using

sc.addPyFile('/path/to/my_file.egg')

However, after making some changes and rebuilding my egg file, I cannot add it again. Spark reports that the file already exists and refuses to add it. Here is the stack trace:

org.apache.spark.SparkException: File /tmp/spark-ddfc2b0f-2897-4fac-8cf3-d7ccee04700c/userFiles-44152f58-835a-4d9f-acd6-f841468fa2cb/my_file.egg exists and does not match contents of file:///path/to/my_file.egg
    at org.apache.spark.util.Utils$.copyFile(Utils.scala:489)
    at org.apache.spark.util.Utils$.doFetchFile(Utils.scala:595)
    at org.apache.spark.util.Utils$.fetchFile(Utils.scala:394)
    at org.apache.spark.SparkContext.addFile(SparkContext.scala:1409)

Is there a way to tell Spark to override it?

Thanks,

1 answer

The only way to delete (or override) files that were added using sc.addPyFile() is to restart the PySpark interpreter, which creates a fresh SparkContext and a fresh userFiles directory.
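To see why the error occurs: Spark caches each added file in a per-context temporary directory and, on a repeated add, compares the cached copy against the new source; if the contents differ it raises rather than overwrite. The sketch below is a simplified pure-Python model of that check (the `add_file` helper is hypothetical, not Spark's actual code, which lives in `org.apache.spark.util.Utils.copyFile`):

```python
import filecmp
import os
import shutil
import tempfile

def add_file(src, user_files_dir):
    """Simplified model of Spark's addFile/addPyFile caching check:
    a cached copy may only be reused if its contents match the source."""
    dest = os.path.join(user_files_dir, os.path.basename(src))
    if os.path.exists(dest):
        if not filecmp.cmp(src, dest, shallow=False):
            raise RuntimeError(
                f"File {dest} exists and does not match contents of {src}")
        return dest  # identical contents: adding again is a no-op
    shutil.copyfile(src, dest)  # first add: copy into the cache
    return dest

# Demonstration in a throwaway directory.
tmp = tempfile.mkdtemp()
cache = os.path.join(tmp, "userFiles")
os.makedirs(cache)

egg = os.path.join(tmp, "my_file.egg")
with open(egg, "wb") as f:
    f.write(b"version 1")

add_file(egg, cache)        # first add: copied into the cache
add_file(egg, cache)        # same contents: accepted silently

with open(egg, "wb") as f:  # rebuild the egg with different contents
    f.write(b"version 2")

try:
    add_file(egg, cache)    # rejected, like sc.addPyFile on the rebuilt egg
except RuntimeError as e:
    print(e)
```

Because the cache directory belongs to the SparkContext, restarting the interpreter (or stopping and recreating the context) is what clears it.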


Source: https://habr.com/ru/post/1660341/
