I wrote a Spark job in Java. The job is packaged as a shaded jar and submitted with:
spark-submit my-jar.jar
The code uses some files (Freemarker templates) located in src/main/resources/templates. When running locally, I can access them with:
File[] files = new File("src/main/resources/templates/").listFiles();
When the job is executed on the cluster, a NullPointerException is thrown when that line runs.
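For context, here is a minimal sketch of that local access with the surrounding code filled in (the class name and the way the result is used are illustrative):

import java.io.File;

public class TemplateLister {
    public static void listTemplates() {
        // Works locally because src/main/resources/templates exists on disk.
        // On the cluster that directory does not exist, listFiles() returns null,
        // and iterating over the result throws the NullPointerException.
        File[] files = new File("src/main/resources/templates/").listFiles();
        for (File file : files) {
            System.out.println(file.getName());
        }
    }
}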
If I run jar tf my-jar.jar, I can see that the files are packaged under the templates/ folder:
[...]
templates/
templates/my_template.ftl
[...]
I just can't read them; I suspect listFiles() is trying to access the local file system on the cluster node, where those files don't exist.
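For a single template whose name I already know, I assume something like this classpath-based read would work inside the shaded jar (an illustrative sketch; the class and method names are mine):

import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;

public class TemplateReader {
    public static String readTemplate(String name) throws IOException {
        // Resources packed into the jar live on the classpath, not on the
        // node's file system, so they have to be read through the class loader.
        String path = "templates/" + name;
        try (InputStream in = Thread.currentThread()
                .getContextClassLoader()
                .getResourceAsStream(path)) {
            if (in == null) {
                throw new IOException("Resource not found on classpath: " + path);
            }
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buffer = new byte[8192];
            int n;
            while ((n = in.read(buffer)) != -1) {
                out.write(buffer, 0, n);
            }
            return new String(out.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}

But that only helps when the exact resource name is known in advance; listing the contents of templates/ the way listFiles() does is the part I haven't solved.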
What is the right way to package files that a Spark batch job needs? I would prefer not to copy them to HDFS separately from the job, because that makes it harder to maintain.