Spark job in Java: how to access files from "resources" when running on a cluster

I wrote a Spark job in Java. The job is packaged as a shaded jar and submitted with:

spark-submit my-jar.jar 

The code uses some files (Freemarker templates) located in src/main/resources/templates . When running locally, I can access them with:

 File[] files = new File("src/main/resources/templates/").listFiles(); 

When the job runs on the cluster, a NullPointerException is thrown while executing the line above.

If I run jar tf my-jar.jar , I can see that the files are packaged under the templates/ folder:

 [...]
 templates/
 templates/my_template.ftl
 [...]

I just can't read them; I suspect that .listFiles() tries to access the local file system on the cluster node, where those files do not exist.
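For illustration, here is a minimal standalone sketch (not part of the actual job) of why the exception appears: File.listFiles() returns null when the path does not exist or is not a directory, which is the case on a cluster node where src/main/resources/ is not on the local file system.

 import java.io.File;

 public class ListFilesNullDemo {
     public static void main(String[] args) {
         // On a cluster node this relative path does not exist locally,
         // so listFiles() returns null instead of an array.
         File dir = new File("src/main/resources/templates/");
         File[] files = dir.listFiles();
         if (files == null) {
             // Any dereference of the result (files.length, a for-each loop, ...)
             // throws a NullPointerException.
             System.err.println("Not a readable directory: " + dir.getAbsolutePath());
         } else {
             System.out.println(files.length + " template file(s) found locally");
         }
     }
 }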

I would like to know how I should package files that are meant to be used by a self-contained Spark job. I would prefer not to copy them to HDFS outside of the job, because that becomes painful to maintain.

+5
2 answers

Your existing code references them as files, and files are not packaged up and shipped to the Spark nodes. But since they are inside your jar, you should be able to reference them via Foo.getClass().getResourceAsStream("/templates/my_template.ftl") . Learn more about Java resource streams here: http://www.javaworld.com/article/2077352/java-se/smartly-load-your-properties.html
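A minimal sketch of reading one of the packaged templates through the classpath (the class name and the stream-reading code are illustrative; only the resource path comes from the question):

 import java.io.BufferedReader;
 import java.io.InputStream;
 import java.io.InputStreamReader;
 import java.nio.charset.StandardCharsets;
 import java.util.stream.Collectors;

 public class ResourceStreamDemo {
     public static void main(String[] args) throws Exception {
         // The leading "/" makes the path absolute relative to the classpath root,
         // so it resolves to templates/my_template.ftl inside the shaded jar.
         try (InputStream in = ResourceStreamDemo.class
                 .getResourceAsStream("/templates/my_template.ftl")) {
             if (in == null) {
                 throw new IllegalStateException("Template not found on the classpath");
             }
             String template = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))
                     .lines()
                     .collect(Collectors.joining("\n"));
             System.out.println(template);
         }
     }
 }

If the templates are rendered with Freemarker, its Configuration class can also be pointed at the classpath directly (for example via setClassForTemplateLoading), which avoids handling the stream yourself; check the Freemarker documentation for the exact setup for your version.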

+6

It seems that Scala (2.11) code running on Spark cannot access resources packaged in shaded jars this way.

Running this code:

 var path = getClass.getResource(fileName)
 println("#### Resource: " + path.getPath())

prints the expected path when run outside of Spark.

When run inside Spark, a java.lang.NullPointerException is raised because path is null.

+4

Source: https://habr.com/ru/post/1247314/

