Running a Spark program using an Oozie workflow

I have a Scala program that uses Spark packages. I currently run it with this bash command from the gateway:

/homes/spark/bin/spark-submit --master yarn-cluster --class "com.xxx.yyy.zzz" --driver-java-options "-Dyyy.num=5" a.jar arg1 arg2

I would like to start running the job through Oozie instead. I have a few questions:

Where should I put the spark-submit executable? On HDFS? How do I define the Spark action? Where should the --driver-java-options appear? What should the Oozie workflow look like? Is it similar to the one shown here?

1 answer

If you have a fairly recent version of Oozie, you can use the Oozie Spark action:

https://github.com/apache/oozie/blob/master/client/src/main/resources/spark-action-0.1.xsd
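
For example, a workflow built on that action could look roughly like the sketch below. This is untested and the workflow name, HDFS paths, and property values are placeholders; your application jar must already be uploaded to HDFS, and the --driver-java-options from your command line goes into <spark-opts>:

    <workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
        <start to="spark-node"/>
        <action name="spark-node">
            <spark xmlns="uri:oozie:spark-action:0.1">
                <job-tracker>${job_tracker}</job-tracker>
                <name-node>${name_node}</name-node>
                <master>yarn-cluster</master>
                <name>zzz</name>
                <class>com.xxx.yyy.zzz</class>
                <!-- placeholder HDFS path: upload a.jar here first -->
                <jar>${name_node}/user/spark/apps/a.jar</jar>
                <spark-opts>--driver-java-options -Dyyy.num=5</spark-opts>
                <arg>arg1</arg>
                <arg>arg2</arg>
            </spark>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Spark action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
        </kill>
        <end name="end"/>
    </workflow-app>

Note that with this action you do not ship spark-submit yourself; the Spark sharelib has to be installed on the cluster for Oozie to launch the job.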

Otherwise, you need to run a Java action that launches Spark. Something like this:

    <java>
        <main-class>org.apache.spark.deploy.SparkSubmit</main-class>
        <arg>--class</arg>
        <arg>${spark_main_class}</arg>          <!-- this is the class, com.xxx.yyy.zzz -->
        <arg>--deploy-mode</arg>
        <arg>cluster</arg>
        <arg>--master</arg>
        <arg>yarn</arg>
        <arg>--queue</arg>
        <arg>${queue_name}</arg>                <!-- depends on your Oozie config -->
        <arg>--num-executors</arg>
        <arg>${spark_num_executors}</arg>
        <arg>--executor-cores</arg>
        <arg>${spark_executor_cores}</arg>
        <arg>${spark_app_file}</arg>            <!-- jar that contains your Spark job, written in Scala -->
        <arg>${input}</arg>                     <!-- some arg -->
        <arg>${output}</arg>                    <!-- some other arg -->
        <file>${spark_app_file}</file>
        <file>${name_node}/user/spark/share/lib/spark-assembly.jar</file>
    </java>
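
Either way, the ${...} parameters are usually supplied through a job.properties file. A minimal sketch, assuming hypothetical host names and paths that you would replace with your cluster's values:

    # placeholder hosts and ports; use your cluster's NameNode and ResourceManager
    name_node=hdfs://namenode-host:8020
    job_tracker=resourcemanager-host:8032
    queue_name=default
    spark_main_class=com.xxx.yyy.zzz
    spark_num_executors=5
    spark_executor_cores=2
    # jar uploaded to HDFS beforehand
    spark_app_file=${name_node}/user/spark/apps/a.jar
    input=arg1
    output=arg2
    # HDFS directory containing workflow.xml
    oozie.wf.application.path=${name_node}/user/spark/apps/spark-wf

You would then submit the workflow with the Oozie CLI (the server URL is again a placeholder):

    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run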

Source: https://habr.com/ru/post/984191/

