Running a Spark program using an Oozie workflow

I have a Scala program that uses Spark packages. I currently run it with this bash command from the gateway:

/homes/spark/bin/spark-submit --master yarn-cluster --class "com.xxx.yyy.zzz" --driver-java-options "-Dyyy.num=5" a.jar arg1 arg2

I would like to start running the job through Oozie instead. I have a few questions:

Where should I put the spark-submit executable? On HDFS? How do I define the Spark action? Where should the --driver-java-options appear? What should the Oozie workflow look like? Is it similar to the one shown here?

1 answer

If you have a fairly recent version of Oozie, you can use the Oozie Spark action:

https://github.com/apache/oozie/blob/master/client/src/main/resources/spark-action-0.1.xsd
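
For example, a workflow built on that action could look roughly like the sketch below. This is untested and the workflow name, HDFS paths, and property values are placeholders; your application jar must already be uploaded to HDFS, and the --driver-java-options from your command line goes into <spark-opts>:

    <workflow-app name="spark-wf" xmlns="uri:oozie:workflow:0.5">
        <start to="spark-node"/>
        <action name="spark-node">
            <spark xmlns="uri:oozie:spark-action:0.1">
                <job-tracker>${job_tracker}</job-tracker>
                <name-node>${name_node}</name-node>
                <master>yarn-cluster</master>
                <name>zzz</name>
                <class>com.xxx.yyy.zzz</class>
                <!-- placeholder HDFS path: upload a.jar here first -->
                <jar>${name_node}/user/spark/apps/a.jar</jar>
                <spark-opts>--driver-java-options -Dyyy.num=5</spark-opts>
                <arg>arg1</arg>
                <arg>arg2</arg>
            </spark>
            <ok to="end"/>
            <error to="fail"/>
        </action>
        <kill name="fail">
            <message>Spark action failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
        </kill>
        <end name="end"/>
    </workflow-app>

Note that with this action you do not ship spark-submit yourself; the Spark sharelib has to be installed on the cluster for Oozie to launch the job.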

Otherwise, you need to run a Java action that launches Spark. Something like this:

    <java>
        <main-class>org.apache.spark.deploy.SparkSubmit</main-class>
        <arg>--class</arg>
        <arg>${spark_main_class}</arg>          <!-- this is the class, com.xxx.yyy.zzz -->
        <arg>--deploy-mode</arg>
        <arg>cluster</arg>
        <arg>--master</arg>
        <arg>yarn</arg>
        <arg>--queue</arg>
        <arg>${queue_name}</arg>                <!-- depends on your Oozie config -->
        <arg>--num-executors</arg>
        <arg>${spark_num_executors}</arg>
        <arg>--executor-cores</arg>
        <arg>${spark_executor_cores}</arg>
        <arg>${spark_app_file}</arg>            <!-- jar that contains your Spark job, written in Scala -->
        <arg>${input}</arg>                     <!-- some arg -->
        <arg>${output}</arg>                    <!-- some other arg -->
        <file>${spark_app_file}</file>
        <file>${name_node}/user/spark/share/lib/spark-assembly.jar</file>
    </java>
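
Either way, the ${...} parameters are usually supplied through a job.properties file. A minimal sketch, assuming hypothetical host names and paths that you would replace with your cluster's values:

    # placeholder hosts and ports; use your cluster's NameNode and ResourceManager
    name_node=hdfs://namenode-host:8020
    job_tracker=resourcemanager-host:8032
    queue_name=default
    spark_main_class=com.xxx.yyy.zzz
    spark_num_executors=5
    spark_executor_cores=2
    # jar uploaded to HDFS beforehand
    spark_app_file=${name_node}/user/spark/apps/a.jar
    input=arg1
    output=arg2
    # HDFS directory containing workflow.xml
    oozie.wf.application.path=${name_node}/user/spark/apps/spark-wf

You would then submit the workflow with the Oozie CLI (the server URL is again a placeholder):

    oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run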

Source: https://habr.com/ru/post/984191/

