We are thinking about integrating Apache Spark into our calculation process, where we originally planned to use Apache Oozie with standard MR or Map-Only (MO) jobs.
After some research, a few questions remain:
- Is it possible to orchestrate an Apache Spark job with Apache Oozie? If so, how?
- Is Oozie even necessary here, or could we handle the orchestration manually ourselves (see the sketch after this list)? Polling seems to be one of the main issues with plain Spark.
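To make the second question concrete, this is roughly what we imagine manual orchestration without Oozie would look like, using Spark's `SparkLauncher` API to submit the job from a plain Java process. The Spark home, jar path, main class, and arguments are made-up placeholders, not our real setup:

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class ManualLaunch {
    public static void main(String[] args) throws Exception {
        // Submit the job programmatically; all paths/names below are hypothetical.
        SparkAppHandle handle = new SparkLauncher()
                .setSparkHome("/opt/spark")            // assumed Spark install location
                .setAppResource("/jobs/calc-job.jar")  // hypothetical application jar
                .setMainClass("com.example.CalcJob")   // hypothetical main class
                .setMaster("yarn")
                .setDeployMode("cluster")
                .addAppArgs("--input", "/data/in")     // hypothetical parameters
                .startApplication();

        // Without Oozie we have to poll for completion ourselves -- exactly the
        // kind of polling mentioned above.
        while (!handle.getState().isFinal()) {
            Thread.sleep(10_000);
        }
        System.out.println("Final state: " + handle.getState());
    }
}
```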
Please consider the following scenarios when answering:
- running a workflow every 4 hours
- running a workflow whenever certain data becomes available
- starting a workflow and configuring it with parameters (a sketch of the scheduled, parameterized case follows this list)
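For the first and third scenarios, here is a minimal sketch of what we would otherwise script by hand: a Java scheduler that launches the (again hypothetical) job every 4 hours and configures it with a per-run parameter. Jar path, class name, and the `--runTs` argument are assumptions for illustration only:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.spark.launcher.SparkLauncher;

public class FourHourSchedule {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Fire the job every 4 hours, configuring it with a run-specific parameter.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                new SparkLauncher()
                        .setAppResource("/jobs/calc-job.jar") // hypothetical jar
                        .setMainClass("com.example.CalcJob")  // hypothetical class
                        .setMaster("yarn")
                        .addAppArgs("--runTs", String.valueOf(System.currentTimeMillis()))
                        .launch()      // returns a java.lang.Process
                        .waitFor();    // block until spark-submit exits
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 4, TimeUnit.HOURS);
    }
}
```

The data-availability scenario is the part we don't know how to do cleanly without something like Oozie's coordinator, which is why we're asking.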
Thanks in advance for your answers.