Apache Spark Orchestration Using Apache Oozie

We are considering integrating Apache Spark into our calculation process, for which we originally planned to use Apache Oozie with standard MR or MO (Map-Only) jobs.

After some research, a few questions remain:

  • Is it possible to orchestrate an Apache Spark process using Apache Oozie? If so, how?
  • Is Oozie still necessary, or can Spark handle the orchestration on its own? (Unification seems to be one of Spark's main concerns.)

Please consider the following scenarios when answering:

  • running a workflow every 4 hours
  • running a workflow whenever certain data becomes available
  • triggering a workflow and configuring it with parameters

Thanks for your answers in advance.

1 answer

Spark is supported in Oozie 4.2 as an action type; see the docs. The scenarios you mentioned are standard Oozie features.
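
To give an idea of what the Spark action looks like, here is a minimal workflow.xml sketch. The schema versions are the ones I would expect alongside Oozie 4.2; the application name, the class com.example.CalcJob, the jar path, and the parameters ${jobTracker}, ${nameNode}, ${master}, ${appRoot}, ${inputDir}, and ${outputDir} are all hypothetical placeholders, so check everything against the docs for your version.

```xml
<!-- Sketch only: names, paths and parameters below are assumptions, not from a real setup. -->
<workflow-app xmlns="uri:oozie:workflow:0.5" name="spark-calc-wf">
    <start to="spark-node"/>

    <action name="spark-node">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <!-- e.g. yarn-cluster or local[*]; supplied as a job property -->
            <master>${master}</master>
            <name>Spark calculation</name>
            <!-- hypothetical main class and jar of your Spark job -->
            <class>com.example.CalcJob</class>
            <jar>${nameNode}${appRoot}/lib/calc-job.jar</jar>
            <!-- arguments passed through to the Spark application -->
            <arg>${inputDir}</arg>
            <arg>${outputDir}</arg>
        </spark>
        <ok to="end"/>
        <error to="fail"/>
    </action>

    <kill name="fail">
        <message>Spark action failed: [${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

The three scenarios are then handled around such a workflow rather than inside it: a coordinator with frequency="${coord:hours(4)}" covers the 4-hour schedule, a coordinator dataset referenced from input-events covers waiting for data, and the ${...} parameters are filled in from the job.properties file or from the coordinator's workflow configuration at submission time.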


Source: https://habr.com/ru/post/1548358/

