We are thinking about integrating Apache Spark into our calculation process, where we originally planned to use Apache Oozie with standard MR or Map-Only (MO) jobs.
After some research, a few questions remain:
- Is it possible to orchestrate an Apache Spark job with Apache Oozie? If so, how?
- Is Oozie even necessary here, or could we handle the orchestration manually ourselves (see the sketch after this list)? Polling seems to be one of the main issues with plain Spark.
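To make the second question concrete, this is roughly what we imagine manual orchestration without Oozie would look like, using Spark's `SparkLauncher` API to submit the job from a plain Java process. The Spark home, jar path, main class, and arguments are made-up placeholders, not our real setup:

```java
import org.apache.spark.launcher.SparkAppHandle;
import org.apache.spark.launcher.SparkLauncher;

public class ManualLaunch {
    public static void main(String[] args) throws Exception {
        // Submit the job programmatically; all paths/names below are hypothetical.
        SparkAppHandle handle = new SparkLauncher()
                .setSparkHome("/opt/spark")            // assumed Spark install location
                .setAppResource("/jobs/calc-job.jar")  // hypothetical application jar
                .setMainClass("com.example.CalcJob")   // hypothetical main class
                .setMaster("yarn")
                .setDeployMode("cluster")
                .addAppArgs("--input", "/data/in")     // hypothetical parameters
                .startApplication();

        // Without Oozie we have to poll for completion ourselves -- exactly the
        // kind of polling mentioned above.
        while (!handle.getState().isFinal()) {
            Thread.sleep(10_000);
        }
        System.out.println("Final state: " + handle.getState());
    }
}
```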
Please consider the following scenarios when answering:
- running a workflow every 4 hours
- running a workflow whenever certain data becomes available
- starting a workflow and configuring it with parameters (a sketch of the scheduled, parameterized case follows this list)
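For the first and third scenarios, here is a minimal sketch of what we would otherwise script by hand: a Java scheduler that launches the (again hypothetical) job every 4 hours and configures it with a per-run parameter. Jar path, class name, and the `--runTs` argument are assumptions for illustration only:

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

import org.apache.spark.launcher.SparkLauncher;

public class FourHourSchedule {
    public static void main(String[] args) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        // Fire the job every 4 hours, configuring it with a run-specific parameter.
        scheduler.scheduleAtFixedRate(() -> {
            try {
                new SparkLauncher()
                        .setAppResource("/jobs/calc-job.jar") // hypothetical jar
                        .setMainClass("com.example.CalcJob")  // hypothetical class
                        .setMaster("yarn")
                        .addAppArgs("--runTs", String.valueOf(System.currentTimeMillis()))
                        .launch()      // returns a java.lang.Process
                        .waitFor();    // block until spark-submit exits
            } catch (Exception e) {
                e.printStackTrace();
            }
        }, 0, 4, TimeUnit.HOURS);
    }
}
```

The data-availability scenario is the part we don't know how to do cleanly without something like Oozie's coordinator, which is why we're asking.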
Thanks in advance for your answers.