What is the purpose of the Oozie launcher?

I created a simple Oozie workflow with Sqoop, Hive, and Pig actions. For each of these actions, Oozie launches an MR launcher job, which in turn launches the action itself (Sqoop / Hive / Pig). As a result, the workflow runs 6 MR jobs for 3 actions.

Why does Oozie start an MR launcher to run an action, rather than running the action directly?

1 answer

I posted the same question on the Apache Flume forums, and here is the response.

It is also there to keep the Oozie server from being slowed down or becoming unstable. For example, if you had many workflows running Pig jobs, the Oozie server would be running multiple copies of the Pig client (which is a relatively "heavy" program) directly. By moving all user code and external clients into map tasks in the launcher job, the Oozie server stays lightweight and less error prone. It is also much more scalable this way, because the launcher jobs distribute the launching and monitoring of jobs to other machines in the cluster; if the Oozie server did everything itself, the number of concurrent workflows would have to be limited by the specifications of the Oozie server machine (RAM, CPU, etc.). And finally, from an architectural point of view, the Oozie server itself is stateless; that is, everything is stored in the database, and the Oozie server can be taken down at any time without losing anything. If jobs were run directly from the Oozie server, the server would hold state (for example, a Pig client cannot be restarted and resumed).
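The stateless-server argument can be illustrated with a toy model. This is only a sketch, not Oozie code: the `Cluster` and `Server` classes, the `submit_launcher` method, and the dictionary standing in for the database are all hypothetical names, and it assumes each action produces exactly one MR job in addition to its launcher, as in the question.

```python
from dataclasses import dataclass, field


@dataclass
class Cluster:
    """Hypothetical stand-in for the Hadoop cluster."""
    jobs: list = field(default_factory=list)

    def submit_launcher(self, action: str) -> list:
        # The launcher is one MR job; the client it runs (Sqoop / Hive / Pig)
        # then submits the real MR job. Assumption: one MR job per action.
        launched = [f"launcher:{action}", f"job:{action}"]
        self.jobs.extend(launched)
        return launched


class Server:
    """Toy stateless server: all state lives in the database dict."""

    def __init__(self, database: dict, cluster: Cluster):
        self.database = database
        self.cluster = cluster

    def start_action(self, action: str) -> None:
        # The heavy client never runs here; the server only records job ids.
        job_ids = self.cluster.submit_launcher(action)
        self.database[action] = {"status": "RUNNING", "jobs": job_ids}

    def status(self, action: str) -> str:
        return self.database[action]["status"]


database, cluster = {}, Cluster()
server = Server(database, cluster)
for action in ("sqoop", "hive", "pig"):
    server.start_action(action)

# Simulate taking the server down and starting a fresh instance:
# nothing is lost, because the state was never held in the server process.
server = Server(database, cluster)
print(len(cluster.jobs))     # 6 jobs for 3 actions
print(server.status("pig"))  # RUNNING
```

In this model the replacement server can still report on the Pig action because it only reads the database. If the Pig client had been running inside the first server process, that work would have gone down with it.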


Source: https://habr.com/ru/post/956385/

