Can an OTP supervisor supervise a process on a remote node?

I would like to use an OTP supervisor in the distributed application I am building, but I am having trouble understanding how such a supervisor can supervise a process running on a remote node. Unlike Erlang's spawn_link function, supervisor:start_child has no parameter to indicate the node on which the child should be spawned.

Is it possible for an OTP supervisor to supervise a remote child, and if not, how can I achieve this in Erlang?

1 answer

supervisor:start_child/2 can be used across nodes.

The source of your confusion is simply a mix-up over the context of execution (which is admittedly a little tricky to keep straight). There are three processes involved in any OTP spawn:

  • The requestor
  • The supervisor
  • The spawned process

The requestor's context is the one in which supervisor:start_child/2 is called, not the context of the supervisor itself. Usually you provide an interface to the supervisor by exporting a function that wraps the supervisor:start_child/2 call:

    do_some_crashable_work(Data) ->
        supervisor:start_child(sooper_dooper_sup, [Data]).

This might be defined in and exported from the supervisor module, defined internally in a "manager" process following the "service manager / supervisor / workers" idiom, or whatever. In all cases, though, the process making this call is a different process than the supervisor.

Now take a close look at the Erlang docs for supervisor:start_child/2 (an R19.1 mirror of the docs works just as well, since erlang.org is occasionally unreachable). Note that the sup_ref() type can be a registered name, a pid(), a {global, Name}, or a {Name, Node}. The requestor can be on any node calling a supervisor on any other node when it calls with a raw pid(), a {global, Name}, or a {Name, Node}.
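As a hedged illustration of those sup_ref() forms (the sooper_dooper_sup name and the variables here are only placeholders, and a simple_one_for_one spec is assumed so the list is passed as extra start arguments), any of the following would reach the same supervisor from a requestor on any node:

    %% Sketch only: addressing a supervisor by each sup_ref() form.
    %% sooper_dooper_sup, SupNode and SupPid are illustrative names.
    start_child_examples(SupNode, SupPid, Data) ->
        %% a name registered locally on the calling node
        supervisor:start_child(sooper_dooper_sup, [Data]),
        %% a name registered on a specific (possibly remote) node
        supervisor:start_child({sooper_dooper_sup, SupNode}, [Data]),
        %% a globally registered name, wherever the supervisor lives
        supervisor:start_child({global, sooper_dooper_sup}, [Data]),
        %% a raw pid(), which may belong to a process on any node
        supervisor:start_child(SupPid, [Data]).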

The supervisor doesn't just spawn things blindly, though. It has a child_spec() that tells it what to call in order to start the new process. That initial call into the child's module happens in the supervisor's context and is an arbitrary function: although we usually name it something like start_link/N, it can do whatever it wants as part of the spawn, including declaring the specific node to spawn on. So now we wind up with something like this:

    %% Usually defined in the requestor or supervisor module
    do_some_crashable_work(SupNode, WorkerNode, Data) ->
        supervisor:start_child({sooper_dooper_sup, SupNode}, [WorkerNode, Data]).

With a child spec, something like:

    %% Usually in the supervisor code
    SooperWorker = {sooper_worker,
                    {sooper_worker, start_link, []},
                    temporary,
                    brutal_kill,
                    worker,
                    [sooper_worker]},

Which means that the first call will be to sooper_worker:start_link/2:

    %% The exported start_link function in the worker module.
    %% Called in the context of the supervisor.
    start_link(Node, Data) ->
        Pid = proc_lib:spawn_link(Node, ?MODULE, init, [self(), Data]),
        {ok, Pid}.

    %% The first thing the newly spawned process will execute in its own
    %% context; here we assume it is going to run as a gen_server.
    %% (proc_lib already records the parent, so the pid passed in is only
    %% needed if you want to use it yourself.)
    init(_Parent, Data) ->
        Debug = sys:debug_options([]),
        {ok, State} = initialize_some_state(Data),
        gen_server:enter_loop(?MODULE, Debug, State).

You may be wondering what all that business with proc_lib was for. It turns out that spawning processes on arbitrary nodes from anywhere inside a multi-node system is simply not a very useful way of doing business, which is why the gen_* behaviours and even proc_lib:start_link/N have no way of declaring the node on which the new process should be spawned.

Ideally you want nodes that know how to initialize themselves and join the cluster once they are up. Whatever services your system provides are usually best replicated across the other nodes in the cluster; you then write a node-selection routine so that, from the requestor's point of view, the work looks local in every case. Done that way none of your usual manager / supervisor / worker code has to change, everything just happens, and it does not matter that the requestor's pid lives on another node, even when that pid is the address to which results should be returned.
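A minimal sketch of that node-selection idea, assuming a simple_one_for_one supervisor registered as sooper_dooper_sup on every node (pick_node/0 and the registration are assumptions for illustration, not part of the original answer):

    %% Pick a connected peer node if there is one, otherwise stay local.
    pick_node() ->
        case nodes() of
            []    -> node();
            Peers -> lists:nth(rand:uniform(length(Peers)), Peers)
        end.

    %% The requestor calls this as before; where the worker lands is hidden.
    do_some_crashable_work(Data) ->
        Node = pick_node(),
        supervisor:start_child({sooper_dooper_sup, Node}, [Data]).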

In other words, we rarely really want to spawn workers on arbitrary nodes; what we really want to do is step up a level and request that some work be performed by another node, without caring much about how that happens. Remember, to spawn a particular function from an {M,F,A} call, the node you are calling into must have access to the target module and function anyway, and if it already has a copy of the code, why shouldn't it simply be a peer node that handles such requests itself?
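For example, one hedged way to "ask another node to do the work" at that higher level (work_manager here is an assumed gen_server registered locally on each node, not anything from the original answer):

    %% Ask the manager process on Node to do the work; our own pid rides
    %% along in the request so results can be sent straight back to us.
    request_work(Node, Data) ->
        gen_server:call({work_manager, Node}, {do_work, self(), Data}).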

I hope this answer explained more than it confused.

