How can my process detect that the computer is shutting down?

I am running some applications on EC2 instances. Such instances may be terminated by Amazon without notice.

During shutdown, processes are killed in some order. We have monitoring/recovery programs that should behave differently depending on whether the server is going down or the process simply crashed (in particular, we don't want to do anything if the server is actually shutting down).

How can the recovery process (if it is still alive) detect that the processes were killed because of a shutdown?

(Additional background: I run unknown/untrusted code in a sandbox that does not change any external state. Usually, if the sandbox crashes, it is the fault of whoever wrote the untrusted code, and we don't restart it. But if the sandboxed code terminated because the virtual machine shut down or crashed, we need to restart it on another instance. The problem I am currently facing is that the user code terminates first, so the monitoring program wrongly concludes that the failure is a user error.)

+6
4 answers

agent

Run an agent on each machine, and have the agent spawn the sandbox child processes. The agent runs your own code, which should be crash-proof, while the sandbox runs the user code, which may crash.

The monitoring system, which is responsible for starting a new machine with a new sandbox process, checks which processes died: both the agent and the sandbox, or only the sandbox child process.

It does this by opening a TCP connection (RMI/RPC/HTTP) to the agent and asking about its child processes. If the agent answers, the machine is still running, and you can ask it about its sandbox child processes. If the agent does not respond, the machine is suspected of having terminated.
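A minimal sketch of this check using only the Python standard library, assuming the agent speaks HTTP and exposes a hypothetical `/children` endpoint that returns a JSON map from sandbox name to state (`running` or `exited`). `CHILD_STATES`, `AgentHandler`, `query_agent` and `diagnose` are illustrative names, not part of any existing tool.

```python
import http.client
import json
from http.server import BaseHTTPRequestHandler

# Hypothetical agent-side state: sandbox name -> "running" or "exited".
CHILD_STATES = {"sandbox-1": "running"}


class AgentHandler(BaseHTTPRequestHandler):
    """Hypothetical agent endpoint: GET /children returns CHILD_STATES as JSON."""

    def do_GET(self):
        if self.path != "/children":
            self.send_error(404)
            return
        body = json.dumps(CHILD_STATES).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, *args):
        pass  # keep the sketch quiet


def query_agent(host, port, timeout_s=2.0):
    """Return the agent's child states, or None if the agent does not answer."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout_s)
    try:
        conn.request("GET", "/children")
        resp = conn.getresponse()
        if resp.status != 200:
            return None  # treat an unusable answer like no answer
        return json.loads(resp.read())
    except (OSError, http.client.HTTPException):
        return None  # refused, timed out or dropped: the machine is suspect
    finally:
        conn.close()


def diagnose(host, port):
    """Classify a failure: is the whole machine suspect, or only the sandbox?"""
    children = query_agent(host, port)
    if children is None:
        return "machine-suspect"  # agent silent: restart on another machine
    if any(state == "exited" for state in children.values()):
        return "sandbox-crashed"  # agent alive, only the child died
    return "healthy"
```

The agent would serve `AgentHandler` with something like `http.server.ThreadingHTTPServer`; the timeout in `query_agent` is what turns "no answer" into "machine suspected down", so it should be shorter than the monitor's overall decision deadline.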

agent (edit)

The agent is also responsible for restarting the sandbox child process on the same virtual machine if it fails.
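A minimal sketch of such a restart loop on the agent side, assuming the sandbox is started as an ordinary command. `supervise` and its `max_restarts` limit are hypothetical; a real agent would likely also look at how the child died before deciding whether a restart makes sense.

```python
import subprocess


def supervise(cmd, max_restarts=3):
    """Run cmd, restarting it on this machine after each failure.

    Returns the list of exit codes observed. Stops after a clean exit
    (code 0) or once max_restarts restarts have been used up.
    max_restarts is a hypothetical policy knob.
    """
    codes = []
    for _ in range(max_restarts + 1):  # first run plus the allowed restarts
        code = subprocess.run(cmd).returncode
        codes.append(code)
        if code == 0:
            break  # clean exit: nothing to restart
    return codes
```

Capping the number of restarts keeps a sandbox that fails deterministically (for example, because of a bug in the user code) from looping forever on the same machine.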

discovery service

Use a discovery service (e.g. ZooKeeper) to keep track of which processes are sending keep-alive heartbeats. If the agent is alive, the machine is still running; if the agent is not alive, the machine is down.
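With ZooKeeper this is usually expressed through ephemeral nodes, which disappear when the owning client's session expires. To keep the sketch runnable without a ZooKeeper ensemble, the class below is a hypothetical in-memory stand-in for that liveness rule: an agent counts as alive only if its last heartbeat is at most `timeout_s` seconds old.

```python
import time


class HeartbeatRegistry:
    """In-memory stand-in for the liveness tracking a discovery service provides.

    Agents call beat(); the monitor calls is_alive(). The clock is injectable
    so the timeout logic can be tested without sleeping.
    """

    def __init__(self, timeout_s=10.0, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self._last_seen = {}  # agent id -> time of last heartbeat

    def beat(self, agent_id):
        """Record a keep-alive heartbeat from an agent."""
        self._last_seen[agent_id] = self.clock()

    def is_alive(self, agent_id):
        """True if the agent has sent a heartbeat within the timeout."""
        last = self._last_seen.get(agent_id)
        return last is not None and self.clock() - last <= self.timeout_s
```

`time.monotonic` is used rather than wall-clock time so that clock adjustments on the monitoring host cannot make an agent look dead or alive by mistake.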

EC2 API

Query the EC2 API to determine whether the instance is running or shutting down.
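A minimal sketch, assuming `boto3` is installed and AWS credentials are configured. `describe_instances` is the real EC2 call and `pending`, `running`, `shutting-down`, `terminated`, `stopping` and `stopped` are EC2's instance state names; the helper names and the choice of which states count as "going away" are assumptions.

```python
# States treated as "the machine is going away" (this grouping is an assumption).
GOING_AWAY = {"shutting-down", "terminated", "stopping", "stopped"}


def state_from_response(resp):
    """Extract the state name from a describe_instances response for one instance."""
    return resp["Reservations"][0]["Instances"][0]["State"]["Name"]


def fetch_instance_state(instance_id, region_name=None):
    """Ask EC2 for the instance's current state (needs boto3 and credentials)."""
    import boto3  # imported lazily so the pure helpers work without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    resp = ec2.describe_instances(InstanceIds=[instance_id])
    return state_from_response(resp)


def machine_is_going_away(state_name):
    """True if the instance is shutting down, stopping, or already gone."""
    return state_name in GOING_AWAY
```

Splitting the response parsing from the network call keeps the decision logic testable offline, with a hand-written response dictionary in place of a live API call.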

+5

How does the recovery process work?

If it uses waitpid to monitor the process, then when the process exits you can determine:

  • whether it exited normally and, if so, what exit status it returned, or
  • whether it was terminated by a signal, and which signal.

Depending on how the process is shut down, I would expect it either to exit normally or to be terminated by SIGTERM or SIGKILL. SIGILL, SIGABRT, SIGFPE, SIGBUS, SIGSEGV and SIGSYS indicate a programming error.
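A minimal Python sketch of this classification, using `os.waitpid` and the standard `os.WIFEXITED`/`os.WIFSIGNALED` helpers on a Unix system. The category names returned by `classify_wait_status` are hypothetical labels, and the signal groupings follow the answer above.

```python
import os
import signal

# Signals that point at a bug in the sandboxed code.
PROGRAM_ERROR_SIGNALS = {
    signal.SIGILL, signal.SIGABRT, signal.SIGFPE,
    signal.SIGBUS, signal.SIGSEGV, signal.SIGSYS,
}
# Signals usually sent from outside, e.g. while the machine shuts down.
EXTERNAL_SIGNALS = {signal.SIGTERM, signal.SIGKILL}


def classify_wait_status(status):
    """Turn a raw wait status into (category, exit code or signal number)."""
    if os.WIFEXITED(status):
        return ("exited", os.WEXITSTATUS(status))
    if os.WIFSIGNALED(status):
        sig = os.WTERMSIG(status)
        if sig in PROGRAM_ERROR_SIGNALS:
            return ("program-error", sig)
        if sig in EXTERNAL_SIGNALS:
            return ("killed", sig)
        return ("other-signal", sig)
    return ("unknown", status)


def wait_and_classify(pid):
    """Block until child pid exits, then classify how it ended."""
    _, status = os.waitpid(pid, 0)
    return classify_wait_status(status)
```

Note that a "killed" result only says the signal came from outside the child; it does not by itself prove the machine is shutting down.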

+2

That sounds like a very fragile scheme. Don't try to infer the state of the system: have the application write a marker token (and sync the corresponding files!) after a "clean" shutdown/stop of the application, and rely on that.
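A minimal sketch of such a token, assuming a local file whose path both the application and the monitor agree on. The function names and the token contents are hypothetical; the write-to-temp, `fsync`, `os.replace`, then `fsync`-the-directory sequence is a common POSIX pattern for making a small file durable.

```python
import os


def write_clean_exit_token(path, payload="clean-exit\n"):
    """Durably record that the application stopped cleanly.

    Writes a temporary file, fsyncs it, atomically renames it into place,
    then fsyncs the directory so the rename is persisted as well.
    """
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write(payload)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)


def clear_token(path):
    """Call at application start-up so a stale token is never trusted."""
    try:
        os.remove(path)
    except FileNotFoundError:
        pass


def exited_cleanly(path):
    """Monitor side: a token present after exit means a clean stop, not a crash."""
    return os.path.exists(path)
```

The application would call `clear_token` when it starts and `write_clean_exit_token` from its orderly shutdown path (for example, its SIGTERM handler); the monitor then only needs to check for the file after the process is gone.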

+1

I assume that when an instance shuts down, your monitoring process will receive a SIGTERM signal.

So one could do something like this: if the monitored process has exited and no SIGTERM is received within the next, say, 5 seconds, assume that the process crashed. If SIGTERM is received, simply exit from the signal handler.
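A minimal Linux-only sketch of this heuristic. Instead of a signal handler it uses a swapped-in technique: SIGTERM is blocked with `signal.pthread_sigmask` and then collected synchronously with `signal.sigtimedwait`, so a SIGTERM that arrives slightly before the child's exit is kept pending rather than lost. The function names and the 5-second default grace period are assumptions.

```python
import signal


def block_sigterm():
    """Block SIGTERM in the calling thread so it stays pending instead of killing us.

    Call this early in the monitor's main thread, before starting other
    threads, so they inherit the mask.
    """
    signal.pthread_sigmask(signal.SIG_BLOCK, {signal.SIGTERM})


def shutdown_seen_within(timeout_s=5.0):
    """Return True if SIGTERM is already pending or arrives within timeout_s seconds."""
    return signal.sigtimedwait({signal.SIGTERM}, timeout_s) is not None


def on_child_exit(grace_s=5.0):
    """Decide what a dead child means, following the heuristic in the answer."""
    if shutdown_seen_within(grace_s):
        return "shutdown"  # the host is going down: do nothing and exit
    return "crash"  # no SIGTERM during the grace period: treat as a crash
```

Because SIGTERM stays blocked, the monitor no longer exits on it automatically; after `on_child_exit` returns "shutdown", it should exit on its own.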

0

Source: https://habr.com/ru/post/916309/

