agent
Run an agent on each machine and have it spawn the sandbox as a child process. The agent runs your own code, which is meant to be crash-proof, while the sandbox runs user code, which may crash.
The monitoring system, which is responsible for starting a new machine with a new sandbox process, checks which processes have died (either the agent together with its sandbox, or only a sandbox child process).
It does this by opening a TCP connection (RMI / RPC / HTTP) to the agent and asking about its child processes. If the agent answers, the machine is still running, and you can ask it which of its sandbox child processes are alive. If the agent does not respond, the machine is suspected to have terminated.
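The probe described above could be sketched as follows. This is only an illustration: the `LIST_CHILDREN` request line, the JSON reply format, and the names `query_agent` and `classify` are hypothetical and not taken from any real agent protocol.

```python
import json
import socket
from typing import List, Optional


def query_agent(host: str, port: int, timeout: float = 2.0) -> Optional[List[dict]]:
    """Ask the agent for its sandbox children.

    Returns the list the agent reports, or None if the agent does not
    answer (connection refused, timeout, or an unreadable reply).
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as conn:
            # Hypothetical one-line request; a real agent might speak HTTP or RPC.
            conn.sendall(b"LIST_CHILDREN\n")
            reply = conn.makefile("r", encoding="utf-8").readline()
    except OSError:
        # Covers refused connections and timeouts alike.
        return None
    try:
        children = json.loads(reply)
    except ValueError:
        return None
    return children if isinstance(children, list) else None


def classify(children: Optional[List[dict]]) -> str:
    """Turn the agent's answer into a coarse health verdict."""
    if children is None:
        return "machine-suspected-down"
    if any(not child.get("alive", False) for child in children):
        return "sandbox-down"
    return "healthy"
```

A single missed probe only makes the machine a suspect, so a monitor would probably retry a few times before starting a replacement machine.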
agent (variation)
The agent is also responsible for restarting the child sandbox process on the same virtual machine in the event of a failure.
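A minimal sketch of such a restart loop, assuming the sandbox is started from a command line and that a non-zero exit code counts as a failure. The function name `supervise` and its parameters are made up for illustration.

```python
import subprocess
import time
from typing import Sequence


def supervise(cmd: Sequence[str], max_restarts: int = 3,
              backoff_seconds: float = 1.0) -> int:
    """Run the sandbox command and restart it whenever it fails.

    Returns the number of restarts that were needed once the child exits
    cleanly (exit code 0). Raises RuntimeError if the child keeps failing
    after max_restarts restarts.
    """
    restarts = 0
    while True:
        child = subprocess.Popen(list(cmd))
        exit_code = child.wait()  # blocks until the sandbox process ends
        if exit_code == 0:
            return restarts
        if restarts >= max_restarts:
            raise RuntimeError(
                f"sandbox failed {restarts + 1} times, last exit code {exit_code}"
            )
        restarts += 1
        time.sleep(backoff_seconds)  # brief pause to avoid a tight crash loop
```

Capping the number of restarts is a design choice in this sketch: once the cap is reached, the agent can report the failure to the monitoring system instead of looping forever.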
lookup service
Use a lookup service (e.g. ZooKeeper) to keep track of which processes are still sending keep-alive heartbeats. If the agent's heartbeats keep arriving, the machine is still running; if they stop, the machine is considered down.
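The heartbeat bookkeeping can be illustrated with a small in-memory stand-in. This is not ZooKeeper's API; with ZooKeeper the usual equivalent is an ephemeral node that disappears when the agent's session ends. The class name `HeartbeatRegistry` and the timeout value are assumptions.

```python
import time
from typing import Callable, Dict, List


class HeartbeatRegistry:
    """Tracks the last heartbeat per agent and flags agents that went silent."""

    def __init__(self, timeout_seconds: float = 10.0,
                 clock: Callable[[], float] = time.monotonic):
        self._timeout = timeout_seconds
        self._clock = clock  # injectable so the logic can be tested without sleeping
        self._last_seen: Dict[str, float] = {}

    def beat(self, agent_id: str) -> None:
        """Record a keep-alive from the given agent."""
        self._last_seen[agent_id] = self._clock()

    def is_alive(self, agent_id: str) -> bool:
        """True if the agent sent a heartbeat within the timeout window."""
        last = self._last_seen.get(agent_id)
        return last is not None and self._clock() - last <= self._timeout

    def suspected_down(self) -> List[str]:
        """Agents that registered once but have been silent for too long."""
        now = self._clock()
        return sorted(agent for agent, seen in self._last_seen.items()
                      if now - seen > self._timeout)
```

The timeout should be several heartbeat intervals long, so that one delayed heartbeat does not mark a healthy machine as down.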
ec2 api
Query the EC2 API to determine whether the instance is in a running or shut-down state.
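As a rough sketch, the state lookup might look like this with boto3. The helper names `instance_state` and `is_running` are hypothetical. The client is passed in so the code does not depend on credentials; in practice it would be created with `boto3.client("ec2")`.

```python
from typing import Optional


def instance_state(ec2_client, instance_id: str) -> Optional[str]:
    """Return the EC2 state name for one instance, or None if it is not listed.

    EC2 reports one of: pending, running, shutting-down, terminated,
    stopping, stopped. Errors raised by the client (for example for an
    unknown instance ID) are deliberately not handled here.
    """
    response = ec2_client.describe_instances(InstanceIds=[instance_id])
    for reservation in response.get("Reservations", []):
        for instance in reservation.get("Instances", []):
            if instance.get("InstanceId") == instance_id:
                return instance["State"]["Name"]
    return None


def is_running(ec2_client, instance_id: str) -> bool:
    """True only when EC2 reports the instance as running."""
    return instance_state(ec2_client, instance_id) == "running"
```

A `running` state only says that the virtual machine is up, not that the agent or sandbox inside it is healthy, so this check complements the agent probe rather than replacing it.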