While your application is running, Bad Things may happen that we would like to recover from. One example is a power failure.
No computing technology can arrange for instructions to execute on a device that has lost power. Therefore, on reboot, we may need to reset some state. Your application already has this requirement; I am just making it explicit.
It is difficult to reliably gain control immediately after each of the various Bad Things that can happen, as you discovered when you carefully examined several standard methods. You were not specific about which items you expect to need cleanup, but we could consider these cases:
- transient: TCP connections, flock locks, etc.
- persistent: disk files, side effects on remote hosts
Instead of launching your application directly, arrange for it to be launched by a nanny process, which starts the application as a child. At some point the application will exit, and the nanny will regain control with all the transient items already released by the operating system. The nanny can then perform any necessary cleanup of the persistent items before restarting the application. This is the same cleanup the nanny has to do on initial startup anyway, for example after a power failure. The advantage of running your application under a parent process is that the parent can clean up immediately after simple application crashes such as SEGV.
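A minimal sketch of such a nanny in Python follows. It is only an illustration under stated assumptions: the names `nanny` and `clean_persistent_state` are hypothetical, and so is the convention that leftover `*.tmp` and `*.lock` files in a state directory are the persistent items that need cleaning. Your real cleanup will be application-specific.

```python
import subprocess
import time
from pathlib import Path


def clean_persistent_state(state_dir):
    """Remove stale persistent items left by a previous run.

    Assumption for illustration only: stale items are *.tmp and *.lock
    files directly inside state_dir. Returns the sorted names removed.
    """
    removed = []
    for pattern in ("*.tmp", "*.lock"):
        for path in Path(state_dir).glob(pattern):
            path.unlink(missing_ok=True)
            removed.append(path.name)
    return sorted(removed)


def nanny(cmd, state_dir, max_restarts=3, backoff_s=1.0):
    """Run cmd as a child process, cleaning up before every launch.

    The same cleanup runs on the first launch (covering a reboot after
    power failure) and before each restart (covering crashes).
    Returns the exit status of the last run.
    """
    returncode = None
    for attempt in range(max_restarts + 1):
        clean_persistent_state(state_dir)
        # Blocks until the child exits. By then the OS has already
        # released the child's transient resources (sockets, flock locks).
        # On POSIX a negative value means the child died from a signal,
        # e.g. -11 for SIGSEGV.
        returncode = subprocess.call(cmd)
        if returncode == 0:
            break  # clean exit: do not restart
        if attempt < max_restarts:
            time.sleep(backoff_s)  # avoid a tight crash loop
    return returncode
```

It could be invoked as `nanny(["./myapp", "--serve"], "/var/lib/myapp")`, where both the command and the directory are placeholders. Keeping the nanny this small is deliberate: the less it does, the less likely it is to crash itself.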
Cleaning up persistent items probably involves waiting for transient resources to time out. If your system can reboot within about 2 seconds after a brief power outage, you may find that you must deliberately sleep long enough to ensure that remote hosts reliably detect your transition to Down before you announce your transition to Up. Techniques such as Virtual Synchrony and Paxos can help you speed up convergence.
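The deliberate sleep can be sketched as a small calculation. This assumes, hypothetically, that the application periodically persists a "last alive" timestamp to disk and that you know the failure-detection timeout your peers use; `seconds_to_stay_down` and its parameters are illustrative names, not part of any library.

```python
def seconds_to_stay_down(last_alive_at, now, peer_failure_timeout_s, margin_s=1.0):
    """Return how much longer to stay silent before announcing Up.

    last_alive_at: persisted timestamp of the last known-alive moment
                   (assumed to survive the power failure).
    now:           current time on the same clock.
    peer_failure_timeout_s: how long peers wait before declaring us Down
                   (assumed known).
    """
    required = peer_failure_timeout_s + margin_s
    # If the clock appears to have gone backwards, treat elapsed as zero
    # and wait the full period rather than trusting the timestamps.
    elapsed = max(0.0, now - last_alive_at)
    return max(0.0, required - elapsed)
```

The nanny would sleep for the returned number of seconds before launching the application, so that a fast reboot cannot race ahead of the peers' failure detectors.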
Summary
Sometimes an application dies unexpectedly before it can run its cleanup code. Take a belt-and-suspenders approach: put the essential cleanup code in the (simpler and more reliable) parent process.