How to combine process and file checks in Monit?

Summary

How can I combine multiple checks in Monit? I want to check both that the process is alive and the contents or timestamp of a file.


Long and boring explanation

I am using the Monit daemon to keep my Bukkit Minecraft server running. It performs several checks. At the moment I have this configuration:

    #!monit
    check process bukkit pidfile /var/run/bukkit.pid   # check if the java process is running
        start program = "/sbin/start bukkit"           # start with Upstart
        stop program = "/sbin/stop bukkit"             # stop with Upstart
        if failed                                      # send a noop request to check if the server responds
            host cubixcraft.de port 20059 protocol http
            and request "/api/call?method=runConsoleCommand&args=%5B%22noop%22%5D&key=d9c7f3f6be0c92c1b2725f0e5a3352514cee0885c3bf7e0189a76bbaf2f4d7a7"
            with checksum e006695c8da58e03f17a305afd1a1a32
            timeout 20 seconds for 2 cycles
        then restart                                   # restart if it fails

It works, but it is slow. If something goes wrong, I have to wait 20 seconds until the server is restarted. I need this timeout, though, because the server sometimes reboots itself (to update the configuration, clear memory, etc.), which causes short delays. Without "timeout 20 seconds for 2 cycles", the server would be shut down immediately whenever it reboots.

Waiting 20 seconds for a restart after a failure would be acceptable in itself. But most of the time, when something goes wrong, all the security mechanisms on the server stop working as well.

Because of this, I need a way to restart the server immediately if it does not respond, while still giving it some time when it is rebooting.

My approach is this: the server writes to the log file whenever a command is issued (including reboots and the API calls I use to check the server's status). The timestamp of the log file is therefore the time of the last command. During a reboot, nothing is written to the file. So I can detect a reboot with a simple timestamp check, and only while the server is rebooting do I give it the 20 seconds.
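
For reference, Monit has a separate "check file" service type with a timestamp test, which is the building block this approach would rely on. The fragment below is only a minimal sketch: the service name bukkit_log, the log path, and the 20-second threshold are assumptions and do not come from the setup above.

    # Sketch only: service name, path and threshold are assumptions.
    check file bukkit_log with path /path/to/bukkit/server.log
        if timestamp > 20 seconds then alert    # log not written to for more than 20 seconds

Monit evaluates this test once per poll cycle, so it cannot react faster than the cycle length. On its own it also does not tie the file test to the "if failed" test of the process check, which is the open part of the question.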

1 answer

I managed to do this by overriding the start program:

 start program = "/bin/bash -c '/usr/bin/monit unmonitor bukkit; /sbin/start bukkit; sleep 20; /usr/bin/monit monitor bukkit'" with timeout 25 seconds 

This worked in Monit 5.5, but in Monit 5.14 it only works sometimes. Because Monit 5.14 receives the unmonitor request while it is still running the start program, it waits for the start to complete before performing the unmonitor. As a result, the monitor request is triggered too early and gets rejected.


Source: https://habr.com/ru/post/907790/

