Java: inconsistent watchdog timeout in systemd-notify

My Java application is installed on openSUSE 13.2, and I use systemd to manage the process (systemd version 210).

I would like to use the systemd watchdog feature via systemd-notify. However, the application keeps getting restarted because of seemingly inconsistent watchdog timeouts.

With WatchdogSec=120, and the application sending a notification every 60 seconds, I see restarts every five to twenty minutes on average.

Here is the (slightly edited) systemd unit file for the process:

    # Cool systemd service
    [Unit]
    Description=Something Awesome
    After=awesomeparent.service
    Requires=awesomeparent.service

    [Service]
    Type=simple
    WorkingDirectory=/opt/awesome
    Environment="AWESOME_HOME=/opt/awesome"
    User=awesomeuser
    Restart=always
    WatchdogSec=120
    NotifyAccess=all
    ExecStart=/home/awesome/jre1.8.0_05/bin/java -jar awesome.jar

    [Install]
    WantedBy=multi-user.target

And here is the code that calls systemd-notify:

    String pidStr = ManagementFactory.getRuntimeMXBean().getName();
    pidStr = pidStr.split("@")[0];
    String cmd = "/usr/bin/systemd-notify";
    Process process = new ProcessBuilder(cmd, "MAINPID=" + pidStr, "WATCHDOG=1")
            .redirectErrorStream(true)
            .start();
    int exitCode = 0;
    if ((exitCode = process.waitFor()) != 0) {
        String output = IOUtils.toString(process.getInputStream());
        Log.MAIN_LOG.error("Failed to notify systemd: "
                + ((output.isEmpty()) ? "" : " " + output)
                + " Exit code: " + exitCode);
    }

I never see error messages in the logs (the process always returns exit code 0), and I am 100% sure that the task runs once every minute: I can see the task's log entry just before each restart.

Does anyone have any idea why systemd-notify just doesn't work sometimes?

I am considering writing code that calls sd_pid_notify directly, but I would like to know whether there is a simple configuration change I can make before going down that route.

2 answers

Here is the JNA code that solved the problem:

    import com.sun.jna.Library;
    import com.sun.jna.Native;

    /**
     * The task issues a notification to the systemd watchdog. The systemd watchdog
     * will restart the service if the notification is not received.
     */
    public class WatchdogNotifierTask implements Runnable {

        private static final String SYSTEMD_SO = "systemd";
        private static final String WATCHDOG_READY = "WATCHDOG=1";

        @Override
        public void run() {
            try {
                int returnCode = SystemD.INSTANCE.sd_notify(0, WATCHDOG_READY);
                if (returnCode < 0) {
                    Log.MAIN_LOG.error(
                            "Systemd watchdog returned a negative error code: " + Integer.toString(returnCode));
                } else {
                    Log.MAIN_LOG.debug("Successfully updated systemd watchdog.");
                }
            } catch (Exception | LinkageError e) {
                // LinkageError covers libsystemd failing to load (UnsatisfiedLinkError,
                // ExceptionInInitializerError), which "catch (Exception e)" alone would miss.
                Log.MAIN_LOG.error("calling sd_notify native code failed with exception: ", e);
            }
        }

        /**
         * This is a Linux-specific interface to load the systemd shared library and call the sd_notify
         * function. Should we need other systemd functionality, it can be loaded here. It uses JNA for
         * native library calls.
         */
        interface SystemD extends Library {
            SystemD INSTANCE = (SystemD) Native.loadLibrary(SYSTEMD_SO, SystemD.class);

            int sd_notify(int unset_environment, String state);
        }
    }
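
The task above still has to be run periodically. A minimal sketch, assuming Java 8+ and that systemd exports WATCHDOG_USEC (the watchdog timeout in microseconds) to the main service process when WatchdogSec= is set; that is the case in current systemd and is worth confirming on version 210. It pings at half the timeout, the interval the sd_watchdog_enabled(3) man page recommends. The class and method names are hypothetical.

```java
import java.util.Map;
import java.util.OptionalLong;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical helper: runs a watchdog ping at half the configured watchdog timeout. */
final class WatchdogScheduler {

    private WatchdogScheduler() {
    }

    /**
     * Returns half of WATCHDOG_USEC (in microseconds), or empty if the variable is
     * missing, malformed or not positive, i.e. the watchdog is not enabled for us.
     */
    static OptionalLong pingPeriodMicros(Map<String, String> env) {
        String raw = env.get("WATCHDOG_USEC");
        if (raw == null) {
            return OptionalLong.empty();
        }
        try {
            long timeoutMicros = Long.parseLong(raw.trim());
            return timeoutMicros > 0 ? OptionalLong.of(timeoutMicros / 2) : OptionalLong.empty();
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /**
     * Schedules ping at a fixed rate, starting immediately, and returns the executor so
     * the caller can shut it down. scheduleAtFixedRate never runs a task again after it
     * throws, so ping should catch its own exceptions.
     */
    static ScheduledExecutorService start(Runnable ping, Map<String, String> env) {
        long periodMicros = pingPeriodMicros(env).orElseThrow(
                () -> new IllegalStateException("watchdog not enabled (WATCHDOG_USEC unset)"));
        ScheduledExecutorService executor = Executors.newSingleThreadScheduledExecutor();
        executor.scheduleAtFixedRate(ping, 0, periodMicros, TimeUnit.MICROSECONDS);
        return executor;
    }
}
```

Usage would be along the lines of WatchdogScheduler.start(new WatchdogNotifierTask(), System.getenv()). With WatchdogSec=120, WATCHDOG_USEC is 120000000 and the period works out to 60 seconds, the same interval the question already used, which suggests the schedule itself was not the problem.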

Does anyone have any idea why systemd-notify just doesn't work sometimes?

This is actually a long-standing problem with several systemd protocols, not just the readiness-notification protocol spoken by systemd-notify. It also affects the protocol for sending log entries directly to systemd's own journal.

Both protocols try to find out about the sending process, the client end, by reading from /proc/client-process-id/*. Unfortunately, systemd-notify is a short-lived program that exits as soon as it has sent its message to the server, so by the time the server reads /proc/client-process-id/*, the information it needs about the client end may already be gone. In particular, the server cannot determine which (systemd) control group the client end belongs to, hence which service unit controls it, and hence whether it is a process that is allowed to send notification messages. It is a race: whether a given notification is accepted depends on whether the server gets to /proc before systemd-notify exits, which is why it fails only some of the time.
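
To make the failure mode concrete, here is a toy Java illustration (not systemd's actual code; the class and method names are hypothetical) of the kind of lookup the server end performs: mapping the sender's PID to a unit by reading /proc/PID/cgroup. Once a short-lived sender such as systemd-notify has exited and been reaped, the file no longer exists and the lookup comes back empty. The parser assumes the usual formats: a name=systemd hierarchy line on cgroup v1 systems such as openSUSE 13.2, or the single 0:: line on cgroup v2.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Optional;

/** Toy illustration of mapping a PID to its systemd unit via /proc (hypothetical helper). */
final class CgroupLookup {

    private CgroupLookup() {
    }

    /** Extracts the unit name (e.g. "awesome.service") from the lines of a /proc/PID/cgroup file. */
    static Optional<String> unitFromCgroupLines(List<String> lines) {
        for (String line : lines) {
            // Each line is "hierarchy-ID:controller-list:cgroup-path".
            String[] fields = line.split(":", 3);
            if (fields.length != 3) {
                continue;
            }
            boolean systemdHierarchy = fields[1].equals("name=systemd")   // cgroup v1
                    || (fields[0].equals("0") && fields[1].isEmpty());   // cgroup v2
            if (!systemdHierarchy) {
                continue;
            }
            // Walk the path from the end; the unit is the deepest component that names one.
            String[] parts = fields[2].split("/");
            for (int i = parts.length - 1; i >= 0; i--) {
                if (parts[i].endsWith(".service") || parts[i].endsWith(".scope")) {
                    return Optional.of(parts[i]);
                }
            }
        }
        return Optional.empty();
    }

    /** Reads /proc/PID/cgroup; empty if the process is already gone (the systemd-notify case). */
    static Optional<String> unitOf(long pid) {
        Path path = Paths.get("/proc", Long.toString(pid), "cgroup");
        try {
            return unitFromCgroupLines(Files.readAllLines(path));
        } catch (IOException e) {
            return Optional.empty(); // sender exited: nothing left to attribute the message to
        }
    }
}
```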

As you discovered, calling the library routine from your actual dæmon, instead of spawning a short-lived child process to run systemd-notify, avoids this problem, because your dæmon, of course, does not exit immediately after sending the notification. Bear in mind, however, that if you send a readiness notification right before your dæmon exits (paradoxically, some dæmons do this to tell the world they are terminating), you will hit the same problem even with the library routine.

By the way, you do not need to call the systemd library function as native code in order to speak this protocol. (And not using the library function gives you the advantage of speaking the protocol correctly even when the server end is not systemd, which the systemd library function gets wrong.) It is not a hard protocol to speak in Java, and the sd_notify manual page describes it: you look at the NOTIFY_SOCKET environment variable, open a datagram socket, use the variable's value as the name of the socket to send to, send a single datagram message, and then close the socket. Java is capable of this. ☺
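
The steps just listed can be partly sketched in plain Java, assuming Java 8+. One limitation: the JDK has no API for AF_UNIX datagram sockets (the UnixDomainSocketAddress support added in Java 16 covers only stream SocketChannel and ServerSocketChannel), so this sketch covers what plain Java can do: reading NOTIFY_SOCKET, turning its value into socket address bytes (a leading @ denotes a Linux abstract-namespace socket and becomes a NUL byte), and building the newline-separated KEY=VALUE datagram. The final socket()/sendto()/close() still needs a binding to libc, for example via JNA as in the other answer, or a third-party library. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.Optional;

/** Hypothetical helper for the pure-Java parts of the notification datagram protocol. */
final class NotifyDatagram {

    private NotifyDatagram() {
    }

    /**
     * Returns the bytes to place in sockaddr_un.sun_path, or empty if NOTIFY_SOCKET is
     * unset (not running under a notification-aware service manager) or malformed.
     */
    static Optional<byte[]> socketAddress(Map<String, String> env) {
        String value = env.get("NOTIFY_SOCKET");
        if (value == null || value.isEmpty()) {
            return Optional.empty();
        }
        char first = value.charAt(0);
        if (first != '/' && first != '@') {
            return Optional.empty(); // must be an absolute path or an abstract name
        }
        byte[] path = value.getBytes(StandardCharsets.UTF_8);
        if (first == '@') {
            path[0] = 0; // Linux abstract namespace: leading NUL byte instead of '@'
        }
        return Optional.of(path);
    }

    /** Builds the datagram payload: newline-separated KEY=VALUE assignments. */
    static byte[] payload(String... assignments) {
        return String.join("\n", assignments).getBytes(StandardCharsets.UTF_8);
    }
}
```

Sending a watchdog ping then amounts to one sendto() of payload("WATCHDOG=1") to that address, made from the long-running JVM itself rather than from a child process, which is what sidesteps the race described above.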


Source: https://habr.com/ru/post/1207834/

