Nagios notification when a message is not received within 48 hours

Question

In Nagios it is easy to check that a log message occurred within the last 48 hours and to raise an alarm. However, I would like Nagios to alert when a specific message has not occurred within 48 hours.

Can someone point me in the right direction?

I use the "Check WMI Plus" plugin (agentless) to check the event log on the Windows host.

nagios

buckley May 26 '13 at 14:47

2 answers

Gregnz · Answer 1 · 2013-06-05T03:52:41+0000

Without knowing exactly what your “specific message” is, it’s hard to give a specific answer, but here is how it can be done:

As an example, I will raise a CRITICAL event if I have not seen the Windows error "Group Policy processing failed" in the last 48 hours.

You use the -w and -c options to define the criteria for WARNING and CRITICAL events in check_wmi_plus.

Running check_wmi_plus.pl --help | less -i shows the help, where you can find the checkeventlog options.

There are two tricks:

checkeventlog has only one _ItemCount field, so you do not need to specify it
You want to specify a range of values that includes only 0, so use @0:0 (the leading @ means "alert when the value is inside the range")
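
To make the @0:0 trick concrete, here is a minimal Python sketch of how Nagios-style threshold ranges are usually interpreted under the Nagios plugin guidelines. It is not check_wmi_plus's own code; the function names parse_range and should_alert are made up for illustration.

```python
import math


def parse_range(spec):
    """Parse a Nagios-style range such as '10', '10:', '~:10', '10:20' or '@0:0'.

    Returns (inside, low, high). When `inside` is True the alert fires for
    values inside [low, high]; otherwise it fires for values outside it.
    """
    inside = spec.startswith("@")
    if inside:
        spec = spec[1:]
    if ":" in spec:
        low_text, high_text = spec.split(":", 1)
    else:
        # A bare number N is shorthand for the range 0..N.
        low_text, high_text = "0", spec
    low = -math.inf if low_text == "~" else float(low_text or 0)
    high = math.inf if high_text == "" else float(high_text)
    return inside, low, high


def should_alert(value, spec):
    """Return True when `value` should trigger the threshold `spec`."""
    inside, low, high = parse_range(spec)
    in_range = low <= value <= high
    return in_range if inside else not in_range


# With -c @0:0 only an event count of exactly 0 is critical.
for count in (0, 1, 3):
    print(count, should_alert(count, "@0:0"))
```

Without the leading @, the range 0:0 would do the opposite and alert on every count except 0.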

First, define a specific section in the events.ini file. Mine is at /opt/nagios/bin/plugins/check_wmi_plus.d/events.ini

I added the following:

 [eventSpecial]
 im=Group Policy processing failed

I added this just below the [eventdefault] section.

Basically, im= means "include message". If it is not specified, everything is included; by specifying it, I am saying "only include messages that match this regular expression".
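
The include-filter idea can be sketched in a few lines of Python. This only illustrates the behaviour described above and is not how check_wmi_plus is implemented (the plugin is written in Perl). The helper name include_messages, the sample messages, and the pattern are hypothetical, and whether the plugin matches case-sensitively is not something this sketch settles.

```python
import re


def include_messages(messages, pattern=None):
    """Keep only messages matching `pattern`; keep everything if it is None."""
    if pattern is None:
        return list(messages)
    regex = re.compile(pattern)
    # search() matches anywhere in the message, not just at the start.
    return [message for message in messages if regex.search(message)]


# Hypothetical sample data for illustration only.
sample = [
    "Group Policy processing failed due to a lack of a network connection.",
    "The Windows Update service entered the running state.",
]

print(len(include_messages(sample)))                           # no filter
print(len(include_messages(sample, r"Group Policy.*failed")))  # with filter
```

The count that survives the filter is what the -w and -c thresholds are then compared against.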

Then you need the checkeventlog command

I use:

 /opt/nagios/bin/plugins/check_wmi_plus.pl -H HOST -u USER -p PASS -m checkeventlog -a % -o 2 -3 48 -4 eventSpecial -c @0:0

So, for the optional arguments (again, see the --help output):

-a % == search all event logs

-o 2 == Only warning and error severity

-3 48 == last 48 hours

-4 eventSpecial == refer to the section in events.ini that I just created

-c @0:0 == raise CRITICAL if there are exactly 0 events
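
If you generate this check from a script, the options listed above map onto an argument list as in the following Python sketch. The function name build_checkeventlog_command and its parameter names are hypothetical; the flags and default values are only the ones used in this answer, and the plugin path is the one from my installation.

```python
import shlex


def build_checkeventlog_command(
    host,
    user,
    password,
    plugin="/opt/nagios/bin/plugins/check_wmi_plus.pl",  # path from this answer
    logs="%",                    # -a: which event logs to search
    severity=2,                  # -o: warning and error severity only
    hours=48,                    # -3: look-back window in hours
    ini_section="eventSpecial",  # -4: section in events.ini
    critical="@0:0",             # -c: critical when the count is exactly 0
):
    """Return the argument list for the checkeventlog call shown above."""
    return [
        plugin,
        "-H", host, "-u", user, "-p", password,
        "-m", "checkeventlog",
        "-a", logs,
        "-o", str(severity),
        "-3", str(hours),
        "-4", ini_section,
        "-c", critical,
    ]


argv = build_checkeventlog_command("HOST", "USER", "PASS")
print(shlex.join(argv))
```

Passing the list to subprocess.run(argv) avoids any shell quoting questions around % and @.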

With this command, if there ARE matching messages during the period, I get:

 OK - 3 event(s) of severity level: "Error, warning", were recorded in the last 48 hours from the % event log. (The list is on the next line. Shown - Logfile: TimeGenerated: SeverityLevel: EventId: Type: SourceName: Message) | 'Event Count' = 3; 0;
 System: 20130604195600.378642-000 | Error: 1129: 0: Microsoft-Windows-GroupPolicy: Group Policy processing failed due to a lack of a network connection to a domain controller. This may be a transitional state. A success message will be generated after the machine gets connected to the domain controller and Group Policy is successfully processed. If you do not see a success message for several hours, contact your administrator.
 System: 20130604055521.084809-000 | Error: 1129: 0: Microsoft-Windows-GroupPolicy: Group Policy processing failed due to a lack of a network connection to a domain controller. This may be a transitional state. A success message will be generated after the machine gets connected to the domain controller and Group Policy is successfully processed. If you do not see a success message for several hours, contact your administrator.
 System: 20130603220259.894040-000 | Error: 1055: 0: Microsoft-Windows-GroupPolicy: Group Policy processing failed. Windows could not resolve the computer name. This may be due to one of the following factors: a) Name resolution error on the current domain controller. b) Delayed replication of Active Directory (an account created on another domain controller is not replicated to the current domain controller).

This does not raise a CRITICAL event.

If there are no matching messages, I get the following:

 CRITICAL - [Triggered by _ItemCount in the range 0:0] - 0 event(s) of severity level: "Error, warning", were recorded in the last 4 hours from the % event log. | 'Event Count' = 0; 0;

This raises a CRITICAL event because no log entries matched my criteria.

You can then define a standard Nagios command, using the appropriate macros such as $USER8$, to include it in your configuration.

user176316 · Answer 2 · 2013-06-04T13:22:04+0000

You could try creating a simple DOS script that runs every hour to monitor Nagios and restarts it when it sees 2 nagios.exe processes. Here is a DOS script that kills nagios.exe and restarts the service.

-------- CheckNagios.bat --------

 @echo off
 set mypgm=nagios.exe

 REM GET date/time stamp
 For /f "tokens=2-4 delims=/ " %%a in ('date /t') do (set mydate=%%c-%%a-%%b)
 For /f "tokens=1-2 delims=/:" %%a in ('time /t') do (set mytime=%%a%%b)

 :checkNagios
 rem get number of nagios processes
 for /f %%i in ('c:\windows\system32\tasklist.exe ^| find /i /c "%mypgm%"') do set /a numProc=%%i
 echo Last Check: %mydate%_%mytime%
 ECHO # of processes = %numProc%
 rem more than one nagios.exe means a stale duplicate is running
 if %numProc% GTR 1 (goto kill) else goto end

 :kill
 c:\windows\system32\taskkill.exe /f /IM %mypgm%
 REM restart nagios
 net start Nagwin_Nagios
 REM restart other nagios processes
 rem for /f %%x in ('net start ^| findstr /i "nagwin_"') do net stop %%x

 :end
 echo Exiting program.
 echo =================

 rem SCHEDULE TASK TO RUN EVERY HOUR and pipe to a logfile
 rem SCHTASKS /create /TN "Check Nagios" /TR "c:\icw\bin\checkNagios.bat >> c:\checknagios.log 2>&1" /SC HOURLY /ST 16:00 /MO 1 /RU DOMAIN\USERNAME /RP PASSWORD

 REM store last check that will be used by emailNagios.bat using blat.exe
 set LAST_NAGIOS_CHECK=%mydate%_%mytime%
