Replacing Nagios check_http with a custom (select/poll) daemon?

I have a Nagios configuration that runs several checks on each of several hundred nodes; one of them is a variant of check_http. Nagios is not built with --enable-embedded-perl (ePN), but we will change that soon. Even with ePN support, I am worried about a model where each execution of this HTTP + SSL Perl check processes only one target.

I would like to write a simple select() (or poll() / epoll()) daemon that opens connections to many targets at once, reads the responses, and emits the results in a form Nagios can consume, as if they were passive check results.

Is there any guide to how this can be done? What is the Nagios interface or API for submitting passive service check results?
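For context, Nagios accepts passive results through its external command file, using the PROCESS_SERVICE_CHECK_RESULT command. The following is a minimal sketch of what a daemon could write there. The command file path, host name, and service name are assumptions; the real path is whatever command_file is set to in nagios.cfg, and the helper names are hypothetical.

```python
import time

# Assumed location; check the command_file setting in nagios.cfg.
COMMAND_FILE = "/usr/local/nagios/var/rw/nagios.cmd"


def format_passive_result(host, service, return_code, output, now=None):
    """Build one PROCESS_SERVICE_CHECK_RESULT external command line."""
    timestamp = int(now if now is not None else time.time())
    # A newline terminates the command, so flatten multi-line output.
    text = output.replace("\n", " ")
    return (f"[{timestamp}] PROCESS_SERVICE_CHECK_RESULT;"
            f"{host};{service};{return_code};{text}\n")


def submit(line, command_file=COMMAND_FILE):
    """Write one command line to the Nagios command pipe."""
    with open(command_file, "w") as pipe:
        pipe.write(line)
```

Something like submit(format_passive_result("web01", "HTTP", 0, "OK - HTTP 200")) would then report an OK state, provided the service is defined in Nagios with passive checks enabled.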

One hack I am considering is to have my daemon update a Redis store (with one key per target and a short expiration time) and replace check_http with a very small, lightweight GET of that key from the local Redis instance. The GET would return either the actual results for Nagios or a "(nil)" response, which would be handled as if the HTTP connection had timed out.

However, I am also a little wary of my idea, since I suspect someone has already built something similar.

(BTW: I am open to being convinced to switch to something like Icinga, Zabbix, Zenoss, or OpenNMS ... almost anything that scales better.)

1 answer

As for whether Nagios itself should handle the scheduling and the checks, I will leave that to you, as it depends on your version of Nagios (newer versions can run checks in parallel) and on why you want a separate daemon for this. IIRC, Nagios version 3 runs checks in parallel and scales that way to more nodes than you are reporting.

However, I can speak to the Redis approach, as I have done this for Postfix queue statistics and TTFB tracking for websites.

Setting up the checks in Python with the curl and multiprocessing modules is pretty straightforward, as is dumping the results into Redis. Setting an expiration on each key would be a solid idea to keep the database from growing. I would recommend that the TTL not exceed (and possibly be less than) the check interval, to avoid picking up stale check results. If the current check has not completed and the Redis-to-Nagios check pulls the previous result, you can miss failed checks.
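A minimal sketch of that checker follows. It swaps in the standard library's urllib and a thread pool from multiprocessing in place of curl, so it is not the setup described above, just the same shape. The key prefix, the 300-second interval, the JSON layout, and all function names are assumptions; client is expected to be a redis-py Redis object (or anything with a compatible set method).

```python
import json
import time
import urllib.request
from multiprocessing.pool import ThreadPool

CHECK_INTERVAL = 300        # seconds; assumed Nagios check interval
KEY_PREFIX = "http_check:"  # hypothetical key naming scheme


def check_url(url, timeout=10):
    """Fetch one URL and return a small result dict."""
    started = time.monotonic()
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            status, error = resp.status, None
    except Exception as exc:  # DNS failure, timeout, TLS error, HTTP 4xx/5xx
        status, error = getattr(exc, "code", None), str(exc)
    return {"url": url, "status": status, "error": error,
            "elapsed": round(time.monotonic() - started, 3),
            "checked_at": int(time.time())}


def store_results(client, results, ttl=CHECK_INTERVAL):
    """Write each result under its own key, expiring after ttl seconds."""
    for result in results:
        client.set(KEY_PREFIX + result["url"], json.dumps(result), ex=ttl)


def run_once(client, urls, workers=50):
    """Check all URLs concurrently, then store the results."""
    with ThreadPool(workers) as pool:
        results = pool.map(check_url, urls)
    store_results(client, results)
    return results
```

A daemon would call run_once(redis.Redis(host="localhost"), urls) in a loop, sleeping between rounds. Because the TTL is no longer than the interval, a target that stops being checked simply disappears from Redis rather than leaving a stale OK behind.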

For the Redis-to-Nagios check, a simple redis-cli + bash or Python script that pulls the data for a given host and returns OK or otherwise, depending on your data, is easy to write and will run quite quickly.
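A rough Python sketch of such a check, assuming each key is named "http_check:" plus the URL and holds a JSON object with status, elapsed, and error fields. That layout and the function names are assumptions for illustration. A missing key is reported as CRITICAL, matching the question's idea of treating "(nil)" like a timeout.

```python
import json

OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3  # standard Nagios plugin exit codes


def evaluate(raw):
    """Map a stored value (None if the key is missing) to (exit_code, message)."""
    if raw is None:
        # Key expired or never written: treat it like a timed-out HTTP check.
        return CRITICAL, "CRITICAL - no recent result (treated as timeout)"
    result = json.loads(raw)
    status = result.get("status")
    if status is not None and 200 <= status < 400:
        return OK, f"OK - HTTP {status} in {result.get('elapsed')}s"
    return CRITICAL, f"CRITICAL - HTTP {status}: {result.get('error')}"


def main(url):
    """Look up one URL in the local Redis and print a Nagios status line."""
    import redis  # redis-py, assumed to be installed
    raw = redis.Redis(host="localhost", port=6379).get("http_check:" + url)
    code, message = evaluate(raw)
    print(message)
    return code
```

As a plugin, the script would end with sys.exit(main(sys.argv[1])) so that Nagios sees the exit code.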

I would recommend running the Redis instance on the Nagios server itself to ensure minimal latency and to keep a network problem from causing false alerts on your checks. I would also recommend having Nagios check your Redis instance and your check daemon. Make the check_http replacement check depend on the Redis and http_check daemons. You then have a dependency chain as follows:

 Redis -> http_checkd -> http_check_replacement 

This prevents false alerts on http_check_replacement by pinpointing the actual problem. For example, if your redis_checkd dies, you get one alert about that, and not 200+ "http_check_replacement failed" alerts.

Also, since your data in Redis is, by definition, transient, I would disable disk persistence. There is no need to write to disk when the data is constantly churning.

As a side note, if you use libcurl, I would recommend pulling its statistics on how long it takes to open a connection and how long the server takes to respond (time to first byte, TTFB), and taking advantage of Nagios's ability to store check performance data. There may well come a time when this data is really handy for troubleshooting and performance analysis.
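As an illustration, here is a sketch using pycurl (the Python binding for libcurl, assumed to be installed) to read those timings and append them to the plugin output as performance data after the pipe character. The labels connect, ttfb, and total and the function names are arbitrary choices, not anything Nagios requires.

```python
def format_perfdata(connect_time, ttfb, total_time):
    """Render timings (in seconds) as Nagios performance data."""
    return (f"connect={connect_time:.3f}s "
            f"ttfb={ttfb:.3f}s "
            f"total={total_time:.3f}s")


def plugin_line(status_text, connect_time, ttfb, total_time):
    """Join status text and performance data with the pipe separator."""
    return f"{status_text} | {format_perfdata(connect_time, ttfb, total_time)}"


def measure(url):
    """Fetch url with libcurl and return (connect, ttfb, total) in seconds."""
    from io import BytesIO
    import pycurl  # imported lazily so the formatting helpers work without it
    curl = pycurl.Curl()
    curl.setopt(pycurl.URL, url)
    curl.setopt(pycurl.WRITEDATA, BytesIO())  # collect the body, then drop it
    curl.perform()
    timings = (curl.getinfo(pycurl.CONNECT_TIME),
               curl.getinfo(pycurl.STARTTRANSFER_TIME),
               curl.getinfo(pycurl.TOTAL_TIME))
    curl.close()
    return timings
```

Printing plugin_line("OK - HTTP 200", *measure(url)) gives Nagios both the status text and the timing series, which it can hand to whatever graphing add-on is in use.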

I have a CLI tool written in C that does this and loads the results into a local Redis instance. It is fast: it takes barely longer than fetching the URL itself. I expect to open-source it this week, and I can add Nagios-style output to it quite easily. In fact, I think I will do that in the coming week or two.


Source: https://habr.com/ru/post/1484521/

