How to make Graphite show raw counter values instead of rates

I use Graphite and collectd to monitor my server. In particular, I use the tail plugin to count failed SSH logins. I use a counter for this metric, so I expect to see 1, 2, 3, 0, etc. for the data points. However, what I see is 0.1, 0.2, 0.3, 0, etc. It seems that Graphite is giving me the number of failed logins per second. I say this because my storage policy is one data point every 10 seconds for two hours, so 1 failed login in 10 seconds = 0.1 per second. On the graph, it looks like this:

Image

Also, when I zoom out to the next retention level, the numbers are scaled accordingly: 1 failed login, which was shown as 0.1, is now shown as much less than that, around 0.017.
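The arithmetic behind these two numbers can be checked with a short sketch. It assumes a rate is simply the counter increase divided by the length of the interval; the function name is made up for illustration, and the sketch does not establish whether collectd or Graphite's own aggregation produces the coarser value.

```python
def rate_per_second(increase, interval_seconds):
    """Counter increase divided by the interval length in seconds."""
    return increase / interval_seconds


# One failed login seen within a 10-second data point.
fine = rate_per_second(1, 10)

# The same single login spread over a 1-minute data point.
coarse = rate_per_second(1, 60)

print(round(fine, 3))
print(round(coarse, 3))
```

Both results match the values on the graph, which suggests the stored numbers are per-second rates at every retention level.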

I do not think this is caused by the aggregation method: even the finest-resolution data is off. How can I get Graphite to treat this metric as a plain, raw counter?

Here is my storage-schemas.conf (storage policy):

[my_server]
pattern = .*
retentions = 10s:2h,1m:2d,30m:400d

Here is my tail plugin configuration:

<Plugin "tail">
  <File "/var/log/auth.log">
    Instance "auth"
    <Match>
      Regex "sshd[^:]*: Failed password"
      DSType "CounterInc"
      Type "counter"
      Instance "sshd-invalid_user"
    </Match>
  </File>
</Plugin>

And here is my write_graphite plugin configuration (which sends the data to Graphite):

<Plugin write_graphite>
  <Node "my_server_name">
    Host "localhost"
    Port "2003"
    Protocol "tcp"
    LogSendErrors true
    Prefix "collectd."
    #Postfix ""
    StoreRates true
    AlwaysAppendDS false
    EscapeCharacter "_"
  </Node>
</Plugin>

I tried setting StoreRates false for the write_graphite plugin, but that did not work. It did change the behavior: when I made one failed SSH login, the metric showed 1. However, it never dropped back to 0. When I made two more failed logins, the metric went to 3.

Also interesting: I also loaded the users plugin, which simply reports the number of logged-in users, and it works fine: it shows 1 when I am connected over SSH, 2 when I open a second SSH session, and back to 1 when I close one of them. This holds for both StoreRates settings. So it seems that what I want is possible, though maybe not with the tail plugin.

These graphs show SSH logins with StoreRates false and the correct behavior for users.

Image

Any ideas? Thanks,

+5
3 answers

Although swissunix's answer is very useful for getting the behavior I was looking for, in the end I used Logster instead of collectd. With Logster, you write one bit of code that parses the log file and another bit that returns the metric. So although dividing by the elapsed time (reporting a rate) is common with Logster, you do not have to do it if you do not want to: there is a lot of flexibility.
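To illustrate the two-part shape described above (one piece that parses lines, one that returns the metric), here is a minimal standalone sketch. It does not import Logster: the class name, method names, metric name, and sample log lines are all illustrative stand-ins, and a real Logster parser would follow that library's own base class and return types. The regular expression is the one from the tail plugin configuration in the question.

```python
import re


class FailedLoginCounter:
    """Hypothetical parser: counts matching lines and reports a raw count."""

    PATTERN = re.compile(r"sshd[^:]*: Failed password")

    def __init__(self):
        self.failed = 0

    def parse_line(self, line):
        """Called once per new log line; counts lines that match the pattern."""
        if self.PATTERN.search(line):
            self.failed += 1

    def get_state(self):
        """Return the raw count seen since the last call, then reset it to 0."""
        count, self.failed = self.failed, 0
        return {"failed_ssh_logins": count}


# Made-up log lines, for illustration only.
sample_lines = [
    "Jan  1 00:00:01 host sshd[123]: Failed password for root from 10.0.0.1 port 22 ssh2",
    "Jan  1 00:00:02 host sshd[123]: Accepted password for alice from 10.0.0.2 port 22 ssh2",
    "Jan  1 00:00:03 host sshd[124]: Failed password for invalid user bob from 10.0.0.3 port 22 ssh2",
]

parser = FailedLoginCounter()
for line in sample_lines:
    parser.parse_line(line)

first = parser.get_state()   # count for this reporting interval
second = parser.get_state()  # nothing new was parsed, so the count is back to 0
print(first, second)
```

Because the count is reset each time it is reported, the metric drops back to 0 in quiet intervals, which is the behavior the question asks for.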

I put my parsers here: https://github.com/camlee/logster-parsers

+2

You are asking the system to count the number of events, and that is exactly what it does: it counts the number of failed logins since it started. Whether you set StoreRates or not only changes how the information is presented: as a rate or as a raw counter. The counter can never decrease! What you are actually asking for is a counter that is reset on read: the number of failed logins since the collector last checked.

As it happens, the ABSOLUTE data source type in rrdtool can be used to achieve this, but it will not help you here.

Step back and think about what you are trying to achieve: the number of failed logins per second seems like a perfectly good metric to me!

+3

If you set StoreRates to false, you can apply Graphite's derivative function to the ever-increasing counter to get the increase over each storage interval, which would suit your requirements.

E.g., in your example you reported 1 failed login, then 2 more, and saw the values 1 and 3. The derivative gives 1 and 2: the failed logins in each interval that Graphite tracks.
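The effect of taking the derivative of a cumulative counter can be sketched in a few lines. This is a simplified stand-in for what Graphite's derivative and nonNegativeDerivative functions compute, not their actual implementation, and the sample series (including a drop that imitates a counter reset after a restart) is made up for illustration.

```python
def derivative(series):
    """Difference between consecutive samples; the first point has no predecessor."""
    if not series:
        return []
    out = [None]
    for prev, cur in zip(series, series[1:]):
        out.append(None if prev is None or cur is None else cur - prev)
    return out


def non_negative_derivative(series):
    """Like derivative, but a drop in the counter yields None instead of a negative value."""
    return [d if d is None or d >= 0 else None for d in derivative(series)]


# Cumulative failed-login counter: 1 login, then 2 more, a quiet interval,
# then a reset to 0 followed by 2 new logins.
counter = [0, 1, 3, 3, 0, 2]

per_interval = derivative(counter)
per_interval_clean = non_negative_derivative(counter)
print(per_interval)
print(per_interval_clean)
```

In Graphite itself, this corresponds to wrapping the metric in derivative(...) or nonNegativeDerivative(...) in the graph target; the exact metric path depends on the collectd prefix and plugin instance names in use.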

+2

Source: https://habr.com/ru/post/1200510/