
Journal CustomDesigned's Journal: Passive Obsessive Checking Disorder

In the standard distributed monitoring described in the Nagios docs, check results flow one way - from leaf to root. I needed something a little different - peer to peer distributed monitoring. There are several problems that drive this need. One is checking public services. Our nagios server runs in the same tiny backroom "data center" as the public web server. It can check things internally, but can't check that the general public can actually get to hosts and services that are up internally. Another is that we have larger customers who manage their own network. Our nagios server rightly does not have direct access to all the internal services that need to be checked.

One way to handle both problems is to use nagios remote plugin execution (nrpe) to run the problematic tests on another server. However, our larger customer also wants his own nagios server, with only his network. Having our server monitor the same network via nrpe would be redundant, so I decided to try the peer to peer distributed approach. Each nagios server has a mix of active and passive services. The active services come in two forms: those whose results are transmitted to the peer, and those that stay local. Both systems are first configured as "regional servers" according to the standard model. Then, I add "obsess_over_host 0" and "obsess_over_service 0" to the root host and service templates on the central server. Specific hosts and services to be sent to the customer's nagios are marked with "obsess_over_host 1" (or "obsess_over_service 1" for services). In nagios, "obsessing" over a host or service means running a command after each check result (the ochp_command for hosts, the ocsp_command for services). For a distributed setup, that command is a script that sends the check result to another nagios server (usually via send_nsca).
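As a rough sketch of the wiring described above: the directive names are standard nagios, but the command name, script path, host name, and service description are placeholders assumed for illustration, and directives unrelated to obsessing are left out. The host side works the same way with obsess_over_hosts, ochp_command, and obsess_over_host.

```cfg
# nagios.cfg (main config file): turn on obsessing and name the command to run
obsess_over_services=1
ocsp_command=submit_check_result

# Object config. Hypothetical relay command; the script is assumed to pipe
# its arguments to send_nsca, as in the distributed monitoring docs.
define command {
                command_name submit_check_result
                command_line /usr/local/nagios/libexec/eventhandlers/submit_check_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATE$ '$SERVICEOUTPUT$'
}

# Root service template: nothing is relayed by default
# (other template directives omitted)
define service {
                name generic-service
                obsess_over_service 0
                register 0
}

# Hypothetical service whose results are relayed to the peer
define service {
                use generic-service
                host_name customer-web
                service_description HTTP
                check_command check_http
                obsess_over_service 1
}
```

With the default off in the template, only services that explicitly set "obsess_over_service 1" generate traffic to the other server.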

I added passive-host and passive-service templates:

# Template for hosts whose results arrive passively from the other nagios server
define host {
                name passive-host
                use linux-server
                active_checks_enabled 0
                notifications_enabled 1
                freshness_threshold 3600
                check_freshness 1
                obsess_over_host 0
                register 0
}

# Template for services whose results arrive passively from the other nagios server
define service {
                name passive-service
                use generic-service
                obsess_over_service 0
                active_checks_enabled 0
                notifications_enabled 1
                check_freshness 1
                freshness_threshold 93600
                check_command check_dummy!3 "No passive update yet"
                register 0
}

and used these for the hosts and services to be checked by the other nagios server.
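For example, a passive service on the receiving side might look like the sketch below. The host name and service description are made-up placeholders. Whatever names are used, they have to match what the sending server submits, because nagios matches an incoming passive result to a service by host name and service description.

```cfg
# Hypothetical passive service; active checks, freshness checking, and
# obsessing are all inherited from the passive-service template
define service {
                use passive-service
                host_name customer-web
                service_description HTTP
}
```

If no result arrives within the freshness threshold, nagios runs the template's check_command, which here reports UNKNOWN with "No passive update yet".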

Actually, originally they did not have the "obsess_over_host 0" and "obsess_over_service 0" entries, and this led to my passive obsessive checking disorder problem. The symptom was that the log showed passive check results coming in continuously, with checks for the same host or service a second apart, not every 5 minutes as configured. I got frustrated and stayed up late, and finally, after sleeping on it, realized what the problem was. A passive check result triggers the nagios "obsessive compulsive" behaviour the same as an active check does. And this is actually a feature, because you might want to relay the passive checks on to yet another nagios server. But here it meant that each server relayed every passive result it received straight back to the server that had sent it. I just needed to turn off obsessing for the passive hosts and services to prevent a feedback loop between the systems.
