User Journal

Journal Journal: Passive Obsessive Checking Disorder

In the standard distributed monitoring described in the Nagios docs, check results flow one way - from leaf to root. I needed something a little different - peer to peer distributed monitoring. There are several problems that drive this need. One is checking public services. Our nagios server runs in the same tiny backroom "data center" as the public web server. It can check things internally, but can't check that the general public can actually get to hosts and services that are up internally. Another is that we have larger customers who manage their own network. Our nagios server rightly does not have direct access to all the internal services that need to be checked.

One way to handle both problems is to use nagios remote plugin execution (nrpe) to run the problematic tests on another server. However, our larger customer also wants his own nagios server, covering only his own network. Having our server monitor the same network via nrpe would be redundant, so I decided to try the peer to peer distributed approach. Each nagios server has a mix of active and passive services, and the active services come in transmitting and non-transmitting forms. Both systems are first configured as "regional servers" according to the standard model. Then, I add "obsess_over_host 0" and "obsess_over_service 0" to the root host and service templates on the central server. Specific hosts and services to be sent to the customer's nagios are marked with "obsess_over_host 1" (and similarly for services). In nagios, "obsessing" over a host or service means running a script after each check; for a distributed setup, that script sends the check result to the other nagios server (usually via send_nsca).
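The transmitting side is the standard OCSP hook from the nagios distributed monitoring docs. As a sketch (the wrapper script path is illustrative, not from my actual config):

```
# nagios.cfg on the sending server
obsess_over_services=1
ocsp_command=submit_check_result

# command definition - a thin wrapper that pipes the result to send_nsca
define command {
        command_name    submit_check_result
        command_line    /usr/local/nagios/libexec/eventhandlers/submit_check_result $HOSTNAME$ '$SERVICEDESC$' $SERVICESTATEID$ '$SERVICEOUTPUT$'
}
```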

I added passive-host and passive-service templates:

define host {
                name passive-host
                use linux-server
                active_checks_enabled 0
                notifications_enabled 1
                freshness_threshold 3600
                check_freshness 1
                obsess_over_host 0
                register 0
}
define service {
                name passive-service
                use generic-service
                obsess_over_service 0
                active_checks_enabled 0
                notifications_enabled 1
                check_freshness 1
                freshness_threshold 93600
                check_command check_dummy!3 "No passive update yet"
                register 0
}

and used these for the hosts and services to be checked by the other nagios server.
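A concrete passive service then just inherits the template, since the template already carries the freshness settings and the check_dummy fallback (host and service names here are hypothetical):

```
define service {
        use                     passive-service
        host_name               customer-www
        service_description     HTTP
}
```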

Actually, these templates originally did not have the "obsess_over_host 0" (and "obsess_over_service 0") entries, and that led to my passive obsessive checking disorder problem. The symptom was that the log showed passive check results coming in continuously, with checks for the same host or service arriving a second apart instead of every 5 minutes as configured. I stayed up late getting frustrated, and finally, after sleeping on it, realized the problem: a passive check triggers the nagios "obsessive compulsive" behaviour just like an active check does. This is actually a feature, because you might want to relay the passive checks on to yet another nagios server. I just needed to turn off obsessing for the passive hosts and services to prevent a feedback loop between the two systems.

User Journal

Journal Journal: "Green" drives fubar servers

Laptop hard drives have long come with power saving features. This makes sense for laptops, which are generally single user systems. I just had the misfortune of installing a pair of "green" WD5000AADS-00M2B0 drives in a server, and soon noticed the problem of a rapidly rising Load_Cycle_Count, acknowledged in the WDC FAQ.

The fundamental problem with these "green" drives is that they assume a single user system. That is an OK assumption for laptops, but it is rather annoying for a desktop drive. I suppose a desktop can be single user, but apparently we now have to carefully buy "server" drives instead of "desktop" drives, just like you have to buy a "server" desktop to get ECC. The WDC suggestions to tune logging, together with setting laptop_mode on linux (which they don't mention), can produce periods of inactivity long enough for a single user system to be compatible with "IntelliPower". They are ineffective, however, on a server running multiple virtual machines, on a SAN server with many clients, or even on a busy email server.

For laptop drives, power saving can be disabled on linux via "hdparm -B 255". That doesn't work on the new "green" desktop drives. The inactivity timer on this model seems to be set at 8 seconds, so I wrote a simple C program that reads a sector from each drive every 8 seconds in O_DIRECT mode (to bypass the cache and force a real disk access). WDC provides a DOS utility to adjust the inactivity timer; setting a very high value effectively disables it. Unfortunately, these drives were already in the field before I noticed the problem.

User Journal

Journal Journal: Obstack and embedded allocation

The old libg++ library had a class called Obstack, which implemented multiple LIFO storage arenas. I have found it invaluable, and brought it forward into the STL age.
User Journal

Journal Journal: Have you tried caffeine?

A lady at my church who works professionally with ADD/ADHD kids reports that she has found caffeine with no sugar to work as well as more expensive drugs in some (many) cases. It is certainly safer and worth a try when you get a chance. No soda - unsweetened tea or coffee. Aspartame often has its own side effects. I will ask her about dose if you are interested.
