Server Monitoring With Munin And Monit
hausmasta writes "In this article I will describe how to monitor your server with munin and monit. munin produces nifty little graphs of nearly every aspect of your server (load average, memory usage, CPU usage, MySQL throughput, eth0 traffic, etc.) without much configuration, whereas monit checks the availability of services like Apache, MySQL, and Postfix and takes the appropriate action, such as a restart, if it finds a service is not behaving as expected. The combination of the two gives you full monitoring: graphs that let you recognize current or upcoming problems (like 'We need a bigger server soon; our load average is increasing rapidly.'), and a watchdog that ensures the availability of the monitored services."
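monit's role, as described above, is essentially a watchdog loop: probe a service, and if it does not answer, take an action such as a restart. The Python sketch below illustrates that idea under simple assumptions: a service counts as "available" if a TCP connection to its port succeeds, and the restart action is a caller-supplied function (hypothetical here; in practice it might invoke an init script). It is not monit's implementation or its configuration syntax.

```python
import socket
from typing import Callable


def port_is_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, unreachable hosts and timeouts.
        return False


def check_and_restart(name: str, host: str, port: int,
                      restart: Callable[[str], None]) -> bool:
    """Check one service and call restart(name) if it is unreachable.

    Returns True if the service answered, False if a restart was requested.
    The restart callback is an assumption standing in for a real action.
    """
    if port_is_open(host, port):
        return True
    restart(name)
    return False
```

For example, check_and_restart("apache", "127.0.0.1", 80, restart_fn) returns True when something is listening on port 80 and otherwise calls restart_fn("apache"). A real watchdog would also rate-limit restarts so that a permanently broken service is not restarted in a tight loop.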
Insignificant in the trails of NAGIOS? (Score:2, Interesting)
Re:But can I run this on Windows? (Score:3, Interesting)
Dunno. Don't care either, but it might. It's based on rrdtool [oetiker.ch], which does run on Windows. I don't know whether this article is a slashvertisement or just devoid of information. I've linked to rrdtool, and here [linpro.no] is the munin homepage.
There are _tons_ of these things running around. In my opinion, rrdtool is one of the best tools to come to computing in a long time. It's awesome. Other packages that use rrdtool include Cricket, Ganglia, and many others; I believe the rrdtool site has a list of them.
For those not familiar with it, rrdtool is a database designed for time-series data. It's kind of like a smart FIFO that loses detail the further back in time you go, because it stores averages over progressively longer intervals. I have rolled my own monitoring with rrdtool and Perl to track CPU, load, temperatures, you name it. One of the cool things about rrdtool is that the database is fixed in size. rrdtool is not easy to set up and work with at first, but the effort is definitely worth it.
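To make the "smart FIFO" description concrete, here is a toy Python model of round-robin storage, assuming each archive is defined by how many primary samples it averages into one row (steps) and how many rows it keeps. It only illustrates the two properties mentioned above, fixed size and loss of detail with age. Real rrdtool consolidation is time-based and has more options (other consolidation functions, handling of unknown values), none of which is modeled here.

```python
from collections import deque
from statistics import mean


class RoundRobinArchive:
    """Toy model of rrdtool-style storage built from fixed-size ring buffers.

    Each archive keeps at most `rows` values; every `steps` primary samples
    are averaged into one consolidated value, so older data survives only
    in coarser form and total storage never grows.
    """

    def __init__(self, archives):
        # archives: list of (steps, rows) pairs, e.g. [(1, 60), (5, 288)]
        self.archives = [
            {"steps": steps, "buf": deque(maxlen=rows), "pending": []}
            for steps, rows in archives
        ]

    def update(self, value: float) -> None:
        """Feed one primary sample into every archive."""
        for arc in self.archives:
            arc["pending"].append(value)
            if len(arc["pending"]) == arc["steps"]:
                # Consolidate: store the average; deque drops the oldest row.
                arc["buf"].append(mean(arc["pending"]))
                arc["pending"].clear()

    def fetch(self, index: int) -> list:
        """Return the rows currently held by archive `index`, oldest first."""
        return list(self.archives[index]["buf"])
```

Feeding it the values 1 through 10 with archives [(1, 4), (2, 3)] leaves [7, 8, 9, 10] at full resolution and [5.5, 7.5, 9.5] as two-sample averages; neither buffer ever grows past its row count.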
Basically, if you're a sysadmin in 2006 and you don't have rrdtool-based monitoring going, well, maybe the job is not for you. It's that important and that good. A single click on a link to a web page with an rrdtool graph can demonstrate to even the pointiest of pointy-haired bosses that you need more equipment, that a trend is developing, or whatever.
This is the kind of stuff I would like to see more talked about here on slashdot.
Re:Restarting services... (Score:4, Interesting)
We're discussing such issues in a class I'm taking on software fault tolerance. In discussions of selective restarts and backup processes, Apache is frequently cited as an example of how software should fail gracefully and consistently and then handle that failure itself. The lecture slides can be found here: http://wwwse.inf.tu-dresden.de/index.php?language
Apache has some memory leaks in it. That isn't necessarily bad; it happens, especially in a piece of software like that, which is expected to run constantly and NEVER fail. So every so often, or when it detects that its memory usage is getting out of hand, Apache starts a fresh copy of itself and then exits, letting the new, not-yet-leaky copy take over.
So to you (IT/admin) that daemon may seem to run forever, but that's because we (CS/developers) did our jobs (for once) and made sure the application cleaned up its own messes.
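The recycling pattern described above can be sketched as a toy Python model. The names LeakyWorker and Supervisor are hypothetical, and the fixed request limit is in the spirit of Apache's MaxRequestsPerChild directive rather than a reproduction of Apache's process management: the supervisor swaps in a fresh worker once the current one has handled max_requests requests, so the simulated leak can never grow past that bound.

```python
class LeakyWorker:
    """Hypothetical worker that leaks a little memory on every request."""

    def __init__(self, generation: int):
        self.generation = generation
        self.leaked = []   # stands in for memory that is never freed
        self.requests = 0

    def handle(self, request: str) -> str:
        self.requests += 1
        self.leaked.append(request)  # the simulated leak
        return f"gen{self.generation}:{request}"


class Supervisor:
    """Replace the worker after max_requests requests instead of letting
    its leak grow without bound (a toy version of process recycling)."""

    def __init__(self, max_requests: int):
        self.max_requests = max_requests
        self.generation = 0
        self.worker = LeakyWorker(self.generation)

    def handle(self, request: str) -> str:
        if self.worker.requests >= self.max_requests:
            # Start a fresh, not-yet-leaky worker; the old one is discarded.
            self.generation += 1
            self.worker = LeakyWorker(self.generation)
        return self.worker.handle(request)
```

With max_requests=3, seven requests are served by generations 0, 0, 0, 1, 1, 1, 2. For real worker processes, Python's multiprocessing.Pool offers the same policy through its maxtasksperchild argument.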
Re:But can I run this on Windows? (Score:3, Interesting)
As for Windows servers, this kind of monitoring is nothing new: Microsoft Operations Manager (MOM) has been around for 6 years now and is exceedingly easy to set up and use. It also works with all servers and workstations, flagging alerts such as low disk space or high CPU utilization so you can see if some new virus is coming at you. They even have agents for Linux and OS X.
I'll have to check out rrdtool, though; it's new to me. Most of the Linux boxes I have in production only do one task, and there aren't that many servers (20 in total that I manage), so it's fairly easy to check availability and go over the logs manually. Time is always against me, but now that it's summer I should have time to get my house in order.
Speaking of those databases... (Score:1, Interesting)
Is there a MySQL -> PostgreSQL FAQ out there? If not, would it be appropriate to make one on, say, Wikipedia? I have some ideas I wouldn't mind sharing with other users who "grew up" with MySQL and got used to all its particular features.
collectd (Score:1, Interesting)
If you have multiple *NIX servers to monitor, check out collectd: http://collectd.org/ [collectd.org]
The client reports various system statistics to a central collection server, which dumps the information into RRD files. Because it's push-based, there's no hassle with opening ports or running additional network-accessible services on the clients. (UCD-SNMP has always made me nervous.)
Monitoring a new machine is as simple as installing collectd and pointing it at your collectd server. The server automatically creates RRD files for the new host, and you're off and running. No configuration changes are required on the server. Make yourself a pre-configured package, and monitoring a new machine is a snap.
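The push model described above can be sketched in Python with plain UDP datagrams. This is only an illustration of the architecture: the JSON payload is a hypothetical format, not collectd's network protocol, and an in-memory dictionary stands in for the per-host RRD files. The point is that clients only send, and the collector creates storage for a host the first time that host reports, with no per-host configuration.

```python
import json
import socket
from collections import defaultdict


def send_sample(sock, collector, host, metric, value):
    """Client side: push one sample to the collector as a UDP datagram.

    The JSON wire format is a hypothetical stand-in for a real protocol.
    """
    payload = json.dumps({"host": host, "metric": metric, "value": value})
    sock.sendto(payload.encode(), collector)


class Collector:
    """Server side: receives pushed samples and creates per-host storage
    the first time a host reports, so no per-host configuration is needed."""

    def __init__(self, bind=("127.0.0.1", 0)):
        self.sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
        self.sock.bind(bind)
        self.address = self.sock.getsockname()
        # host -> metric -> list of values (stands in for one RRD per host)
        self.store = defaultdict(lambda: defaultdict(list))

    def receive_one(self, timeout=2.0):
        """Receive a single sample, store it, and return the reporting host."""
        self.sock.settimeout(timeout)
        data, _ = self.sock.recvfrom(65535)
        sample = json.loads(data.decode())
        # defaultdict creates the host's storage on its first report.
        self.store[sample["host"]][sample["metric"]].append(sample["value"])
        return sample["host"]

    def close(self):
        self.sock.close()
```

A client would call send_sample(sock, collector_address, "web01", "load", os.getloadavg()[0]) on a timer (os.getloadavg is Unix-only), while the collector loops over receive_one(). UDP keeps the clients free of listening sockets, at the cost that a lost datagram simply becomes a gap in the graph.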
Re:Insignificant in the trails of NAGIOS? (Score:4, Interesting)
Packages for it are often broken or from the old 1.3 tree, which makes for confusion when following examples that use 2.0 syntax.
Configuring it from scratch is extremely challenging, especially if you want to do anything custom.
There are a number of external dependencies, particularly if you want to compile the plugins.
That said, Nagios still whips the pants off quite a few commercial monitoring products I've evaluated.
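On the "anything custom" and plugin points above: a Nagios check is, at bottom, a program that prints one status line and exits with 0 (OK), 1 (WARNING), 2 (CRITICAL) or 3 (UNKNOWN), so a custom check can be written in a scripting language without compiling anything. The Python sketch below is a hypothetical disk-usage check following that convention; its argument order and status text are assumptions, not those of the stock check_disk plugin.

```python
import shutil

# Nagios plugin exit codes
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def check_disk(path, warn_pct, crit_pct):
    """Return (exit_code, status_line) based on the percentage of `path` in use."""
    try:
        usage = shutil.disk_usage(path)
    except OSError as exc:
        return UNKNOWN, f"DISK UNKNOWN - {exc}"
    used_pct = 100.0 * usage.used / usage.total
    if used_pct >= crit_pct:
        code, label = CRITICAL, "CRITICAL"
    elif used_pct >= warn_pct:
        code, label = WARNING, "WARNING"
    else:
        code, label = OK, "OK"
    return code, f"DISK {label} - {path} {used_pct:.1f}% used"


def main(argv):
    """Hypothetical command line: check_disk_py PATH WARN_PCT CRIT_PCT."""
    if len(argv) != 4:
        print("DISK UNKNOWN - usage: check_disk_py PATH WARN_PCT CRIT_PCT")
        return UNKNOWN
    code, line = check_disk(argv[1], float(argv[2]), float(argv[3]))
    print(line)
    return code
```

To use it as an actual plugin, end the script with sys.exit(main(sys.argv)) (after import sys) and reference it from a command definition in your Nagios configuration.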