
Server Monitoring With Munin And Monit (124 comments)

hausmasta writes "In this article I will describe how to monitor your server with munin and monit. munin produces nifty little graphics about nearly every aspect of your server (load average, memory usage, CPU usage, MySQL throughput, eth0 traffic, etc.) without much configuration, whereas monit checks the availability of services like Apache, MySQL, Postfix and takes the appropriate action such as a restart if it finds a service is not behaving as expected. The combination of the two gives you full monitoring: graphics that let you recognize current or upcoming problems (like "We need a bigger server soon, our load average is increasing rapidly."), and a watchdog that ensures the availability of the monitored services."
This discussion has been archived. No new comments can be posted.

  • Cacti (Score:5, Insightful)

    by mtenhagen ( 450608 ) on Sunday May 07, 2006 @11:19AM (#15281249) Homepage
    How is this different from cacti? [cacti.net]
  • by Erik Hensema ( 12898 ) on Sunday May 07, 2006 @11:45AM (#15281349) Homepage
    • A restart usually kills hanging processes, making the actual cause of the hang impossible to determine afterwards.
    • Automatic restarts make some admins lazy. Instead of debugging the problem, they accept apache/whatever service is restarted once a day.

    However, making graphs and monitoring your services is a very good thing. Graphs are invaluable in determining trends, such as memory leaks or steadily increasing load. Monitoring saves lots of downtime and unhappy customers ;-)

    Personally I use nagios for monitoring and DIY scripts for graphing. The latter mostly because I started making graphs before decent off-the-shelf software was available ;-)

    PS. what's this subject got to do with debian?
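
The point above about graphs revealing trends such as memory leaks can be illustrated with a small calculation. This is only a sketch, not anything taken from munin or nagios: the function names `trend_slope` and `hours_until_limit`, and the units (hours, megabytes), are assumptions made for illustration. It fits a least-squares line to (time, value) samples and extrapolates when a limit would be reached.

```python
from typing import Optional, Sequence, Tuple

Sample = Tuple[float, float]  # (hours since start, memory used in MB); units are assumed


def trend_slope(samples: Sequence[Sample]) -> float:
    """Return the least-squares slope of value over time."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(x for x, _ in samples) / n
    mean_y = sum(y for _, y in samples) / n
    num = sum((x - mean_x) * (y - mean_y) for x, y in samples)
    den = sum((x - mean_x) ** 2 for x, _ in samples)
    if den == 0:
        raise ValueError("all samples share the same timestamp")
    return num / den


def hours_until_limit(samples: Sequence[Sample], limit: float) -> Optional[float]:
    """Extrapolate the fitted trend from the last sample; None if usage is not growing."""
    slope = trend_slope(samples)
    if slope <= 0:
        return None
    _, last_value = samples[-1]
    return max(0.0, (limit - last_value) / slope)
```

A straight-line fit is the crudest possible trend model; it is shown here only because it is what most people do by eye when they look at a graph.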

  • by fimbulvetr ( 598306 ) on Sunday May 07, 2006 @11:49AM (#15281367)
    It always bothers me when people use utilities to restart services that die/have been killed. Shouldn't a daemon be designed to run indefinitely? Doesn't the fact that a process died mean that something is wrong and needs to be fixed? For instance, if my apache daemon dies because the logfile is larger than it can handle, what good is restarting it going to do? It's just going to beat the crap out of a server - process dies - watcher daemon starts it up - process dies...etc.
    Or, if the OOM killer kills my ftp server because he's hogging the memory, doesn't that mean I have bigger problems than just doing a restart (I need more memory, the ftp server has a mem leak, etc)?

    None of my hundreds of critical daemons die for no reason whatsoever - all of them require some type of human interaction if they have died. It doesn't happen very often, maybe once every several months.

    Not that I care about this software in general, I use hobbit for my trending/graphing/service availability, but I hate to see bad admin'ing, even if I'm not involved.
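
The restart loop described above (process dies, watcher starts it up, process dies again) is usually avoided by capping how many restarts a watcher may attempt in a given time window, after which it stops and leaves the problem for a human. The following is a minimal sketch of that idea, not a description of how monit or any other tool implements it; the class name `RestartLimiter` and its default limits are hypothetical.

```python
from collections import deque


class RestartLimiter:
    """Allow at most `max_restarts` restarts within any `window_seconds` span."""

    def __init__(self, max_restarts: int = 3, window_seconds: float = 300.0) -> None:
        self.max_restarts = max_restarts
        self.window_seconds = window_seconds
        self._times = deque()  # timestamps of restarts that were allowed

    def allow(self, now: float) -> bool:
        """Return True and record the restart if it is within the budget."""
        # Forget restarts that have aged out of the window.
        while self._times and now - self._times[0] >= self.window_seconds:
            self._times.popleft()
        if len(self._times) >= self.max_restarts:
            return False  # give up and escalate to a human instead
        self._times.append(now)
        return True
```

A watcher would call `allow(time.monotonic())` before each restart attempt and send an alert rather than restart when it returns False.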
  • by Jeff DeMaagd ( 2015 ) on Sunday May 07, 2006 @11:54AM (#15281382) Homepage Journal
    Point taken, but I think an automatic restart is necessary to minimize intrusions into off-work-time with maintenance and such. If the service hangs and there's no one there to tend to it, then it will stay hung until someone notices. This is not good if you want to keep going and not lose potential business if the site is down.

    Anyway, I'm glad I'm not a server admin. I'd like to live my private life NOT being on-call.
  • Orca (Score:3, Insightful)

    by otisg ( 92803 ) on Sunday May 07, 2006 @12:04PM (#15281418) Homepage Journal
    I'm a happy user of Orca [orcaware.com], which I use to graph all kinds of aspects of the system that runs Simpy [simpy.com]'s cluster.
  • by routerguy666 ( 926506 ) on Sunday May 07, 2006 @12:21PM (#15281476)
    I've tried a number of these monitoring apps as they've come out. To date, I still can't find a combination better than MRTG and Nagios. If you know a bit about SNMP and how to find the OID of what you are interested in (and where to get mibs), it's hard to find a simpler, cleaner pair of monitoring products.

    Although in all honesty, Nagios' only real benefit is the ability to send out alerts. I'm more fortunate than others, I know, in that I've had the resources available to build redundancy in at every level of our production networks so when something does die (and with modern platforms this is becoming a once every two years event) it doesn't create a major catastrophe.

    Other than that, all the trending info I want/need on bandwidth, cpu, disk space, user loads, etc, etc, I can pull out of any device via snmp and track it with MRTG. Plus each MRTG release doesn't require me to rewrite umpteen config files to match the author's latest greatest idea of how they should be formatted (my only real gripe about nagios/netsaint).

    In the end I guess you use what you are familiar with, and I cut my teeth on these.
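
For readers unfamiliar with trending bandwidth over SNMP as described above: interface traffic is typically exposed as a monotonically increasing counter, so the poller has to turn two successive readings into a rate and cope with the counter wrapping around. This is a rough sketch of that arithmetic, not MRTG's actual code; the function name `counter_rate` and the 300-second interval in the usage line are assumptions.

```python
def counter_rate(prev: int, curr: int, interval_seconds: float,
                 counter_bits: int = 32) -> float:
    """Average rate (units per second) between two readings of a wrapping counter."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    modulus = 1 << counter_bits
    # Modular subtraction handles a single wrap between the two readings.
    delta = (curr - prev) % modulus
    return delta / interval_seconds


# Hypothetical readings taken 300 seconds apart.
octets_per_second = counter_rate(1000, 4000, 300)
```

This cannot tell a counter wrap from a device reboot that reset the counter to zero, and it assumes at most one wrap per polling interval, which may not hold for 32-bit counters on fast links.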
  • by Burv ( 637312 ) on Sunday May 07, 2006 @12:29PM (#15281511) Homepage
    Good points. However, I think there's something to be said for automating things to increase uptime and lessen the load on the sysadmin, especially if it's critical that the service be available and you always go through the same checks (e.g. check /var/adm/messages, look at the process table, load, etc.). There's also a tradeoff between knowing the details of what caused the problem and getting the service back up if, for every minute your server is down, your company is or could be losing money, as at someplace like eBay.

    Oh, and I think these packages are installed as part of debian, either by default or optionally. That's why the article mentioned apt-get.

  • Very nice! (Score:2, Insightful)

    by ngunton ( 460215 ) on Sunday May 07, 2006 @03:24PM (#15282047) Homepage
    I hadn't heard of this before. I liked the sound of pretty graphs, and I particularly liked how easy the article made it sound to install and get things working. So I tried it (I'm running Sarge AMD64 on the server) and it worked fine. In fact, it was up and running in a couple of minutes. Very nice!

    I have to say it is refreshing to see something that "just works" out of the box with sensible defaults. Truth be told, I am sick and tired of these holier-than-thou OSS zealots who keep pushing bloated, complex toolkits which have every option under the sun, but it doesn't all "just work" out of the install, no, that would be too easy wouldn't it. You have to read through reams of distributed, fragmented documentation, forum posts and other sources to get the damn thing working properly, not to mention cobbling together all these !@#$ing plugins that are sooooo wonderful and yet just end up being a pain in the butt because you have to track them all down individually. Why can't geeks grasp a simple fact: People don't necessarily have the time or inclination to spend days learning the arcane innards of your toolkit. I don't care if people say "well if you can't be bothered taking the time then you're not a real admin" or whatever, if I had to spend a lot of time on every package tuning it and writing a sendmail.cf-esque config file just to get it working *the way it should by default* then I'm probably just going to look for something else. That something else may be simpler and not as "pure" as your baby, but you know what? I'll use it, because it *just works* and does *most* things in a simple intuitive way. That's why MySQL became successful, and why PostgreSQL didn't - sure, PostgreSQL was more powerful (in theory anyway) and had a bunch more features, but it isn't optimized out of the box. Whenever I see people complain about how slow PostgreSQL turns out to be when they finally try it, the inevitable reply is "Well, you need to spend time tuning it - if you don't do that then you don't deserve to be running a server". Whatever. As far as I'm concerned these "Tuning required by default" and "You aren't a *real* x if you don't learn these reams of config options just to get it working" people just don't get it. 
Make it work out of the box with sensible defaults, and let people delve into stuff further *if they want to*, not by requirement.

    I think the snobs are like this because they did go and learn all that stuff, and so they feel deep down that they have to justify that it was all worth it by putting down those who have a life and don't feel like dedicating days and weeks of effort to getting some stupid software package to function in the most basic way.

    So, great job Munin. My hat is off to you - I have a graphical monitoring system for my server, and it took me about two minutes to get it working. Fantastic.
  • Damn Straight! (Score:1, Insightful)

    by Anonymous Coward on Sunday May 07, 2006 @03:35PM (#15282084)
    I'm with you on that one. I just can't understand why so many people keep re-inventing the wheel rather than simply learning a bit of SNMP. SNMP and its tools provide all of this functionality and more. Why does everyone keep doing their own protocol and server and agent software? There are already several standard methods for handling this via DMTF WBEM, CIM and good old SNMP. Also, why are so many people willing to run agents from obscure packages that are likely full of bugs and certain to be abandoned in the not so distant future? Why can't we just have more SNMP agents and instrumentation?

    Some people deride SNMP over its security issues but, how is the security of all these funky apps and agents any better? Additionally, even with SNMP security being as "weak" as it is claimed to be, it has yet to create a significant problem. Yes, there have been some scares when vulnerabilities were discovered but, the internet has yet to collapse because of scary old SNMP.

    The last thing I want to do is add yet another flaky process to my systems. It's pretty embarrassing when your monitoring agent brings down the server! Or your management console decides to poll it to death! SNMP is almost always already there and running, why not just leverage it?

    P.S. Yes, I know that Munin can use SNMP but, that is a side note and not its primary operating mode.
