Comment PRTG is the most cost effective and feature rich (Score 1) 137
About 7 years ago I tested Nagios, What's Up Gold, Cacti, Zabbix, SolarWinds Orion, and a variety of other monitoring packages. The problem with almost all of them was that they either required heavy customization or became incredibly expensive once they included more out-of-the-box setup such as device discovery and bundled templates (a la SolarWinds). We finally settled on PRTG (www.paessler.com) because it already had basic templates for a number of industry-standard devices, has an easy-to-use interface, and can be heavily customized.
Another feature we really needed was remote monitoring for our customers, as we are an MSP. Every PRTG Remote Probe creates an encrypted SSL tunnel between the Remote Probe and your core server installation at your office or colocation. This requires no customization at all unless you are denying certain outbound ports from the probe server, in which case you simply need to allow port 23560 (or whatever you've customized it to) outbound to your core server's public NAT IP. This does not necessarily give you remote control of servers, but it does provide a channel for all locally monitored data to be sent upstream to your location without requiring an OpenVPN or anything like that. (If you do want remote access and already run a VPN, you could have PRTG's remote probe piggyback across it, and you would then also have the ability to remote control.) You can deploy as many remote probes as you like, and can therefore centralize all your monitoring data as well as create reports, custom maps, and even provide customer access via nested Access Rights dependencies.
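If you are not sure whether a customer's firewall is blocking that outbound port, a quick TCP check from the probe server can tell you. This is only a minimal sketch: the function name and the hostname in the comment are my own placeholders, and it only confirms that a TCP connection can be opened, not that the PRTG probe handshake itself succeeds.

```python
import socket


def can_reach(host, port=23560, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures
        return False


# Example (hypothetical hostname), run from the probe server:
#   can_reach("core.example.com")
```

If this returns False from the probe server but True from a machine outside the customer's network, the outbound firewall rule is the first thing I would look at.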
One thing I will mention - SNMP trap monitoring is a wasted effort. I know there are many proponents of it out there, but if you are not actively polling your data and gathering graphable results then you have no troubleshooting abilities, no trending reports, no data utilization analysis for service management, etc. You should configure templates for your devices to standardize them and actively monitor all of your critical data, so you can then use the historical information to say "Ok...this server just went down - why? Check CPU utilization - OH it looks like all cores on this CPU jumped to 100% utilization just before this device went unresponsive. Let me check my individual process utilization - OH there's the process causing the problem." Troubleshooting done. Now imagine relying on a trap for this device - if the device is already unresponsive by the time the trap should be sent, the trap never reaches your monitoring server, and as far as your monitoring is concerned everything is still hunky-dory. You may also have ICMP monitoring in place, so you know the device is offline, but is the ISP down? Is some LAN resource like a router/firewall/switch down? Is the server itself down? Why? Most of these questions can be answered by historical monitoring data, and I cannot say enough that SNMP traps are useless 95% of the time.
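To make the "check what happened just before it went down" idea concrete, here is a rough sketch of what polled history buys you. This is not how PRTG stores data internally; the class, the function, and the sensor name are all hypothetical, and it simply assumes readings are recorded in time order by whatever does the polling.

```python
from collections import defaultdict, deque


class SensorHistory:
    """Keeps the most recent polled samples per sensor (hypothetical helper)."""

    def __init__(self, max_samples=2880):  # 2880 samples = 24 h at 30 s intervals
        self._data = defaultdict(lambda: deque(maxlen=max_samples))

    def record(self, sensor, timestamp, value):
        """Store one polled reading; timestamps are seconds, in increasing order."""
        self._data[sensor].append((timestamp, value))

    def before(self, sensor, outage_time, window=300):
        """Return readings from the `window` seconds leading up to outage_time."""
        return [(t, v) for t, v in self._data[sensor]
                if outage_time - window <= t <= outage_time]


def peaked(readings, threshold=95.0):
    """True if any reading in the list reached the threshold."""
    return any(v >= threshold for _, v in readings)
```

With CPU readings recorded every 30 seconds, `peaked(history.before("cpu_total", outage_time))` answers the "did the CPU max out right before the outage?" question. A trap-only setup has no history to ask that question of.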
To back up my claims and experience with SNMP: I have been a Principal Network Engineer for an MSP in LA for over 9 years, and we currently operate a PRTG install for our MSP customer monitoring with over 18,000 actively monitored sensors, polled every 30 seconds.