I looked at many different commercial options here at work and purchased EMC Smarts. The root cause analysis is very helpful. It has saved us a lot of time tracking down some outages we've had here. It can tell you, for instance, that a specific port on a switch is down or flapping, and that this is what is causing the problems.
Most of the other tools we looked at would tell you that all of the servers at a remote facility were down, but Smarts will take it one step further and identify the root cause so you do not spend time figuring out if one of the routers on one side or the other is down, if it is the link itself, a firewall, etc. It is all information you could work out on your own, but none of the other tools went into enough detail to track the problem down using only the information presented in the tool. Smarts goes further and points specifically at the failing component.
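To make the idea concrete, here is a minimal sketch of topology-based root cause isolation. It is not how Smarts works internally (I have no visibility into that); it only assumes each device has a single upstream parent on the path from the monitoring station, and that a device reported down whose parent is still up is the likely culprit. The topology, device names, and the find_root_causes function are all made up for illustration.

```python
def find_root_causes(parent, down):
    """Return the down nodes whose upstream parent is not itself down.

    parent: dict mapping node -> upstream node (None for the top node).
    down:   set of nodes the poller currently reports as unreachable.
    """
    roots = set()
    for node in down:
        upstream = parent.get(node)
        # A node with no upstream, or with a healthy upstream, cannot be
        # explained by another failure, so it is a root cause candidate.
        if upstream is None or upstream not in down:
            roots.add(node)
    return roots


# Hypothetical path from the monitoring station out to a remote facility.
parent = {
    "hq-router": None,
    "wan-link": "hq-router",
    "remote-router": "wan-link",
    "remote-switch": "remote-router",
    "server-a": "remote-switch",
    "server-b": "remote-switch",
}

# Everything past the WAN link looks dead from the monitoring station.
down = {"wan-link", "remote-router", "remote-switch", "server-a", "server-b"}

print(sorted(find_root_causes(parent, down)))  # ['wan-link']
```

Five alarms collapse into one. A real network has redundant paths and the "down" reports themselves can be wrong, so treat this single-parent model as the simplest possible case.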
There are a bunch of other modules you can buy to help you automatically model application/system dependencies to find out which business units are impacted by an outage, what systems/applications would be impacted by a DB outage, etc. Other modules can track application performance in all of the steps from workstation all the way through the network into each server and DB using just network monitoring or through synthetic transactions.
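The dependency question ("what would be impacted by a DB outage?") boils down to walking a dependency graph outward from the failed component. A rough sketch, assuming you already have a map of what depends on what; the impacted_by function and every system and business unit name here are hypothetical, and this says nothing about how the commercial modules actually build or store their models.

```python
from collections import deque


def impacted_by(dependents, failed):
    """Return everything that directly or indirectly depends on `failed`.

    dependents: dict mapping a component -> list of things that depend on it.
    """
    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for dep in dependents.get(current, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    # The failed component itself is the cause, not part of the impact.
    seen.discard(failed)
    return seen


# Hypothetical dependency map: database -> applications -> business units.
dependents = {
    "orders-db": ["order-api", "reporting"],
    "order-api": ["web-store"],
    "web-store": ["sales-unit"],
    "reporting": ["finance-unit"],
}

print(sorted(impacted_by(dependents, "orders-db")))
```

The hard part in practice is not the traversal but keeping the dependency map accurate, which is what the automatic modeling is meant to handle.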
It is not cheap by any stretch of the imagination but implementation is fairly easy with its autodiscovery providing huge value. If you want to use it to its fullest, it will take some learning and a bit of time from a good administrator.
Before anyone asks, I was unsuccessful in getting open source tools seriously considered. I had implemented OpenNMS very successfully and was using it unofficially to monitor for outages and track system availability.
Doing this right is not a light task if you want all of the detail necessary to properly manage a large-scale network. We are getting into the level of detail of monitoring server memory utilization, disk space utilization, CPU, switch/router port utilization, etc. It is taking at least one full-time administrator just to manage it. I wish you luck on developing a new tool. I would encourage you to look at some of the existing tools before trying to build your own. Just implementing a tool that has been around a long time is a huge process. Building and implementing....