We prefer monitoring checks that operate at a business-relevant level. Whether a process is running or not -- which is what systemd tells us -- is irrelevant at our level of monitoring. It might be a first stage, but it should be superseded by proper monitoring conditions. We need monitoring checks that tell us whether an account can be opened and whether an order can be placed. Monitoring needs to tell us whether the business is running. Technical concepts like daemons have only a minor place in this. The real test: can the customers do the things we want them to do?
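To make "can an order be placed" concrete, here is a minimal sketch of such a check, not our actual tooling. The names run_business_check, CheckResult and place_test_order are made up for illustration, and the 5-second limit is an arbitrary assumption. The point is that the check judges the outcome and the duration of a customer action, not the state of any daemon.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class CheckResult:
    name: str
    ok: bool
    seconds: float
    detail: str


def run_business_check(name: str, action: Callable[[], bool],
                       max_seconds: float) -> CheckResult:
    """Run one synthetic customer action; judge it by outcome and duration."""
    start = time.monotonic()
    try:
        succeeded = action()
        detail = "completed" if succeeded else "action reported failure"
    except Exception as exc:
        # Any error means the customer could not do it, whatever the cause.
        succeeded = False
        detail = f"error: {exc}"
    elapsed = time.monotonic() - start
    if succeeded and elapsed > max_seconds:
        # It worked, but slower than a customer should have to wait.
        succeeded = False
        detail = f"too slow: {elapsed:.2f}s > {max_seconds:.2f}s"
    return CheckResult(name, succeeded, elapsed, detail)


def place_test_order() -> bool:
    # Hypothetical stand-in: a real check would drive the actual order
    # workflow end to end with a test account.
    return True


if __name__ == "__main__":
    print(run_business_check("place_order", place_test_order, max_seconds=5.0))
```

A stopped application server would show up here as a failed or timed-out order, which is the information the customer actually cares about.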
None of our customers wants to know whether our JBoss cluster is running. What they want to know is whether orders can be placed via the application that runs on our JBoss cluster. And it's our damned professional obligation to provide that information, and not to hide behind the excuse "JBoss was not running".
Proper monitoring, as I think about it and as we practice it, is about business-relevant data. It's not about a daemon running on one system. It's about "how long does a customer wait to be served the dialog for ordering a system?" Or, "how long does it take to deliver the promised system to the customer?" So we create new systems and change them, to see how long it takes. If it takes too long, we bring up new instances to make that workflow go faster. That, IMNSHO, is what cloud computing is about: automatically attaching *and* detaching standardized instances that are never touched manually, to meet the performance demands of our customers.
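As a rough sketch of that attach-and-detach decision (not how any particular cloud platform does it): take the measured customer wait times and derive a wanted instance count from them. The function name desired_instances, the one-instance step, the "half the target" detach threshold and the min/max bounds are all assumptions picked for illustration.

```python
from statistics import median


def desired_instances(current: int, wait_seconds: list[float],
                      target_seconds: float, min_instances: int = 1,
                      max_instances: int = 20) -> int:
    """Suggest an instance count from measured customer wait times."""
    if not wait_seconds:
        return current  # no measurements, so no change
    typical = median(wait_seconds)
    if typical > target_seconds:
        wanted = current + 1   # customers wait too long: attach an instance
    elif typical < 0.5 * target_seconds:
        wanted = current - 1   # plenty of headroom: detach an instance
    else:
        wanted = current
    # Never go below or above the configured bounds.
    return max(min_instances, min(max_instances, wanted))


if __name__ == "__main__":
    print(desired_instances(3, [4.0, 6.0, 5.0], target_seconds=2.0))
```

The input is what the customer experiences (wait time), and the output is a count of interchangeable instances; nothing in it refers to an individual daemon or host.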
I don't demand that cloud-like infrastructure be recognized in this discussion (though it's what I'm most familiar with). But standard virtualized data center environments already show the problems I'm talking about.
Don't get me wrong: I actually like systemd. My problem is that some of its proponents try to sell it for tasks that it was never made for and will not deliver on. E.g., proper monitoring, a.k.a. business-relevant delivery of information about services.
Thinking about it, you might have found a hole in the setup that I deliver to our clients. Folks might have set up daemon-process-based monitoring and left it at that. Grmmbl. Seems we have to detect this low-level monitoring, so that we can escalate it to proper monitoring in our infrastructure. Thanks for this insight.