"As with most mistakes, it is part of a system that is faulty and awaiting one simple mistake to escalate."
Couldn't agree more.
"Chances are there was a culture of trying to save money"
Sometimes the "cargo cult" is so ingrained that even the techs are unable to see it.
I was in a hiring process once; I don't remember whether it was at Google or Amazon. One of the questions (from a hands-on tech team lead) was about a single server that had gone crazy and couldn't spawn any more processes, so it was almost impossible to do anything with the machine. It was still serving whatever it hosted just fine.
It went more or less like this:
M: Has this happened before?
M: So... Can I try this, or that, or this other one?
R: No, because you can't run any new process.
M: OK, reboot it (I know, of course, that saying something like that is taboo for a Unix/Linux sysadmin). Let's look at the boot messages to see if we get a clue, and let's monitor it afterwards to see if this happens again. If it does, we will be in a better position to diagnose it; if not, we will chalk it up to the "computer gnomes".
R: You won't try to diagnose any further before rebooting?
M: Nope. My time is valuable and there will surely be more productive things on my to-do list.
R: But the computer hosts a service that, if turned off, will cost the company a bazillion!
M: Nope. If that were the case, the powers-that-be would have engineered the service with high availability in mind, which in turn means we could reboot the server without further hesitation. Since that's not the case, the implication is that the business has already decided it is not a critical service, so my point above about my time costing money still applies.
R: But, but, but...
Of course, I knew from the very beginning that the answer he wanted was a way to list the running processes without spawning a new one, so after a while I went down that route. I vaguely remembered there was some Bash built-in that would let me do it, though not exactly which one. But back then, what I really wanted was to see the culture of the place.
Needless to say, I wasn't hired. But I didn't want to be hired either. Not by that team, at least.
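For reference, one way to take that "no new process" route might look like the sketch below. It is not necessarily the answer the interviewer had in mind. It assumes Linux with /proc mounted and an already-open Bash session, and the function name list_procs is made up for illustration. Everything it uses (for, read, printf, [ and globbing) is handled by the shell itself, so nothing gets forked.

```shell
#!/bin/bash
# Sketch: list processes using only shell builtins, so nothing is forked.
# Assumes Linux with /proc mounted; list_procs is a hypothetical helper name.
list_procs() {
    for dir in /proc/[0-9]*; do
        # Skip the unexpanded glob (no /proc) or entries that vanished.
        [ -d "$dir" ] || continue
        pid=${dir##*/}
        # comm holds the short command name; a process may exit mid-loop.
        if { read -r name < "$dir/comm"; } 2>/dev/null; then
            printf '%s %s\n' "$pid" "$name"
        fi
    done
}

# Call it directly: a pipe or $(...) would itself need a fork.
list_procs
```

On a box that really cannot fork, the function would have to be typed straight into the existing shell, since starting a script file normally means starting a new process.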