"Switch providers. Plenty offer remote reboot and serial console or KVM for both VMs or physical servers, which would allow you to go crazy with custom encrypted partitions etc."
They offer KVM access, at $35.00/day, which in this case I refuse to pay to fix what they broke, outside of the context of the server. They migrated me from one chassis to another with completely different hardware, causing my machine to go offline. They want me to pay $35.00 for 24-hours of KVM access to reconfigure the network to support the hardware they moved things to.
Alternately, they want me to hand over the root password (not a privileged account, but THE root password), so they can do it themselves. Since I installed, configured and manage the OS entirely on this machine, and they've demonstrated their ineptitude before, I'm not giving them root. Ever.
"I'd also like to know how you *know* it's a hardware or network issue outside of your server. How do you know it's not your NIC driver hanging up? Older e1000 drivers (super common card in the hosting industry) are quite flaky. What research have you done outside of your internal monitoring?"
Because this server has been running 24x7 for about 3 years without a single outstanding issue. When they migrated it from Savvis to some datacenter in Dallas 2 months ago, I've had no less than 20 separate outages , while the underlying OS and application stack itself has not changed in any way to facilitate those outages.
In every single case, they demand that I give them the root password, so they can diagnose the issues on the machine. In every single case, I've shown them nagios, ntop, hotsanic, sar, etc. logs demonstrating that the OS itself is not the cause of the outages.
For example, since this migration to Dallas, every other Sunday between 7:00am and 8:00am EST, my server's load goes over 100 as incoming connections spike over 700/sec., sendmail refuses connections due to the load, and the box seizes up. The logs show that the connections are established and then hang. NOTHING on the machine triggers every other Sunday between these hours that would cause that.
Only a few days ago, they indicated that the NIC on the server may be causing the issues. I'm down 2-3 hours every other Sunday because of this.
They're not asking for the logs, they're asking for root. That's a completely separate (and unacceptable) solution to their own problems outside of the box itself.