I manage a couple of petabytes' worth of disks (consumer, not enterprise) for the HPC center at Vanderbilt University, and they get absolutely hammered by CMS-HI users 24/7/365. At scale, you see problems every day that you would never even think of.
The firmware on consumer hard drives is often crap. Very few of them support TLER; we have ~400 drives (Seagates) that needed a firmware fix to prevent sudden death, but the fix wouldn't apply in bulk over the SAS controller, so we had to yank/flash/replace/repeat. Drives will also occasionally lock up hard and require a power-cycle.
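If you're wondering whether your own drives support TLER (SCT Error Recovery Control), smartctl will tell you. Here's a rough sketch of the kind of sweep you could run; it assumes smartmontools is installed and the drives show up as /dev/sd? (drives behind some SAS controllers need an extra -d <type> argument):

```python
#!/usr/bin/env python3
# Rough sketch: report SCT Error Recovery Control (TLER/ERC) status per drive.
# Assumes smartmontools is installed and drives appear as /dev/sd?; adjust the
# glob and smartctl arguments for your controller.
import glob
import subprocess

def erc_report(dev):
    """Return the raw 'smartctl -l scterc' output for a device, or None on failure."""
    try:
        out = subprocess.run(
            ["smartctl", "-l", "scterc", dev],
            capture_output=True, text=True, timeout=30,
        )
        return out.stdout
    except (OSError, subprocess.TimeoutExpired):
        return None

for dev in sorted(glob.glob("/dev/sd?")):
    report = erc_report(dev)
    if report is None:
        print(f"{dev}: smartctl failed or timed out")
        continue
    if "not supported" in report.lower():
        print(f"{dev}: no ERC/TLER support")
    else:
        # Pull out just the Read/Write timeout lines from the scterc section.
        lines = [l.strip() for l in report.splitlines() if "Read:" in l or "Write:" in l]
        print(f"{dev}: {'; '.join(lines) or 'see full smartctl output'}")
```

Note that even when a drive does support ERC, the setting often doesn't survive a power cycle, so it has to be re-applied (e.g. smartctl -l scterc,70,70 /dev/sdX) each time the box comes up.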
Don't believe for a second that Linux doesn't need a defrag utility. We were mystified by a sudden influx of permanent drive *slot* failures. After *much* investigation, it turned out that our users were filling the disks 100% full, erasing 5%, refilling, erasing 5%, etc., until the average file (~100 MB) had thousands of extents. The vibration from the head frantically seeking across the disk to read those files was enough for the drive's SATA connector to chew up the mating connector on the backplane (Supermicro chassis, would *NOT* buy again, Chenbro is the way...). We wrote a simple defrag script that just copied the worst files to a different location and then moved them back; a sketch of the idea is below.
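For the curious, the approach was roughly the following. This is a minimal sketch, not our production script: it assumes filefrag (from e2fsprogs) is available, and the paths and extent threshold are made up for illustration.

```python
#!/usr/bin/env python3
# Minimal "poor man's defrag" sketch: find badly fragmented files and rewrite
# them by copying out and back, letting the filesystem reallocate them in
# (hopefully) far fewer extents. Paths and threshold are illustrative only.
import os
import shutil
import subprocess

DATA_DIR = "/data"            # hypothetical tree to scan
SCRATCH = "/scratch/defrag"   # hypothetical staging area
MAX_EXTENTS = 100             # rewrite anything more fragmented than this

def extent_count(path):
    """Ask filefrag how many extents a file has; return 0 on error so it's skipped."""
    try:
        out = subprocess.run(["filefrag", path], capture_output=True, text=True)
        # Typical output: "/data/foo: 1234 extents found"
        return int(out.stdout.rsplit(":", 1)[1].split()[0])
    except (ValueError, IndexError, OSError):
        return 0

def rewrite(path):
    """Copy the file aside, remove the original, then copy it back so the
    filesystem allocates fresh (hopefully contiguous) extents. Not crash-safe
    as written: a real script should verify the staged copy before deleting."""
    staged = os.path.join(SCRATCH, os.path.basename(path))
    shutil.copy2(path, staged)
    os.remove(path)
    shutil.copy2(staged, path)
    os.remove(staged)

os.makedirs(SCRATCH, exist_ok=True)
for root, _dirs, files in os.walk(DATA_DIR):
    for name in files:
        path = os.path.join(root, name)
        n = extent_count(path)
        if n > MAX_EXTENTS:
            print(f"{path}: {n} extents, rewriting")
            rewrite(path)
```

(These days e4defrag and xfs_fsr will do this properly on ext4/XFS; the copy-out-and-back trick just has the advantage of working on anything.)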
RAID5 isn't nearly sufficient at this scale, because with enough disks you will eventually see two or more simultaneous failures (the back-of-the-envelope math below gives a feel for why). We wrote our own filesystem to get Reed-Solomon 6+3 redundancy: six data blocks plus three parity blocks per stripe, so any three drives in a stripe can die before data is lost.
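To make the simultaneous-failure point concrete, here's a back-of-the-envelope sketch of the odds that a RAID5 group loses a second disk while still rebuilding from the first. The AFR, rebuild time, and fleet sizes are made-up illustrative numbers, not measurements from our drives:

```python
#!/usr/bin/env python3
# Back-of-the-envelope: odds that a RAID5 group loses a second disk while it is
# still rebuilding from the first failure. All numbers below are illustrative
# assumptions, not fleet measurements.

AFR = 0.05          # assumed annual failure rate per consumer drive (5%)
REBUILD_HOURS = 24  # assumed time to rebuild one failed drive
HOURS_PER_YEAR = 8760

def p_second_failure(disks_in_group):
    """P(at least one surviving disk in the group also fails during the rebuild window)."""
    p_one = AFR * (REBUILD_HOURS / HOURS_PER_YEAR)   # per-disk failure prob. in the window
    return 1 - (1 - p_one) ** (disks_in_group - 1)

def p_double_failure_per_year(total_disks, disks_in_group):
    """Rough yearly probability that some group somewhere suffers a double failure,
    treating each expected single failure as one independent rebuild event."""
    expected_rebuilds = total_disks * AFR
    return 1 - (1 - p_second_failure(disks_in_group)) ** expected_rebuilds

for total in (12, 120, 1200, 12000):
    print(f"{total:6d} disks in 12-wide RAID5 groups: "
          f"~{p_double_failure_per_year(total, 12):.1%} chance per year of a double failure")
```

With three parity blocks per stripe you need four overlapping failures in the same stripe before anything is lost, which pushes those numbers down dramatically.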
I'd love to know if you guys have any similar "WTH" horror stories.