very common for multiple drives in an array to fail within a short time window, due to shared environmental problems
Exactly. We had one interesting incident where, in the middle of the night, 3 pods right next to each other in a rack all went berserk and all their RAID arrays fell apart. That's 135 drives all at once (3 pods, each with 45 hard drives). We reassembled them all, and the VERY NEXT NIGHT at the same time it happened again. We moved all three servers to different ends of the datacenter and finally figured out which server was causing the problems. The bearings on one of its fans were going bad, and when the fan came on it vibrated the entire cabinet. We have "nightly cleanup" jobs that run to verify data integrity and delete files we no longer want, and that was enough load to heat up the CPU and trigger the bad fan.
Trying to read a damaged sector is less reliable than reading the undamaged redundant copy.
You're thinking about it wrong. You always want the maximum amount of information from every drive, and you can then choose to use that information however you like. I don't want "Enterprise" drives that won't try hard to get every last bit.
Here is an example: We have had problems reassembling / resyncing RAID arrays because one stubborn drive pops out and fails too easily (we run two parity drives, so if you are already down 2 drives, a 3rd stubborn drive is a bummer). If the drive would just stay in and try harder, we could get through that particular operation. Backblaze then adds its own end-to-end SHA-1 on every file. Trust us, we'll absolutely know for certain whether or not we recovered the file accurately from that particular RAID array. But until we reassemble the RAID array and get the file system back online, we can't even check what we are holding. Fighting with it costs us IT time. Again: we have no performance problems at all. I know this is hard for some organizations to grasp when you never seem to have enough IOPS. But the nature of online backup is not like the nature of your billing or account info database.
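To make the end-to-end check concrete, here is a minimal sketch of comparing a recovered file against a SHA-1 recorded when it was first stored. This is not Backblaze's actual code; the function names `sha1_of_file` and `recovered_intact` and the chunk size are assumptions made up for illustration.

```python
import hashlib
import os
import tempfile


def sha1_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA-1 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha1()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def recovered_intact(path, expected_sha1):
    """True if the file on disk still matches the digest recorded at upload time."""
    return sha1_of_file(path) == expected_sha1.lower()


# Hypothetical usage: record a digest at "upload", then verify after "recovery".
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"customer backup data")
    sample_path = tmp.name

stored_digest = sha1_of_file(sample_path)               # saved alongside the file
print(recovered_intact(sample_path, stored_digest))     # unchanged file passes

with open(sample_path, "ab") as f:                      # simulate a bad recovery
    f.write(b"\x00")
print(recovered_intact(sample_path, stored_digest))     # altered file fails

os.remove(sample_path)
```

The point is that the check lives above the RAID layer, so it does not matter how hard the drive had to work to return the bits: either the digest matches or it doesn't.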
I'd happily pay 2x or 3x the money to get 20x the write endurance.
That only makes sense if you are hitting the write limits. If the drive dies because the bearings wear out after 5 years of spinning regardless of the number of writes, you have just paid 3x the money and gotten exactly zero benefit.
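A rough sketch of that argument: a drive's useful life is capped by whichever limit it hits first. Every number below is invented for illustration, and `service_life_years` is a hypothetical helper, not a real reliability model.

```python
def service_life_years(write_limit_tb, writes_tb_per_year, mechanical_life_years):
    """Years until the drive dies from whichever limit is reached first."""
    years_until_worn_out = write_limit_tb / writes_tb_per_year
    return min(years_until_worn_out, mechanical_life_years)


# Made-up workload: 20 TB written per year, bearings assumed to last 5 years.
baseline = service_life_years(write_limit_tb=500, writes_tb_per_year=20,
                              mechanical_life_years=5)
premium = service_life_years(write_limit_tb=500 * 20, writes_tb_per_year=20,
                             mechanical_life_years=5)

# Both come out to the mechanical limit, so the 20x endurance bought nothing.
print(baseline, premium)
```

The extra endurance only pays off when the workload is heavy enough that the write limit, not the mechanical limit, is the smaller of the two.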
Enterprise drives typically range from 18000rpm at the very high end...10K rpm probably the most common for bulk storage
Backblaze pays something like $45,000 / month on our electrical bill. We vastly prefer "green" drives that spin slower and use less electricity. There are many, many "Enterprise" applications in the world that are not bottlenecked on spindle speed (like backup and Shutterfly-type big-data-rarely-accessed), and those enterprises deserve slower drives. I guess I object to using the word "Enterprise" to describe "Fast" - why not just mark your drive as 15,000 RPM or 7,200 RPM and be done with it? No need to add the pointless label "Enterprise Drive".
SMART reporting is much more consistent for enterprise drives
No way. All hard drives do SMART reporting. Sometimes the "bridge" between the processor and the hard drives won't pass the information, so a cheap USB enclosure might be hiding the hard drive SMART stuff from you, but that isn't the hard drive's fault. In fact, we have an expensive Dell drive shelf with an LSI (?) controller that hides our enterprise drive SMART stats from us, very annoying. There is no correlation between "Enterprise" and "SMART reporting".
some manufacturers are intentionally disabling typical enterprise firmware features on the consumer models, drive commands that are helpful for hardware raid
The whole concept of RAID is that it is a software layer on top of all the cheap drives. RAID doesn't require any interesting instructions. It pretty much just needs to write data to an individual drive and read it back later.
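As an illustration of how little the redundancy layer asks of the drives, here is a toy single-parity sketch in the style of RAID 5. This is a simplification and an assumption on my part: we actually run two parity drives, and real implementations stripe and rotate parity. The helper `xor_blocks` is hypothetical.

```python
from functools import reduce


def xor_blocks(blocks):
    """XOR equal-length byte blocks together, position by position."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))


# Three "drives" each hold one plain data block; a fourth holds their parity.
data = [b"AAAA", b"BBBB", b"CCCC"]
parity = xor_blocks(data)

# Drive 1 dies. Rebuild its block from the survivors plus the parity block.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt == data[1])
```

Everything here is ordinary block reads and writes plus arithmetic done by the host, which is why no special drive commands are needed.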
I wouldn't be surprised if usage patterns over 5-10yrs resulted in a significant divergence.
Time will prove you right or wrong, we plan on updating and releasing these numbers every few years. Stay tuned....
The only major company I know that uses consumer grade HDs in volume is probably Google
What qualifies as "major"?
"Enterprise" grade drives are often faster, having better processors and more cache
The cache is whatever is written on the drive, so an "Enterprise" drive with 32 MB of cache has less than a "Consumer" drive with 64 MB. I don't know what the heck you think the word "Enterprise" gets you in this case.
drive manufacturers have to listen to server and storage array manufacturers and meet their requirements
Different storage arrays have different requirements. I hate the idea that people think "Enterprise" magically got all the tradeoffs correct. For example, low power and high responsiveness are BOTH valid goals but are probably at odds. Some Enterprises (like Backblaze and Shutterfly) care deeply about their electrical power bill, and the drives aren't the performance bottleneck. Should we buy enterprise drives or not?