I know I shouldn't respond to a troll, but I'm feeling generous today.
There you go again, acting like you know what you're talking about, but you don't. ZFS and BTRFS have exactly dick to do with what I said. The filesystem doesn't matter. The operating system doesn't even matter.
Um, excuse me? The filesystem absolutely does matter. Traditionally, the filesystem assumes that any data it reads back from the drive is exactly what it wrote there earlier. Obviously, drives don't deliver on that 100% reliably. It's an important innovation that these newer filesystems add their own checksums to the data they write, so they can detect, and sometimes even repair, corrupted reads.
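To make concrete what I mean by filesystems adding their own checksums, here's a toy Python sketch of the principle only (nothing like the actual ZFS/BTRFS on-disk formats): store a checksum with every block at write time, and verify it at read time.

```python
import hashlib

def write_block(storage, key, data):
    # Store the data together with a checksum computed at write time.
    storage[key] = (data, hashlib.sha256(data).hexdigest())

def read_block(storage, key):
    data, stored_sum = storage[key]
    # If the drive returns something other than what was written,
    # the checksum no longer matches and the corruption is detected.
    if hashlib.sha256(data).hexdigest() != stored_sum:
        raise IOError("checksum mismatch: block %r is corrupt" % key)
    return data

disk = {}
write_block(disk, "block0", b"family photo bytes")
read_block(disk, "block0")                                   # verifies fine
disk["block0"] = (b"family ph0to bytes", disk["block0"][1])  # simulate bitrot
# read_block(disk, "block0") now raises IOError
```

And when there's a redundant copy or parity available (as ZFS has in a mirror or RAID-Z), a failed check can trigger a repair from a good copy instead of just an error.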
I'm talking about infrastructure and architecture, while you're blubbering on about the hardware.
Get your head out of the clouds. Everything does come down to hardware. In fact, given your other posts about hardware, I sometimes doubt that you actually interact with the hardware that you talk about.
Ideally, a RAID would be able to recreate the missing block, but I can't find any reference to a RAID doing that.
That's because you have no experience as a network administrator in a professional environment. If you did, you'd know that's the very thing RAID was designed to do: recover from hardware failure, which includes sectors becoming unreadable.
That's an aspect of software. Of course a RAID with sufficient parity will recover from a total drive failure. It's much harder to find references to how a particular RAID will respond to intermittent errors. But if you're not just a blowhard, I'd like to see some links to documents describing how the RAIDs you know handle drive read errors. Not total failures. Just read errors.
Speaking of RAID, ZFS has its own take on RAID (RAID-Z), which supports up to triple parity and has a different architecture from a conventional storage stack. Still, I haven't found any reference to how it handles drive read errors.
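For what it's worth, here's the mechanism I'd expect to be involved, as a toy single-parity (RAID-5-style) sketch. It assumes the happy case where the drive reports the read error, so the bad block is a known erasure; my question is precisely whether real arrays reliably do this per-sector rather than only when a whole drive drops out.

```python
from functools import reduce

def xor_blocks(blocks):
    # XOR the blocks together, column by column.
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

stripe = [b"AAAA", b"BBBB", b"CCCC"]   # data blocks in one stripe
parity = xor_blocks(stripe)            # stored on the parity drive

# Drive 1 returns a read error for its block in this stripe;
# rebuild that block from the surviving blocks plus the parity.
rebuilt = xor_blocks([stripe[0], stripe[2], parity])
assert rebuilt == stripe[1]
```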
It surely doesn't help that modern computers have many gigabytes of memory, yet almost none of them have ECC on that memory.
That's because ECC adds an extra layer of complexity to solve a problem that doesn't occur very often in computers, and when it does, the most severe consequence is usually that the computer crashes or behaves abnormally. For residential, and even most commercial, uses, ECC memory just isn't needed. But for the select few use cases where data integrity is absolutely critical -- say, nuclear power plants, air traffic control systems, certain types of hospital equipment, or financial processing systems -- the added cost is justified, because those systems need high availability and high reliability.
What a horrible attitude toward data integrity. Computer crashes, I lose data. Computer behaves abnormally, and the worst case is that it miscalculates something important, say the root of a filesystem B-tree, and the filesystem has to go through an expensive repair. My data are important to me. I use my computer for my personal financial processing, and I know I'm not alone. My old computer had an extra 128kB of memory to provide parity checks for the other 1MB. I imagine that the same stupid tradition of cost-cutting is why my new computer does not have 2GB of memory to provide ECC for the other 16GB.
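To spell out the arithmetic: one check bit per eight data bits is a 1/8 overhead, which is exactly 128 kB on 1 MB and 2 GB on 16 GB. Plain even parity like my old machine had can only detect a single flipped bit per byte; the SECDED ECC on 72-bit-wide server DIMMs can also correct it. A toy illustration of the detection half:

```python
def parity_bit(byte):
    # Even parity over the 8 data bits: 1 if the number of set bits is odd.
    return bin(byte).count("1") % 2

stored_byte   = 0b10110010
stored_parity = parity_bit(stored_byte)

corrupted = stored_byte ^ 0b00000100            # a single bit flips in memory
print(parity_bit(corrupted) != stored_parity)   # True: the flip is detected
```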
Your consumer-grade computer's memory is a piece of shit. It's made with commodity capacitors and ICs that are stamped out in bulk for super cheap.
And your server memory isn't? Back up a moment... I thought the OP was talking about being able to detect bitrot in family photos, and now you're telling him he should buy a server with memory lovingly crafted for high reliability? Reliability which, by your own argument, isn't needed, because his computer isn't important?
When our system shows obvious signs of a failing memory stick, we just drive to the store, plunk down a $20, and abscond with a new one. Problem solved.
Great! Assuming, of course, that the failing memory stick didn't corrupt any data it touched before the signs became obvious. Oh, but I guess your architecture is so wonderful that corrupted data cannot be stored on it. Because it doesn't use brittle software running on real-world hardware that measures its chances of failure with averages and standard deviations.
I'm not optimistic about the long-term storage of electronic data.
That's because, as previously pointed out, your experience comes from consumer-grade hardware whose design considerations you don't fully understand.
It's true that I don't have the most experience. However, others have more experience and more data, and they dispute your claims.
NASA has had great success in the long-term storage of magnetic media -- in fact, there was an article not long ago about how they had to reverse-engineer equipment designed during the 1960s.
Irrelevant. I'm 100% sure they did not recover all of the data from those tapes. Also, those were not highly compressed pictures, where a little corruption can spoil the whole image. Those were scientific measurements, where any recovered data are useful.
So while your experiences with your personal home equipment may have led you to not be optimistic, my professional experience with industry-grade equipment suggests that, if you follow best practices regarding data storage and disaster recovery, you can, for a reasonable cost, ensure reliability far beyond what the OP requires.
It looks like your "professional" best practice is to make three copies and regularly burn a lot of bandwidth comparing them, instead of storing the data alongside metadata that lets it check, and possibly repair, itself.
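By "check itself" I mean something like this sketch: keep a checksum manifest next to the data, so a scrub only has to read the local copy instead of shipping every byte across the network to compare replicas. (Actual self-repair needs erasure coding on top, à la par2 or RAID-Z; a manifest alone only tells you which file went bad. The paths here are made up.)

```python
import hashlib, json, pathlib

def make_manifest(directory):
    # Map each file name to the SHA-256 of its contents.
    return {p.name: hashlib.sha256(p.read_bytes()).hexdigest()
            for p in pathlib.Path(directory).iterdir() if p.is_file()}

def scrub(directory, manifest):
    # Return the names of files whose contents no longer match the manifest.
    current = make_manifest(directory)
    return [name for name, digest in manifest.items()
            if current.get(name) != digest]

photos = "/home/op/photos"                                   # hypothetical path
# First run:  json.dump(make_manifest(photos), open("photos.manifest.json", "w"))
# Later scrubs: scrub(photos, json.load(open("photos.manifest.json")))
```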
But this is not to say my optimism for long-term storage has been in any way affected; that is simply industry experience and the inevitable delineation between theory and practice -- between knowledge and experience. You have knowledge, but I have experience. My experience has given me an optimism that your knowledge has been unable to give you.
So, your optimism is based on your experience that the OP is far more likely to accidentally delete his pictures than to lose them to a hardware failure. I don't want to understand your line of reasoning.
The next time you want to slam someone for "acting like you know what you're talking about", don't respond with a bunch of links to Wikipedia.
Ad hominem. Besides the one link to Wikipedia, I'm linking to the ACM, which just might have people who know how computers work; to Ars Technica, only because it was the quickest reference I could find for that particular point; to Oracle, because some people there know storage; and to Backblaze, because they know hard drives, too.