Comment Re:So we still have... (Score 1) 756
or we'll blow ourselves up..
reminds me of http://www.endofworld.net/
In fact, there are many things happening now that make me think: "Are they doing this on purpose? Or are they retarded?"
Definitely retarded (see http://en.wikipedia.org/wiki/Hanlon's_razor).
In the link you posted, the admin found three uberblocks (there are supposed to be four). ZFS correctly made multiple uberblocks, per design. It appears that all three were corrupt.
ZFS keeps a history of the last 256 uberblocks in four different places in the pool. So even if all copies of the most recent uberblock got corrupted, it could still fall back to one of the older ones. You'd maybe lose the last few minutes of work, but that's not nearly as catastrophic as losing the whole pool. It could fall back, but it doesn't; instead, it panics the kernel. This is where a userspace fsck would help, for example by giving you the choice to safely invalidate the last uberblocks. I was not feeling very comfortable when I wrote the dd/ud script that automated that task, but I had nothing to lose at that point.
The silent corruption was just an example. It doesn't matter what causes the corruption. But _if_ you end up in a situation like the admin or me, you have to resort to such ugly tricks to recover your pool. And that is something I'm not willing to accept on a production system - or any system at all for that matter.
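To make the fallback idea concrete, here is a rough sketch of what I mean. It is a toy model in Python, not ZFS code and not the real on-disk format: the Uberblock class, the SHA-256 checksum, and the names make_uberblock, corrupt, and newest_valid are all made up for illustration. The only thing it assumes from ZFS is that each uberblock carries a transaction group number and a checksum, and that several copies of the history exist.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Iterable, Optional


@dataclass(frozen=True)
class Uberblock:
    """Toy stand-in for an uberblock (hypothetical layout, not ZFS's)."""
    txg: int          # transaction group number; higher means newer
    payload: bytes    # stand-in for the pointer to the rest of the pool
    checksum: bytes   # checksum stored alongside the payload


def _digest(txg: int, payload: bytes) -> bytes:
    # SHA-256 is an assumption made for this sketch only.
    return hashlib.sha256(txg.to_bytes(8, "little") + payload).digest()


def make_uberblock(txg: int, payload: bytes) -> Uberblock:
    return Uberblock(txg, payload, _digest(txg, payload))


def is_valid(ub: Uberblock) -> bool:
    return ub.checksum == _digest(ub.txg, ub.payload)


def corrupt(ub: Uberblock) -> Uberblock:
    # Change the payload but keep the old checksum, so validation fails.
    return replace(ub, payload=ub.payload + b"\x00")


def newest_valid(copies: Iterable[Iterable[Uberblock]]) -> Optional[Uberblock]:
    """Return the valid uberblock with the highest txg across all copies,
    or None if every entry in every copy fails its checksum."""
    candidates = [ub for ring in copies for ub in ring if is_valid(ub)]
    return max(candidates, key=lambda ub: ub.txg, default=None)


if __name__ == "__main__":
    history = [make_uberblock(t, b"root-%d" % t) for t in range(1, 6)]
    copies = [list(history) for _ in range(4)]
    # Damage the two most recent entries in every copy.
    for ring in copies:
        ring[-1] = corrupt(ring[-1])
        ring[-2] = corrupt(ring[-2])
    best = newest_valid(copies)
    print("falling back to txg", best.txg if best else None)
```

The point is only that skipping damaged entries and taking the newest one that still checks out is a small amount of logic; what a userspace tool would add is letting the admin see and confirm that choice instead of hitting a kernel panic.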
Thus spake the master programmer: "When a program is being tested, it is too late to make design changes." -- Geoffrey James, "The Tao of Programming"