4.1.1. Data Integrity in HDFS
HDFS transparently checksums all data written to it and by default verifies checksums when reading data. A separate checksum is created for every io.bytes.per.checksum bytes of data. The default is 512 bytes, and because a CRC-32 checksum is 4 bytes long, the storage overhead is less than 1% (4 bytes per 512 bytes is about 0.78%).
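To make the per-chunk scheme concrete, here is a minimal sketch that computes one CRC-32 value for every fixed-size chunk of a byte array and derives the storage overhead. It uses the standard java.util.zip.CRC32 class, not HDFS's own code; the class name ChunkChecksums and its method names are hypothetical, chosen only for illustration.

```java
import java.util.zip.CRC32;

// Illustrative sketch only: this is not the HDFS implementation.
public class ChunkChecksums {

    /** Returns one CRC-32 value for each bytesPerChecksum-sized chunk of data. */
    static long[] checksums(byte[] data, int bytesPerChecksum) {
        // Round up so that a short final chunk still gets its own checksum.
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            int len = Math.min(bytesPerChecksum, data.length - off);
            crc.reset();
            crc.update(data, off, len);
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Storage overhead of a 4-byte checksum per chunk, as a percentage. */
    static double overheadPercent(int bytesPerChecksum) {
        return 100.0 * 4 / bytesPerChecksum;
    }

    public static void main(String[] args) {
        byte[] data = new byte[1300]; // spans two full chunks and one partial chunk
        long[] sums = checksums(data, 512);
        System.out.println("chunks: " + sums.length);
        System.out.println("overhead %: " + overheadPercent(512));
    }
}
```

With the default chunk size of 512 bytes, the overhead works out to 0.78125%, which matches the "less than 1%" figure above.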
Datanodes are responsible for verifying the data they receive before storing the data and its checksum. This applies to data that they receive from clients and from other datanodes during replication. A client writing data sends it to a pipeline of datanodes (as explained in Chapter 3), and the last datanode in the pipeline verifies the checksum. If it detects an error, the client receives a ChecksumException, a subclass of IOException.
When clients read data from datanodes, they verify checksums as well, comparing them with the ones stored at the datanode. Each datanode keeps a persistent log of checksum verifications, so it knows the last time each of its blocks was verified. When a client successfully verifies a block, it tells the datanode, which updates its log. Keeping statistics such as these is valuable in detecting bad disks.
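The read-side check described above can be sketched as follows: recompute the checksum of each chunk, compare it with the stored value, and record the time of a successful verification. This is a simplified illustration under stated assumptions, not HDFS's actual read path. The class ReadVerification, the exception ChecksumMismatchException (a stand-in for Hadoop's ChecksumException), and the in-memory lastVerified map (a stand-in for the datanode's persistent log) are all hypothetical.

```java
import java.io.IOException;
import java.util.HashMap;
import java.util.Map;
import java.util.zip.CRC32;

// Illustrative sketch only: names and structure are assumptions, not HDFS code.
public class ReadVerification {

    /** Hypothetical stand-in for Hadoop's ChecksumException. */
    static class ChecksumMismatchException extends IOException {
        ChecksumMismatchException(String message) {
            super(message);
        }
    }

    /** In-memory stand-in for the datanode's log: block ID to last verification time. */
    static final Map<Long, Long> lastVerified = new HashMap<>();

    /** Computes one CRC-32 value per chunk, as the writer would have done. */
    static long[] checksums(byte[] data, int bytesPerChecksum) {
        int chunks = (data.length + bytesPerChecksum - 1) / bytesPerChecksum;
        long[] sums = new long[chunks];
        CRC32 crc = new CRC32();
        for (int i = 0; i < chunks; i++) {
            int off = i * bytesPerChecksum;
            crc.reset();
            crc.update(data, off, Math.min(bytesPerChecksum, data.length - off));
            sums[i] = crc.getValue();
        }
        return sums;
    }

    /** Returns the index of the first chunk whose checksum differs, or -1 if all match. */
    static int firstCorruptChunk(byte[] data, long[] stored, int bytesPerChecksum) {
        long[] actual = checksums(data, bytesPerChecksum);
        if (actual.length != stored.length) {
            return Math.min(actual.length, stored.length); // length mismatch counts as corruption
        }
        for (int i = 0; i < actual.length; i++) {
            if (actual[i] != stored[i]) {
                return i;
            }
        }
        return -1;
    }

    /** Throws if any chunk fails verification. */
    static void verify(byte[] data, long[] stored, int bytesPerChecksum)
            throws ChecksumMismatchException {
        int bad = firstCorruptChunk(data, stored, bytesPerChecksum);
        if (bad >= 0) {
            throw new ChecksumMismatchException("Checksum mismatch in chunk " + bad);
        }
    }

    /** Verifies a block on read and, on success, records when it was verified. */
    static void readBlock(long blockId, byte[] data, long[] stored, int bytesPerChecksum)
            throws IOException {
        verify(data, stored, bytesPerChecksum);
        lastVerified.put(blockId, System.currentTimeMillis());
    }
}
```

Because the verification time is recorded only after a successful check, a block that keeps failing never gets a fresh timestamp, which is the kind of signal that helps point to a bad disk.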
Aside from block verification on client reads, each datanode runs a DataBlockScanner in a background thread that periodically verifies all the blocks stored on the datanode. This is to guard against corruption due to "bit rot" in the physical storage media. See Section 10.1.4.3 for details on how to access the scanner reports.
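The idea behind a background scanner can be sketched as a single pass that re-checks every stored block against its recorded checksum. This is not the DataBlockScanner implementation: the class BlockScannerSketch, its in-memory maps, and the use of one whole-block CRC-32 instead of per-chunk checksums are simplifying assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.zip.CRC32;

// Illustrative sketch only: an in-memory stand-in for a datanode's block store.
public class BlockScannerSketch {
    private final Map<Long, byte[]> blocks = new TreeMap<>();
    private final Map<Long, Long> expectedCrc = new TreeMap<>();

    static long crcOf(byte[] data) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, data.length);
        return crc.getValue();
    }

    /** Stores a block (by reference) together with its checksum at write time. */
    void store(long blockId, byte[] data) {
        blocks.put(blockId, data);
        expectedCrc.put(blockId, crcOf(data));
    }

    /** Performs one full pass and returns the IDs of blocks that no longer match. */
    List<Long> scanOnce() {
        List<Long> corrupt = new ArrayList<>();
        for (Map.Entry<Long, byte[]> entry : blocks.entrySet()) {
            long expected = expectedCrc.get(entry.getKey());
            if (crcOf(entry.getValue()) != expected) {
                corrupt.add(entry.getKey());
            }
        }
        return corrupt;
    }

    public static void main(String[] args) {
        BlockScannerSketch scanner = new BlockScannerSketch();
        byte[] block = {1, 2, 3, 4};
        scanner.store(1L, block);
        block[2] ^= 0x01; // simulate a single flipped bit on disk
        System.out.println("corrupt blocks: " + scanner.scanOnce());
    }
}
```

In a long-running process, a pass like scanOnce could be scheduled on a background thread, for example with java.util.concurrent.ScheduledExecutorService, so that blocks nobody reads are still checked periodically.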