Comment JBIG2 (Score 1) 290
I haven't even read this article and I know the culprit exactly: JBIG2.
The compression algorithm operates on binary (two-color) images and has two modes: a lossless mode, which is sort of like the love child of RLE and JPEG, and a higher-compression mode, which runs the lossless blocks through a comparison routine and replaces any blocks that are sufficiently similar with references to the first copy. It's actually a good algorithm, but you have to understand how it works to implement it properly. When you get a perfect storm of certain fonts (especially small ones where a glyph fits entirely inside a block), some noise in the bitonal images, and a compression threshold set too high, you can get some real zingers. 9, 6, 0, 3, and 8 can all easily get muddled up, not to mention what happens to letters like e, o, and c. The key to the whole thing is having good algorithms that can produce quality bitonal images from poor originals, and scanning at sufficient resolution that a block cannot hold an entire glyph (or lowering the compression threshold enough that different glyphs no longer count as "similar").
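To make the failure mode concrete, here is a toy sketch of the "replace similar blocks with a reference to the first copy" step. It is not real JBIG2: the 3x5 glyph bitmaps, the function names, and the use of a plain pixel-difference count as the similarity measure are all assumptions made for illustration. The only point is that once a whole glyph fits in one block, a loose enough threshold turns an 8 into a 6.

```python
def hamming(a, b):
    """Count the pixels that differ between two equal-sized bitonal blocks."""
    return sum(pa != pb for ra, rb in zip(a, b) for pa, pb in zip(ra, rb))


def encode(blocks, threshold):
    """Toy lossy encoder: reuse the first stored block within `threshold` pixels.

    threshold=0 only merges identical blocks, so it is lossless.
    """
    dictionary = []  # unique blocks actually stored
    refs = []        # one dictionary index per input block
    for block in blocks:
        for i, proto in enumerate(dictionary):
            if hamming(block, proto) <= threshold:
                refs.append(i)  # "close enough": point at the earlier copy
                break
        else:
            dictionary.append(block)
            refs.append(len(dictionary) - 1)
    return dictionary, refs


def decode(dictionary, refs):
    """Rebuild the page by expanding each reference."""
    return [dictionary[i] for i in refs]


# Hypothetical tiny glyphs; they differ in exactly one pixel.
SIX = ("###",
       "#..",
       "###",
       "#.#",
       "###")
EIGHT = ("###",
         "#.#",
         "###",
         "#.#",
         "###")

page = [SIX, EIGHT, SIX]

lossless = decode(*encode(page, threshold=0))
lossy = decode(*encode(page, threshold=1))
```

With `threshold=0` the decoded page matches the original. With `threshold=1` the dictionary holds a single block and the 8 comes back as a clean, sharp 6, which is exactly why this kind of error is so hard to spot: nothing looks blurry or damaged.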
Why the copier is using the lossy mode of JBIG2 internally is a mystery, especially in the "copy" pipeline. I can think of no good reason for it to use anything other than the lossless mode or uncompressed data.