Quoth the wiki...
"It is estimated that the print holdings of the Library of Congress would, if digitized and stored as plain text, constitute 17 to 20 terabytes of information. This leads many people to conclude that 20 terabytes is equivalent to the entire holdings of the Library, but this is misleading because the Library contains many items in addition to books, such as photographs, maps, and sound recordings. The Library currently has no plans for systematic digitization of any significant portion of its books."
I do tend to agree with you on this; perhaps a change is in order. A little off-the-cuff calculation helps here. Assuming 250 words per page with an average of 6 characters per word gives us approximately 1500 bytes per page in plain text. If we also assume that each page is stored as a 1 megabyte image instead of those 1500 bytes, then we get something like the following.
20 TB * (1048576 / 1500) ~= 14,000 TB ~= 14 petabytes
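For anyone who wants to play with the assumptions, here is the same back-of-the-envelope estimate as a small Python sketch. The constant and function names are made up for illustration, and the inputs (250 words per page, 6 characters per word, a 1 megabyte image per page, 20 TB of plain text) are just the rough guesses from above, not measured figures.

```python
# Rough guesses from the post above, not measured values.
WORDS_PER_PAGE = 250
CHARS_PER_WORD = 6
TEXT_BYTES_PER_PAGE = WORDS_PER_PAGE * CHARS_PER_WORD  # 1500 bytes of plain text
IMAGE_BYTES_PER_PAGE = 1024 * 1024                     # 1 megabyte scan per page
TEXT_HOLDINGS_TB = 20                                  # upper end of the quoted 17-20 TB


def scanned_size_tb(text_tb,
                    text_bytes_per_page=TEXT_BYTES_PER_PAGE,
                    image_bytes_per_page=IMAGE_BYTES_PER_PAGE):
    """Scale a plain-text size by the image-to-text size ratio per page."""
    return text_tb * image_bytes_per_page / text_bytes_per_page


scanned_tb = scanned_size_tb(TEXT_HOLDINGS_TB)
# Using 1 PB = 1000 TB for the final conversion.
print(f"{scanned_tb:,.0f} TB, or roughly {scanned_tb / 1000:.0f} PB")
```

Swapping in a different per-page image size (say, a compressed scan of a few hundred kilobytes) changes the result proportionally, so the 14 petabyte figure is only as good as the 1 megabyte guess.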