Comment sloppy averaging (Score 1) 270
the simplest explanation is that the averaging is done in different places. using the top line from above: 2.4% is the density value (repeats/words) computed per page and then averaged over the first 2000 ranked pages, while words and repeats are total words/2000 and total repeats/2000. *if* this is the case, the averaged density is more useful than the ratio of average repeats to average words. just consider an outlier page that is 100% repeats: it will boost the repeats/words ratio by a huge amount if the page is, say, one million words long, but the density number will barely move. it's just averaged in with the other 1999 as one data point of 100%, regardless of page length.
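a quick sketch of the two ways of averaging, in python. the numbers are made up for illustration and are not the article's data: 1999 ordinary pages of 1000 words with 24 repeats each, plus one million-word page that is all repeats. the function names are mine.

```python
# Hypothetical data: (words, repeats) per page. Not taken from the article.
ordinary_pages = [(1000, 24)] * 1999      # each page has density 24/1000 = 2.4%
outlier_page = [(1_000_000, 1_000_000)]   # one huge page that is 100% repeats
pages = ordinary_pages + outlier_page


def mean_of_ratios(pages):
    """Average the per-page density: every page counts once, whatever its length."""
    return sum(repeats / words for words, repeats in pages) / len(pages)


def ratio_of_totals(pages):
    """Divide total repeats by total words: long pages dominate the result."""
    total_words = sum(words for words, _ in pages)
    total_repeats = sum(repeats for _, repeats in pages)
    return total_repeats / total_words


density_avg = mean_of_ratios(pages)
pooled_ratio = ratio_of_totals(pages)

print(f"mean of per-page densities: {density_avg:.4f}")   # roughly 0.0245
print(f"total repeats / total words: {pooled_ratio:.4f}")  # roughly 0.3494
```

with these assumed numbers the averaged density stays near 2.4%, while the pooled ratio jumps to about 35% because of a single page.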
that said, this article is still a pretty half-assed "analysis".