Use the Archive's crawler (Score 2, Informative)
How about using Heritrix, the Internet Archive's open-source, extensible, web-scale, archival-quality web crawler?
In seeking the unattainable, simplicity only gets in the way. -- Epigrams in Programming, ACM SIGPLAN Sept. 1982