Comment Re:Doesn't that kinda defeat the point of the arch (Score 5, Informative) 234
I apologize for my mistake. Until just a few minutes ago, I was unaware that the Internet Archive agrees to RETROACTIVELY honor a robots.txt file. So once a robots.txt file restricts access to content, they voluntarily remove access to previously archived content from the archive. Here's the related item from their FAQ:
Some sites are not available because of robots.txt or other exclusions. What does that mean?
The Internet Archive follows the Oakland Archive Policy for Managing Removal Requests And Preserving Archival Integrity.
The Standard for Robot Exclusion (SRE) is a means by which web site owners can instruct automated systems not to crawl their sites. Web site owners can specify files or directories that are disallowed from a crawl, and they can even create specific rules for different automated crawlers. All of this information is contained in a file called robots.txt. While robots.txt has been adopted as the universal standard for robot exclusion, compliance with robots.txt is strictly voluntary. In fact most web sites do not have a robots.txt file, and many web crawlers are not programmed to obey the instructions anyway. However, Alexa Internet, the company that crawls the web for the Internet Archive, does respect robots.txt instructions, and even does so retroactively. If a web site owner decides he / she prefers not to have a web crawler visiting his / her files and sets up robots.txt on the site, the Alexa crawlers will stop visiting those files and will make unavailable all files previously gathered from that site. This means that sometimes, while using the Internet Archive Wayback Machine, you may find a site that is unavailable due to robots.txt (you will see a "robots.txt query exclusion error" message). Sometimes a web site owner will contact us directly and ask us to stop crawling or archiving a site, and we endeavor to comply with these requests. When you come across a "blocked site error" message, that means that a site owner has made such a request and it has been honored.
Currently there is no way to exclude only a portion of a site, or to exclude archiving a site for a particular time period only.
When a URL has been excluded at direct owner request from being archived, that exclusion is retroactive and permanent.
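To make the "retroactive" part concrete, here is a rough sketch of the idea in Python, using the standard library's urllib.robotparser. This is not the Archive's or Alexa's actual code. The robots.txt contents, the example.com URLs, the "SomeOtherBot" name, and the helper functions are all made up for illustration, and I'm assuming "ia_archiver" is the user-agent name the Alexa crawler answers to. The point is only that the site's current robots.txt is applied to snapshots that were already collected.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt a site owner might publish today.
# "ia_archiver" is assumed here to be the Alexa crawler's user-agent name.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /private/
"""


def load_rules(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text into a rule set without any network access."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def visible_snapshots(snapshots, rules, agent):
    """Keep only the already-archived URLs the current rules still allow.

    Hypothetical helper: it mimics retroactive exclusion by filtering
    old snapshots against the robots.txt that is in force right now.
    """
    return [url for url in snapshots if rules.can_fetch(agent, url)]


rules = load_rules(ROBOTS_TXT)

# Made-up list of pages that were archived before robots.txt existed.
snapshots = [
    "http://example.com/",
    "http://example.com/private/notes.html",
    "http://example.com/about.html",
]

archive_view = visible_snapshots(snapshots, rules, "ia_archiver")
other_view = visible_snapshots(snapshots, rules, "SomeOtherBot")

print(archive_view)  # nothing left for the archive's crawler
print(other_view)    # other crawlers only lose /private/
```

With these rules the archive's crawler loses access to every previously gathered page, while a crawler matched only by the wildcard entry loses just the /private/ page. Remember that compliance is voluntary, so this only describes what a well-behaved crawler would do.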