The British Library also maintain an archive. The FAQ relating to their crawler is quite an eye-opener:
: Do you respect robots.txt?
: Can I stop the crawling by using robots.txt or blocking your IP?
: Adding our crawls to robots.txt will stop further crawling once we reconsider the file (see above). Similarly, blocking our IP will stop all further access from that IP address. However, the British Library and other deposit libraries are entitled to copy UK-published material from the internet for this national collection. If you disallow our crawler or block our IP, you will introduce barriers to us fulfilling our legal obligations.
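
For anyone wondering what "adding a crawler to robots.txt" amounts to in practice, here is a minimal sketch using Python's standard `urllib.robotparser`. The user-agent token `ExampleArchiveBot`, the `is_allowed` helper and the example.org URLs are all made up for illustration; none of them come from the British Library's FAQ. It also shows why the file is only advisory: the check is one the crawler runs on itself.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. "ExampleArchiveBot" is an invented user-agent
# token for illustration, not the British Library crawler's real one.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    This is the check a well-behaved crawler performs on itself;
    nothing on the server side enforces it.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # The named bot is shut out of the whole site.
    print(is_allowed("ExampleArchiveBot", "https://example.org/page"))
    # Other bots fall through to the "*" rules.
    print(is_allowed("SomeOtherBot", "https://example.org/page"))
    print(is_allowed("SomeOtherBot", "https://example.org/private/x"))
```

Blocking an IP address is the only one of the two options that the site owner actually enforces; the robots.txt route depends on the crawler choosing to run a check like the one above.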