Comment Re:Could this break the Computer Misuse Act?

The British Library also maintain an archive. The FAQ relating to their crawler is quite an eye-opener:


: Do you respect robots.txt?
: As a rule, yes: we do follow the robots exclusion protocol. However, in certain circumstances we may choose to overrule robots.txt. For instance: if content is necessary to render a page (e.g. Javascript, CSS) or content is deemed of curatorial value and falls within the bounds of the Legal Deposit Libraries Act 2003.

: Can I stop the crawling by using robots.txt or blocking your IP?
: Adding our crawls to robots.txt will stop further crawling once we reconsider the file (see above). Similarly, blocking our IP will stop all further access from that IP address. However, the British Library and other deposit libraries are entitled to copy UK-published material from the internet for this national collection. If you disallow our crawler or block our IP, you will introduce barriers to us fulfilling our legal obligations.
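
For anyone wondering what "respecting robots.txt" amounts to in practice, here is a rough sketch using Python's standard urllib.robotparser. The user-agent token "ExampleArchiveBot" and the example.org URL are made up for illustration; I don't know what token the British Library crawler actually sends. It also only shows what a compliant crawler would do, and the FAQ above says they may overrule the file in some cases.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. "ExampleArchiveBot" is an invented user-agent
# token, not the British Library crawler's real one.
ROBOTS_TXT = """\
User-agent: ExampleArchiveBot
Disallow: /

User-agent: *
Allow: /
"""


def may_fetch(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# The disallowed bot is refused; any other bot falls through to the "*" rule.
print(may_fetch("ExampleArchiveBot", "https://example.org/page.html"))
print(may_fetch("SomeOtherBot", "https://example.org/page.html"))
```

The point being that the check is entirely voluntary on the crawler's side: the file is a request, not an access control.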

