Concerned about bandwidth? Use a tarpit (Score 5, Interesting)
Back in the day, we used to run "tarpit" SMTP servers which looked like an open mail relay but ACK'd incoming packets only just barely fast enough to keep the remote client from timing out and giving up. The theory was that tying up spammer resources was a net good for the internet: a sender busy stuffing messages into a tarpit sat waiting on your acknowledgements instead of hammering somebody else.
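The idea can be sketched at the application layer too: instead of delaying TCP ACKs, just drip the response bytes out with a sleep between each one. A minimal sketch (the banner string and delay are illustrative, not from any real deployment):

```python
import socket
import threading
import time

def tarpit_banner(conn, banner=b"220 mail.example.com ESMTP\r\n", delay=0.05):
    # Drip the SMTP greeting one byte at a time, sleeping between bytes,
    # so every connected client spends wall-clock time per byte received.
    # A spammer's connection slot is tied up for len(banner) * delay
    # seconds before it even gets to say HELO.
    for i in range(len(banner)):
        conn.sendall(banner[i:i + 1])
        time.sleep(delay)
    conn.close()
```

Real tarpits (e.g. OpenBSD's spamd) do this at every stage of the SMTP dialogue, not just the banner, and tune the delay to sit just inside typical client timeouts.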
Similarly, perhaps the right answer here is to limit the number of concurrent connections from any one network range, and use tarpit tactics to throttle the rate at which your server generates content to feed the bot -- just keep ramping down until they drop off, then remember the last good rate to use for subsequent requests.
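That "ramp down, remember the last good rate" logic is simple bookkeeping keyed by network range. A sketch of the state machine, assuming /24 grouping and a halving backoff (all the class and parameter names here are made up for illustration):

```python
import ipaddress

class AdaptiveTarpit:
    """Per-/24 send-rate tracker: halve the rate on each new connection
    from the same range, and once the client gives up, pin the rate we
    were serving as the 'last good rate' for subsequent requests."""

    def __init__(self, start_bps=1024, floor_bps=8, backoff=0.5):
        self.start_bps = start_bps
        self.floor_bps = floor_bps
        self.backoff = backoff
        self.current = {}    # network -> rate to serve next connection at
        self.last_good = {}  # network -> rate at which the client dropped off

    def _net(self, ip):
        # Group clients by /24 so a bot farm can't dodge by rotating IPs
        # within one subnet.
        return ipaddress.ip_network(f"{ip}/24", strict=False)

    def rate_for(self, ip):
        net = self._net(ip)
        if net in self.last_good:
            return self.last_good[net]
        rate = self.current.get(net, self.start_bps)
        # Ramp down for the *next* connection from this range.
        self.current[net] = max(self.floor_bps, rate * self.backoff)
        return rate

    def client_gave_up(self, ip):
        # The client dropped off: remember the rate we were about to
        # serve as good enough to keep them away.
        net = self._net(ip)
        self.last_good[net] = self.current.get(net, self.start_bps)
```

A real deployment would also expire these entries after a while, so a legitimate client that once shared a subnet with a bot isn't throttled forever.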
It would perhaps be interesting to randomly generate content and hyperlinks to ever deeper random URLs -- are these new crawlers more interested in some URLs or extensions than others? If you pull fresh keywords from the full URL the crawler requested, will it delve ever deeper into that "topic"? If their Accept-Encoding header supports gzip or deflate, what happens when you feed them a zip-bomb?
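Both experiments are a few lines each. A sketch, assuming you seed the page deterministically from the requested path so the "site" looks stable to a re-crawl (function names and the 1000:1 compression figure are illustrative; actual gzip ratios on constant input are in that ballpark):

```python
import gzip
import hashlib
import random
import re

def maze_page(path, n_links=5):
    # Pull keywords from the requested URL and seed a PRNG with the
    # path's hash, so the same URL always yields the same page while
    # every link leads one level deeper into generated noise.
    words = re.findall(r"[a-z]+", path.lower()) or ["lorem"]
    rng = random.Random(hashlib.sha256(path.encode()).digest())
    body = " ".join(rng.choice(words) for _ in range(50))
    links = [
        "%s/%s-%04x" % (path.rstrip("/"), rng.choice(words), rng.getrandbits(16))
        for _ in range(n_links)
    ]
    return "<p>%s</p>\n" % body + "\n".join(
        '<a href="%s">%s</a>' % (u, u) for u in links
    )

def gzip_bomb(size_mb=10):
    # Constant input compresses at roughly 1000:1, so ~10 MB of zeros
    # gzips to a few KB: cheap for you to serve, expensive for the
    # crawler to inflate -- if it honors Content-Encoding at all.
    return gzip.compress(b"\0" * (size_mb * 1024 * 1024), compresslevel=9)
```

Whether a given crawler actually decompresses the bomb, follows the deeper links, or keys on particular extensions is exactly the kind of thing you'd find out by logging what it requests next.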