Outsourcing the reading part doesn't buy you much. If this just did a crawl, it would be of very limited value. That's not what it does.
Wrong. Many sites have rate limiters that kick in and block you after a while, so if I want to spider a single web site, I get cut off. This would let me hit it from multiple machines.
There are some security limits, which might even work. Supposedly, all the Java apps can do is look at crawled pages and phone results home. Right.
Why the sarcasm? This seems like a perfect use case for the JVM's security mechanism.
Build a system that even a fool can use, and only a fool will want to use it.