I know somebody who tried this. At around 5,000 threads he saw no real progress whatsoever anymore. That was a while back, but at the time Java was already a few years old.
The thread limit comes from the operating system, not from the Java virtual machine. Modern operating systems are not designed to handle a huge number of parallel threads: scheduling the threads and synchronizing between them usually eats up most of the system's resources.
The Java VM does have some shortcomings regarding multiprocessing: when using multiple cores on the same socket, data updated by different threads sometimes ends up on the same cache lines. This is known as false sharing and leads to pathological cache-invalidation patterns that slow down all the affected cores. Currently there is no official Java API or VM parameter to mitigate this. But it is a very specific problem: if this is indeed your bottleneck, the application is most likely already very well optimized and running quite fast.
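To make the cache-line issue concrete, here is a minimal sketch (my own illustration, not from the original post) of the common manual workaround: padding a hot field with unused longs so that, on a typical 64-byte cache line, each thread's counter sits on its own line and the cores stop invalidating each other's caches. The class and field names are hypothetical.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical illustration: each counter is surrounded by ~56 bytes of
// padding so the hot "value" field gets a cache line to itself.
final class PaddedCounter {
    long p1, p2, p3, p4, p5, p6, p7; // padding before the hot field
    volatile long value;
    long q1, q2, q3, q4, q5, q6, q7; // padding after the hot field
}

public class FalseSharingDemo {
    public static void main(String[] args) throws InterruptedException {
        PaddedCounter a = new PaddedCounter();
        PaddedCounter b = new PaddedCounter();
        // Each thread hammers its own counter; without padding, both
        // counters could share one cache line and slow each other down.
        Thread t1 = new Thread(() -> { for (int i = 0; i < 1_000_000; i++) a.value++; });
        Thread t2 = new Thread(() -> { for (int i = 0; i < 1_000_000; i++) b.value++; });
        long start = System.nanoTime();
        t1.start(); t2.start();
        t1.join(); t2.join();
        long millis = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
        System.out.println(a.value + b.value); // 2000000
        System.out.println("took " + millis + " ms");
    }
}
```

Whether the JIT or GC preserves that field layout is not guaranteed by the language, which is exactly why there is no clean API-level fix.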
If you want to process a huge amount of data, currently the best approach is to run exactly as many threads as you have processor cores. You then feed each thread with work items through non-blocking data structures, and the threads communicate using non-blocking IO. Java is extremely well prepared for this scenario, perhaps even better than node.js; examples are Vert.x and Akka. In some older benchmarks, Vert.x had no problem serving over 300'000 parallel requests per second on a six-core machine.
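The one-thread-per-core pattern can be sketched with nothing but the JDK (Vert.x and Akka wrap the same idea in an event loop and an actor model, respectively). This is my own minimal illustration, not code from those frameworks; the task (summing squares) is a placeholder for real work items.

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class CorePoolDemo {
    public static void main(String[] args) throws Exception {
        // One worker thread per physical core reported by the OS.
        int cores = Runtime.getRuntime().availableProcessors();
        ExecutorService pool = Executors.newFixedThreadPool(cores);

        // Feed the pool with many small tasks; the executor's internal
        // work queue hands them to idle workers without extra locking
        // in our code.
        List<Future<Long>> results = IntStream.range(0, 100)
                .mapToObj(i -> pool.submit(() -> (long) i * i))
                .collect(Collectors.toList());

        long sum = 0;
        for (Future<Long> f : results) {
            sum += f.get();
        }
        pool.shutdown();
        System.out.println(sum); // 328350 (sum of squares 0..99)
    }
}
```

The point is that the thread count stays fixed at the core count no matter how many tasks arrive, so the OS never has to juggle thousands of threads.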
Edited at 18:05 on Wednesday, 17 September 2014: fixed some typos.