So when you're pushing data as fast as you can through a socket, the old read(byte[]) or write(byte[]) are faster? Wow, no kidding.
You do NOT use java.nio (like Jetty's SelectChannelConnector) for maximum throughput. You use it to handle persistent connections, like all those long polling requests via AJAX which return on an event or timeout after a minute. This article is like recommending Apache with its hard limits on how many requests it can serve concurrently over newer, asynchronous servers like Nginx for static media servers with keep alive enabled.
The slides even mentions the
C10K problem, but what it doesn't do is mention when to use either technology - async IO for concurrency and endless scaling, and synchronous IO for pushing a 10G Ethernet link to the limits. No wait, the nio setup can do that too, 700MB/s or 5.6Gbit/sec per core on 2008 hardware should be enough to max out anything you can buy now. It's great that synchronous IO can hit 1GB/s, a whopping 30% faster, but useful? I'd say no.
For most users, you don't use either API. Lets be honest here, writing highly concurrent software is hard, why reinvent the wheel when you can get off the shelve software that can do it better? You use Jetty and choose between the SelectChannelConnector or SocketConnector, or choose between Apache or Lighttpd/Nginx depending on the traffic pattern. What you do write is the bit that accepts a whole HTTP request and returns a HTTP response, everything before and after is magic.
Unless you're a file server, each 50k sized HTTP response will require enough work to make sure you run out of CPU or Disk IO long before you hit even the 100Mb/s ceiling in most rack switches. Even if your app is fast, 16 cores x 100ms per request x 50K is only 62 Mbits. Not 5600.
But if you need to scale in concurrent client count, there's no way around async IO. The latest name to watch is Netty. In
Plurk Comet: Handling 100,000+ Concurrent Connections with Netty, it scales up to 100000 concurrent connections on a quad core server with 20% CPU load.
Just stop worrying about sockets already, and start worrying about your SQL server suffering a meltdown. Even if you get manage to grow into the Facebook, it's not like using synchronous IO will save you from deploying
30000 servers, it's the application code that's slow. Zero copy, one copy, "string concatenation style twenty copies response building" socket writes don't matter at all, memcpy is cheap compared to a few lines of interpreted code, servers are cheap compared to developers, and never mind the cost of the programming gods giving these presentations.