There haven't been any algorithmic breakthroughs in many years for most of the computer science field? I find that hard to believe. Back in 1993, I was taking a graduate level course in algorithms, and the professor told us that at least one algorithm for multiplying ridiculously large matrices had been developed and published within the prior year (maybe it was 2 years at that time) by a Russian PhD. Granted, this particular algorithm didn't provide a speed benefit over other techniques until you hit matrix sizes of a million by a million, or something on that order. But that's not the point.
You'd also be amazed at what effect seemingly insignificant choices in the implementation of an algorithm can have. The most extreme case of that I ever saw was something like a factor of 2 difference in speed. You might chalk that up to bad coding, but when that code is locked away inside libraries that ship with the language, rank-and-file developers can get stuck with a suboptimal implementation. So it's not just the algorithms themselves that yield new wins, it's also careful analysis and improvement of older implementations.
Getting back to the topic of this article, I want to point out that I actually used NIO in a project in a corporate environment, and it seemed to give us wins in stability, thread utilization, and memory consumption, among other things. For the environment, it was probably the right choice. Had we been dealing with a newer Linux environment, or a less heavily loaded server, I suppose going with the "old" pre-NIO way of doing socket I/O would have been better.
Had we been on a Solaris system, I'm told the NIO way would have been the best choice hands down, but the company was moving away from Solaris. Still, this raises a valid point -- in the end, you need to tune your code for the environment in which it's running. So if the OS can't do threads well, the whole thread-per-socket approach will stink compared to select-based semantics. I haven't fully read the PDF yet, but it seems like most of this testing was done on Linux. Results for other OSes are not guaranteed to be the same.
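For anyone who hasn't seen what "select-based semantics" looks like in NIO, here's a rough sketch of the idea: one thread, one Selector, and every socket registered with it, instead of one blocked thread per socket. This is not the code from my project. The class name SelectorEchoSketch and the helpers openListener, pollOnce, and roundTrip are made up for illustration, and the loopback address, buffer size, and timeouts are arbitrary assumptions. Only the java.nio calls themselves are the real API.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetSocketAddress;
import java.nio.ByteBuffer;
import java.nio.channels.SelectionKey;
import java.nio.channels.Selector;
import java.nio.channels.ServerSocketChannel;
import java.nio.channels.SocketChannel;
import java.nio.charset.StandardCharsets;
import java.util.Iterator;

class SelectorEchoSketch {

    // Hypothetical helper: non-blocking listener on an ephemeral loopback port,
    // registered with the selector for accept events.
    static ServerSocketChannel openListener(Selector selector) throws IOException {
        ServerSocketChannel server = ServerSocketChannel.open();
        server.bind(new InetSocketAddress("127.0.0.1", 0));
        server.configureBlocking(false);
        server.register(selector, SelectionKey.OP_ACCEPT);
        return server;
    }

    // One pass of the event loop: wait for ready channels, then accept new
    // connections or echo back whatever was read. Returns bytes echoed.
    static int pollOnce(Selector selector, long timeoutMillis) throws IOException {
        int echoed = 0;
        selector.select(timeoutMillis);
        Iterator<SelectionKey> it = selector.selectedKeys().iterator();
        while (it.hasNext()) {
            SelectionKey key = it.next();
            it.remove(); // the selected-key set is not cleared automatically
            if (key.isAcceptable()) {
                SocketChannel peer = ((ServerSocketChannel) key.channel()).accept();
                if (peer != null) {
                    peer.configureBlocking(false);
                    peer.register(selector, SelectionKey.OP_READ);
                }
            } else if (key.isReadable()) {
                SocketChannel peer = (SocketChannel) key.channel();
                ByteBuffer buf = ByteBuffer.allocate(1024); // arbitrary size
                int n = peer.read(buf);
                if (n < 0) {            // remote side closed the connection
                    key.cancel();
                    peer.close();
                } else {
                    buf.flip();
                    // Simplification: a real server would register OP_WRITE
                    // rather than spin if the send buffer were full.
                    while (buf.hasRemaining()) peer.write(buf);
                    echoed += n;
                }
            }
        }
        return echoed;
    }

    // Demo: a blocking client sends a message and the single-threaded
    // selector loop accepts the connection and echoes it back.
    static String roundTrip(String message) {
        try (Selector selector = Selector.open();
             ServerSocketChannel server = openListener(selector);
             SocketChannel client = SocketChannel.open(server.getLocalAddress())) {
            byte[] out = message.getBytes(StandardCharsets.UTF_8);
            client.write(ByteBuffer.wrap(out));
            int echoed = 0;
            for (int pass = 0; pass < 20 && echoed < out.length; pass++) {
                echoed += pollOnce(selector, 500);
            }
            ByteBuffer in = ByteBuffer.allocate(echoed);
            while (in.hasRemaining() && client.read(in) >= 0) { /* keep reading */ }
            // Closing the selector does not close registered channels.
            for (SelectionKey key : selector.keys()) key.channel().close();
            return new String(in.array(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(roundTrip("hello"));
    }
}
```

The thread-per-socket version would instead call a blocking read() on each connection from its own thread and let the OS scheduler do the multiplexing. Which one wins depends on how well the OS handles lots of threads versus how well it implements select/poll underneath the Selector, which is exactly why the results can differ from one platform to the next.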