The article was written by the guy who did the driver, so I think we can assume he knows his stuff.
Most of the driver is just a copy of the Intel driver, with additional functionality bolted on top. Whatever the author's abilities, the goal was not to produce a working protocol stack, and benchmarks of this hack can't be used to predict anything but the behavior of this hack.
No, it appears that if you want to switch more than 10-18 Gbit/s, the computer would have a memory bandwidth problem. Using multiple cores and NUMA might improve on that, but I do not think you would manage to build a 24-port switch that switches at line speed this way :-).
But if you could somehow get an external switch to do 99% of the work, this might work...
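A rough back-of-envelope check of the memory bandwidth argument, as a sketch only. It assumes 10 Gbit/s ports (the post does not state a port speed), that every port receives at line rate, and that each forwarded byte crosses the memory bus twice: once when the NIC writes it into RAM and once when it is read back out for transmission. Cache effects, descriptor traffic and per-packet CPU work are ignored.

```python
# Back-of-envelope estimate of the DRAM traffic a software switch generates.
# Assumptions (not from the original post): every port receives at line rate
# (and, being full duplex, transmits the same amount), and each forwarded byte
# is written to RAM once (NIC -> RAM) and read from RAM once (RAM -> NIC).

def memory_traffic_gbytes_per_s(ports: int, port_gbit_per_s: float,
                                memory_crossings: int = 2) -> float:
    """Return the estimated memory traffic in GB/s for a full-rate software switch."""
    aggregate_gbit = ports * port_gbit_per_s          # total forwarded traffic
    return aggregate_gbit * memory_crossings / 8.0    # bits -> bytes


if __name__ == "__main__":
    for ports, speed in [(2, 10.0), (24, 1.0), (24, 10.0)]:
        gbs = memory_traffic_gbytes_per_s(ports, speed)
        print(f"{ports} x {speed:g} Gbit/s -> ~{gbs:g} GB/s of memory traffic")
```

Under these assumptions a 24-port, 10 Gbit/s switch at line rate needs about 60 GB/s of memory traffic just to move packet data. Whether a given machine can sustain that depends on its memory system and on the per-packet overheads this sketch leaves out.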
And then they would inevitably slow down this hack, too, which makes me doubt the validity of the measurements.
I am not sure how much more we can get out of this discussion. For my part, I believe you are going too far in trying to make a problem out of something that actually works quite well for some very large companies (Google and HP!).
Those companies merely announced that they intend to use this "technology" somewhere. They are not throwing out the routers they have. They likely replace some layer 2 and layer 3 switches ("almost routers") and treat the whole thing as a fancier management protocol for the simple, mostly flat or statically configured networks they have in abundance. For all we know, Google may already have no routers at all except for the links between their data centers. They are famous for customizing their hardware and network infrastructure for their own unique software infrastructure, and would probably gain more from multi-port servers connected by very primitive switches into clusters, with VLANs, or even the physical topology, following the topology of their applications' interfaces.
Packets need to be delayed when the controller has to be queried, and that is true for both OpenFlow and traditional switches.
Except traditional switches never have high-latency, unreliable links between their components, and the data formats follow the optimized design of ASICs and not someone's half-baked academic paper.
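The disagreement here is about what a flow-table miss costs. A minimal toy model, assuming a reactive setup in which the first packet of an unknown flow waits while a controller is consulted and later packets of the same flow hit the cached entry. The class and function names and the latency figures are hypothetical illustration values, not measurements and not the OpenFlow API. The model only shows that the mean added delay is roughly the miss rate times the controller round trip.

```python
# Toy model of reactive flow setup in a switch with an external controller.
# Hypothetical names and numbers; this is not the OpenFlow protocol or API.
from dataclasses import dataclass, field


@dataclass
class ToySwitch:
    fast_path_ns: float            # assumed per-packet cost of a flow-table hit
    controller_rtt_ns: float       # assumed round trip to the controller on a miss
    flow_table: dict = field(default_factory=dict)

    def forward(self, flow_key: tuple) -> float:
        """Return the delay (ns) this packet sees; install the flow on a miss."""
        if flow_key in self.flow_table:
            return self.fast_path_ns
        # Miss: the packet waits while the controller decides, then the
        # decision is cached so later packets of this flow take the fast path.
        self.flow_table[flow_key] = "output-port-decided-by-controller"
        return self.fast_path_ns + self.controller_rtt_ns


def average_delay_ns(switch: ToySwitch, packets: list) -> float:
    """Mean per-packet delay over a packet trace given as a list of flow keys."""
    return sum(switch.forward(p) for p in packets) / len(packets)


if __name__ == "__main__":
    # 1000 packets spread over 10 flows: 10 misses, 990 hits.
    trace = [("10.0.0.1", "10.0.0.2", i % 10) for i in range(1000)]
    sw = ToySwitch(fast_path_ns=1_000, controller_rtt_ns=1_000_000)
    print(f"average delay: {average_delay_ns(sw, trace):.0f} ns")
```

With the hypothetical figures in the usage block (1 µs per hit, 1 ms per controller round trip, 1% misses) the mean delay rises from 1,000 ns to 11,000 ns. Long-lived flows lower the miss rate and therefore the penalty; many short flows or a slow, lossy controller link raise it.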
We are just fighting over some nano- or possibly microseconds here, with no one showing that it actually matters.
Then why don't people just place Ethernet between a CPU and RAM? It's "nano or possibly microseconds", right?
[...] Google uses for, or they wouldn't be doing it.
See above.
At my company we are using it too, and it works very well for us. We are an ISP, by the way.
If it works, then the way you use it did not require anything complex to begin with, and you are using it as yet another management protocol. You could have bought cheap layer 3 switches before and configured them to do exactly the same thing from the command line, just with fewer buzzwords.