[...]
More generally, we have a fundamental problem in the I/O area: UNIX. UNIX I/O has a very simple model, which is now used by Linux, DOS, and Windows. Everything is a byte stream, and byte streams are accessed by making read and write calls to the operating system. That was OK when I/O was slower. But it's a terrible way to do inter-machine communication in clusters today. The OS overhead swamps the data transfer. Then there's the interaction with CPU dispatching. Each I/O operation usually ends by unblocking some thread, so there's a pass through the scheduler at the receive end. This works on "vanilla hardware" (most existing computers), which is why it dominates.
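For concreteness, the model being criticized is just this, the classic copy loop; it's a minimal sketch in C, and every iteration pays for at least two kernel crossings, which is exactly the per-operation overhead at issue:

/* Minimal sketch of the UNIX byte-stream model: a copy loop.
   Every iteration costs two system calls (read + write), each a
   full user/kernel transition. */
#include <unistd.h>

int copy_stream(int in_fd, int out_fd)
{
    char buf[4096];
    ssize_t n;

    while ((n = read(in_fd, buf, sizeof buf)) > 0) {       /* syscall #1 */
        ssize_t off = 0;
        while (off < n) {
            ssize_t w = write(out_fd, buf + off, n - off);  /* syscall #2 */
            if (w < 0)
                return -1;
            off += w;
        }
    }
    return n < 0 ? -1 : 0;
}

For a disk that was fine; for a cluster interconnect, those two crossings per transfer are the overhead that swamps the data movement.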
This is true, though you're underestimating "modern" OSes. Think of it as defensive planning: who knew ~20+ years ago that we would have solid-state disks? Who knew we would have 10Gb NICs? SATA?
But the fundamental design of I/O streams works and is easily adapted to new devices. Add to that the simplicity of /dev and the whole concept of input and output in UNIX. Think about it.
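To illustrate that point about /dev: the same read() that works on a regular file works unchanged on a device node. A minimal sketch (reading /dev/urandom here, but any character device would do):

/* The byte-stream abstraction hides what's behind the fd:
   the open/read calls below are identical for a file on an SSD
   or a kernel device like /dev/urandom. */
#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

int main(void)
{
    unsigned char buf[16];
    int fd = open("/dev/urandom", O_RDONLY);  /* a device, not a file */

    if (fd < 0 || read(fd, buf, sizeof buf) != (ssize_t)sizeof buf)
        return 1;
    for (size_t i = 0; i < sizeof buf; i++)
        printf("%02x", buf[i]);
    putchar('\n');
    close(fd);
    return 0;
}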
[...]
The supercomputer interconnect people have been struggling with this for years, but nothing general has emerged.
RDMA via Infiniband is about where that group has ended up. That's not something a typical large hosting cluster could use safely.
Add to that Fibre Channel. And NUMA is an old and tried technology.
Most inter-machine operations are of two types - a subroutine call to another machine, or a queue operation. Those give you the basic synchronous and asynchronous operations. A reasonable design goal is to design hardware which can perform those two operations with little or no operating system intervention once the connection has been set up, with MMU-level safety at both ends. When CPU designers have put in elaborate hardware of comparable complexity, though, nobody uses it. 386 and later machines have hardware for rings of protection, call gates, segmented memory, hardware context switching, and other stuff nobody uses because it doesn't map to vanilla C programming. That has discouraged innovation in this area. A few hardware innovations, like MMX, caught on, but still are used only in a few inner loops.
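To make the design goal concrete, here's a hedged sketch of what that two-primitive interface might look like in C. The names and types here are invented for illustration only, not any real API:

/* Hypothetical user-level interface for the two primitives described
   above -- names are made up for illustration. The design goal is that
   neither call traps into the kernel once the channel is set up; the
   hardware enforces MMU-level protection on both ends. */
#include <stddef.h>

typedef struct remote_chan remote_chan;  /* set up once, with OS help */

/* Synchronous primitive: a subroutine call to another machine.
   Blocks until the reply lands in reply_buf. */
int remote_call(remote_chan *chan,
                const void *args, size_t args_len,
                void *reply_buf, size_t reply_len);

/* Asynchronous primitive: a queue operation. Enqueues msg on the
   remote end and returns immediately. */
int remote_enqueue(remote_chan *chan, const void *msg, size_t msg_len);

Everything else (RPC, message passing, work distribution) layers on top of those two calls without a scheduler pass per operation.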
At the cost of my mod points or whatever, now I call bullshit.
Ring protection? Used at least in Linux.
Call gates? You mean SYSENTER? Used in Linux since ~2002, if I'm not wrong.
Segmented memory? Hello, 32 bits? Is that what you mean? Correct me if I'm wrong, but I thought that was a thing of the past.
Hardware context switching? You mean VMX (Intel) or SVM (AMD)? At least on Linux those instructions are used.
C is the limiting factor here? Please.
MMX? SSE/SSE2, etc.?
gcc -mmmx -msse -msse2 -msse3 -mssse3 -msse4 -msse4.1 -msse4.2
(talking about gcc because that's what I know, though I'm sure other compilers can use those instructions too)
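For reference, the kind of inner-loop code those flags enable looks like this; a minimal SSE sketch using the standard intrinsics:

/* A typical "inner loop" use of SSE: add two float arrays four lanes
   at a time. Compile with e.g. gcc -msse (or any of the higher -msse*
   flags listed above). */
#include <xmmintrin.h>

void add_floats(float *dst, const float *a, const float *b, int n)
{
    int i;
    for (i = 0; i + 4 <= n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);    /* load 4 floats */
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(dst + i, _mm_add_ps(va, vb));
    }
    for (; i < n; i++)                      /* scalar tail */
        dst[i] = a[i] + b[i];
}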
It's not that this can't be done. It's that unless it's supported by both Intel and Microsoft, it will only be a niche technology.
Yep, right.