They suck. That's why Linux didn't use them (int 0x80).
Wrong. Linux didn't use them because no other Unix OS out there used them, and those didn't use them because... you know, they weren't designed specifically for x86. And - contrary to *every* single f***ing int 0x80 implementation out there - x86 Linux uses (or used to, before SYSENTER) a by-register instead of a by-stack parameter-passing convention.
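To make the by-register point concrete, here's roughly what an i386 Linux syscall looks like through int 0x80 (a minimal sketch in GCC-style inline asm, assuming a 32-bit build; the wrapper name is made up):

    /* write(fd, buf, len), old i386 Linux style: syscall number and all
       arguments travel in registers, nothing is read from the user stack */
    static long my_linux_write(int fd, const void *buf, unsigned long len)
    {
        long ret;
        __asm__ volatile ("int $0x80"
                          : "=a" (ret)                      /* result comes back in EAX */
                          : "a" (4),                        /* __NR_write is 4 on i386  */
                            "b" (fd), "c" (buf), "d" (len)  /* args in EBX, ECX, EDX    */
                          : "memory");
        return ret;
    }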
That's why Windows NT didn't use them (int 0x2e).
Windows NT didn't use them because it was designed as a highly portable micro-kernel. Initially it targeted other architectures besides x86, such as MIPS. There were other reasons too, and AFAIK most of the *actual* protection mechanism in the kernel was developed by an outside consultant (I read it in a book about Windows 95 some years back; can't vouch for accuracy).
That's why *BSD didn't use them (int 0x80).
BSD kernels weren't designed for x86; they were ported to x86, and the ports were done using the most generic approach. And every x86 BSD kernel uses a by-stack convention, not by-register.
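For contrast, a rough sketch of the same call on a FreeBSD-style i386 kernel (again GCC inline asm, 32-bit build assumed, names made up): the arguments get pushed like an ordinary C call and only the syscall number rides in a register; the real libc stubs get the dummy slot for free from the call into the stub.

    /* write(fd, buf, len), i386 BSD style: arguments on the stack in C order,
       plus one extra slot where a libc stub's return address would normally sit */
    static long my_bsd_write(int fd, const void *buf, unsigned long len)
    {
        long ret;
        __asm__ volatile ("pushl %[len]\n\t"
                          "pushl %[buf]\n\t"
                          "pushl %[fd]\n\t"
                          "pushl $0\n\t"          /* fake return-address slot */
                          "int   $0x80\n\t"
                          "addl  $16, %%esp"      /* caller cleans up, cdecl style */
                          : "=a" (ret)
                          : "a" (4),              /* SYS_write is 4 */
                            [fd] "ri" ((long) fd), [buf] "ri" (buf), [len] "ri" (len)
                          : "memory", "cc");
        return ret;
    }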
That's why OSX doesn't use them (int 0x80).
OSX wasn't designed as an x86-only operating system either, and it inherits from BSD via XNU. XNU is based on Mach, which in turn was built on 4.2BSD, so again: a port, not a native x86 development.
Cache locality is horrible, the far pointer requires more bytes/instruction, and segment registers suck - especially when running in protected mode.
You're kidding me, right? A single call that triggers a processor mechanism which sets up a destination stack frame, SECURELY copies X bytes of stack between privilege levels, and invokes a higher - or LOWER - level function? Most 32-bit code is 4-byte aligned anyway and binaries are built page-aligned, so the actual savings are next to none. And call gates are *actually* faster than interrupts. And given the way protected memory works, I'd expect to see way more cache penalties with the interrupt approach than with a call gate. And all of that is without considering the huge amount of code executed before the actual syscall gets dispatched, at least on the kernels I mentioned.
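For reference, this is more or less the shape of the thing being argued about - the descriptor layout is straight from the Intel manuals, while the selector value and the function name below are made up, and you obviously need ring-0 cooperation to install the gate in the GDT in the first place:

    #include <stdint.h>

    /* a 32-bit call-gate descriptor as it sits in the GDT */
    struct call_gate {
        uint16_t offset_low;      /* entry point, bits 0..15                            */
        uint16_t selector;        /* target code segment, e.g. the kernel's CS          */
        uint8_t  param_count : 5; /* dwords the CPU copies caller stack -> kernel stack */
        uint8_t  reserved    : 3;
        uint8_t  type        : 4; /* 0xC = 32-bit call gate                             */
        uint8_t  s           : 1; /* 0 = system descriptor                              */
        uint8_t  dpl         : 2; /* 3 = callable from user mode                        */
        uint8_t  present     : 1;
        uint16_t offset_high;     /* entry point, bits 16..31                           */
    } __attribute__((packed));

    /* invoking it from ring 3: a far call through the gate's selector; the offset
       half of the far pointer is ignored, the CPU takes the entry point, target CS,
       new stack and parameter copy count from the gate descriptor itself */
    static void call_through_gate(uint16_t gate_selector)
    {
        struct { uint32_t offset; uint16_t sel; } __attribute__((packed))
            fptr = { 0, gate_selector };
        __asm__ volatile ("lcall *%0" : : "m" (fptr) : "memory");
    }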
The "DOS-style" syscalls you're referring to are a software interrupt trap, (also called a trap-gate).
No, you're confusing software interrupts (such as int 21h, int 80h, etc.) with the parameter-passing convention. DOS-style means passing parameters to the interrupt in registers, e.g. ah=25h, al=00, int 21h. Every other x86 unix implementation passes them on the stack, not in registers. There's the "but it's slower" argument, which falls apart once you actually look at the Linux 2.0 kernel's implementation specifics.
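For anyone who never had the pleasure, the classic DOS idiom looks like this (a sketch assuming a 16-bit DOS compiler with dos.h, e.g. Turbo C or Open Watcom; the function name is made up):

    #include <dos.h>

    /* DOS-style: the function number and every parameter travel in registers */
    void dos_putchar(char c)
    {
        union REGS r;
        r.h.ah = 0x02;        /* int 21h function 02h: write character to stdout */
        r.h.dl = c;           /* the character goes in DL, nothing on the stack  */
        int86(0x21, &r, &r);
    }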
Every OS worth mentioning used them prior to SYSENTER being introduced.
True. Most unixes did it because of ABI compatibility (the use of int 0x80 predates any semi-decent protection mechanism from Intel, probably by a decade). Also, most OS developers aren't really interested in building a better mousetrap; if you look at it, most OSs use more or less exactly the same design, because most of the guys building them learned from the same book and the same source references (nothing wrong with that, and it is truly the work of masters), and the guys that didn't usually don't care about x86 at all. Some of the differences are little details (such as the call gate stuff), others are a bit more serious (as in not using the CPU's segmentation mechanism in userland applications to provide completely separate read-execute and read-write selectors), but in the end it's like having a Ferrari to drive to church :)