AIX is full of "hacks" or "modifications" in the TCP/IP stack that greatly improve performance on multiprocessor (MP) POWER systems. Have any of these made it into mainstream Linux? Are they even valid on Intel architecture?
For instance, when running a benchmark on an AIX POWER system, try increasing the load and see if your results go up. It can happen that you increase the load and the CPU utilization climbs, but your benchmark result stays the same. In that case, the extra CPU time might be going into spinning on spin locks. AIX supports instrumented locks, so you can check this with the lockstat command.
Another potential problem is that too many global variables are located in the same CPU cache line, so a write to one of them by one processor invalidates that line in every other processor's cache (false sharing). You can pad individual variables so that they sit in separate cache lines. Or, even worse, you have one global variable that is constantly being updated by all processors and is constantly causing cache invalidations on the memory bus. Then you need to do a hardware memory bus trace, with an HP logic analyzer that looks like something out of Hentai Porn. Then you need to write up a patent or something:
https://patents.justia.com/patent/6430659
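To make the padding idea concrete, here is a rough C11 sketch. It is not taken from AIX or from the patent. The 128-byte line size is an assumption (POWER typically uses 128-byte lines, many x86 chips use 64), and the struct names, NSLOTS, count_hit() and total_hits() are all made up for illustration. The second half swaps in a different technique for the "one hot global" case: one padded slot per CPU, summed on read.

```c
#include <assert.h>   /* static_assert */
#include <stdalign.h> /* alignas */
#include <stddef.h>   /* offsetof, size_t */

/* Assumption: 128-byte cache lines; adjust for the target CPU. */
#define CACHE_LINE 128
#define NSLOTS 4      /* hypothetical number of CPUs/threads */

/* Unpadded: the two counters are only sizeof(long) bytes apart,
 * so they will normally land in the same cache line. */
struct packed_stats {
    long rx_packets;
    long tx_packets;
};

/* Padded: alignas forces each counter onto its own cache line. */
struct padded_counter {
    alignas(CACHE_LINE) long value;
};

struct padded_stats {
    struct padded_counter rx_packets;
    struct padded_counter tx_packets;
};

static_assert(sizeof(struct padded_counter) == CACHE_LINE,
              "each counter should occupy exactly one cache line");

/* One padded slot per CPU instead of a single shared hot counter. */
static struct padded_counter per_cpu_hits[NSLOTS];

/* Each CPU bumps only its own slot, so no line is shared by writers. */
static void count_hit(size_t cpu)
{
    per_cpu_hits[cpu % NSLOTS].value++;
}

/* Readers pay the cost of walking all slots to get a total. */
static long total_hits(void)
{
    long sum = 0;
    for (size_t i = 0; i < NSLOTS; i++)
        sum += per_cpu_hits[i].value;
    return sum;
}
```

The trade-off is memory for bus traffic: every counter now costs a full cache line, and the total is only approximate if it is read while other CPUs are still counting. A real kernel would also need to decide how a thread maps to a slot, which this sketch leaves out.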
So I'm just wondering if all this poop will be done for Linux on OpenPOWER...