There are three main types of computing environment:
- Monolithic single process (MSP)
- Complex single process (CSP)
- Mixed processes
MSP: written in a low-level language (asm, C, C++). Typically a very finely tuned process that may use CPU threads for parallelism in a carefully managed way, probably implementing its own scheduling. Non-deterministic operations like OS/kernel interactions are very, very carefully supervised, and custom memory management is used all over the place. This process is the core focus of the system, and every effort is made to strip the OS down to preserve cache residency and determinism.
CSP: possibly written in a higher-level language or one that runs on a VM, or incorporating lots of disparate libraries. Little to no focus on determinism, often heavy interaction with the OS (file access, etc), and ad-hoc thread organization (typically task-specific threads). Cache/memory efficiency may be optimized for particular algorithms or routines, but multitasking overhead isn't a major factor.
Mixed: the system is expected to run a large number of processes/services, and no single process has any expectation of determinism; everything from assembly to Python-implemented-in-Bash.
The design of Windows means it's hard to build an MSP there, though it's feasible now with some of the server versions. MSPs are usually extreme cases like high-frequency trading, but also all kinds of realtime systems.
CSPs are your "performant" industrial server processes, from game servers to web servers. They take huge amounts of RAM for granted, but you'd probably get fired if you logged in on one and started using CPU.
Mixed: everything from your mundane intranet server, mail machine, or firewall, to a desktop computer. There's a ton of stuff running, and unless someone logs in and uses 100% CPU for a couple of hours, people probably wouldn't notice even high amounts of contention.
For all of these environments we follow one model: everything competes for time on the same CPUs: scheduler, kernel, OS services, and processes.
We've moved some tasks out to co-processors, which were later reabsorbed into the CPU: the MMU, the FPU...
Now we have complex chips with multiple cores, and... thread assignment is still done in software, in competition with the very code-threads it's supposed to be running?
In the MSP case, the OS is essentially a forced hit on your processor availability: you know that every so often it's going to pop in, steal some cycles, decide that... you should carry on doing what you were doing, and sorry for messing up your cache lines.
In the other cases, especially when there are a lot of processes, you get a gradual degradation as the system takes longer and longer to decide what is fair, while it is, itself, obstructing work from being done.
We need the ability to have a Kernel-Core or a Scheduler-Core with custom instructions: instructions that can tell the memory controller to go zero a page for us rather than having the CPU write zeros to memory, and that can read special state information about the other CPU cores to make smart decisions about what to run. Instructions available only on those cores.
Putting the kernel on its own core also gives us a security barrier, and again allows us to dedicate instructions to it.
We're overdue for this architecture. We're already trying to approximate it with containers and hypervisors, but CPU vendors are just like "*shrug* we'll sell you more of the same"...