I design realtime simulation kernels, and I use a combination. One simple design from a few years back: separate processes (talking to each other via signals) perform drastically different tasks: sequence control, unique device or network I/O, etc. Separate processes were used because each task's execution profile is subtly different: network I/O is all about filling, draining, or assigning buffers; device tasks are about waiting on physical devices that will get back to you just whenever the hell they want; sequence control is right now, don't wait, gotta stay on top of it or it all comes crashing down.

Within a process, such as sequencing, a single, unique method of communication is used. In sequencing, I used semaphores, condition variables, etc., the POSIX mantra we all know and love. They're usually the fastest option there, because I used separate threads for hosting the zoo of myriad programs and functions in the simulation (the "payload" from the view of the sim kernel). Between the various critters in the zoo, communication went through shared memory and whatever locking mechanisms their programmers preferred.
Each of these mechanisms is different, and the differences are just as important as the similarities.
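To make the in-process part concrete, here is a minimal sketch of a sequencer thread handing steps to one payload thread with a mutex and two condition variables. It is not the code from that kernel; every name in it (seq_channel, payload_thread, run_sequencer_demo) is made up for illustration, and it assumes a single payload thread and a strict one-step-at-a-time handshake.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical channel between the sequencer and one payload thread. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  step_ready;  /* sequencer -> payload: a step was posted */
    pthread_cond_t  step_done;   /* payload -> sequencer: the step was consumed */
    bool pending;                /* true while a posted step awaits pickup */
    bool shutdown;               /* set once by the sequencer to end the run */
    int  steps_run;              /* number of steps the payload has executed */
} seq_channel;

/* Payload side: sleep until a step is posted, run it, report back. */
static void *payload_thread(void *arg)
{
    seq_channel *ch = arg;

    pthread_mutex_lock(&ch->lock);
    for (;;) {
        /* Loop on the predicate: condition waits may wake spuriously. */
        while (!ch->pending && !ch->shutdown)
            pthread_cond_wait(&ch->step_ready, &ch->lock);
        if (!ch->pending)
            break;               /* shutdown requested, nothing left to do */
        ch->pending = false;
        ch->steps_run++;         /* stand-in for the real payload work */
        pthread_cond_signal(&ch->step_done);
    }
    pthread_mutex_unlock(&ch->lock);
    return NULL;
}

/* Sequencer side: post n_steps one at a time, then shut the payload down.
 * Returns the number of steps the payload ran, or -1 on setup failure. */
int run_sequencer_demo(int n_steps)
{
    seq_channel ch = { .pending = false, .shutdown = false, .steps_run = 0 };
    pthread_t tid;

    pthread_mutex_init(&ch.lock, NULL);
    pthread_cond_init(&ch.step_ready, NULL);
    pthread_cond_init(&ch.step_done, NULL);

    if (pthread_create(&tid, NULL, payload_thread, &ch) != 0)
        return -1;

    for (int i = 0; i < n_steps; i++) {
        pthread_mutex_lock(&ch.lock);
        ch.pending = true;
        pthread_cond_signal(&ch.step_ready);
        while (ch.pending)       /* block until the payload took the step */
            pthread_cond_wait(&ch.step_done, &ch.lock);
        pthread_mutex_unlock(&ch.lock);
    }

    pthread_mutex_lock(&ch.lock);
    ch.shutdown = true;
    pthread_cond_signal(&ch.step_ready);
    pthread_mutex_unlock(&ch.lock);

    pthread_join(tid, NULL);
    assert(!ch.pending);         /* every posted step was consumed */

    int result = ch.steps_run;
    pthread_cond_destroy(&ch.step_done);
    pthread_cond_destroy(&ch.step_ready);
    pthread_mutex_destroy(&ch.lock);
    return result;
}
```

Calling run_sequencer_demo(5) from a main() and linking with -pthread should return 5, since the sequencer waits for each step to be consumed before posting the next. A real sequencer would probably not block on step_done like this; the lock-step handshake is only there to keep the sketch deterministic.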