Let's start with the basics. Message passing is not master-slave, because it can be instigated in any direction. If you look at PCI Express 2.1, you see a very clear design: nodes at the top are masters, nodes at the bottom are slaves, masters cannot talk to masters, slaves cannot talk to slaves, and only devices with bus-master support can be masters. Very simple, totally useless.
OK, what specifically do I mean by message passing? I mean, very specifically, a non-blocking, asynchronous, routable protocol in which each message carries an operation and a data block as its operand (think: microkernels, MPI-3). If you're clever, the operand is self-describing (think: CDF), because that lets you have overloaded functions.
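To make that definition concrete, here is a minimal sketch in Python. Everything in it (the Message and Node names, the "sum" operation, the "int32" and "utf8" type tags) is made up for illustration and is not any real protocol or library. It only shows the three properties I listed: sends return immediately, the destination is part of the message, and the handler is picked from the operation plus the operand's own type tag, which is where the overloading comes from.

```python
from collections import deque
from dataclasses import dataclass


@dataclass(frozen=True)
class Message:
    dst: str     # routable: the destination node is named in the header
    src: str     # any node may be the sender, there is no fixed master
    op: str      # the operation being requested
    dtype: str   # self-describing tag for the operand (hypothetical tags)
    data: tuple  # the operand block itself


class Node:
    def __init__(self, name):
        self.name = name
        self.inbox = deque()
        self.handlers = {}  # (op, dtype) -> function, i.e. overloading

    def register(self, op, dtype, fn):
        self.handlers[(op, dtype)] = fn

    def send(self, network, msg):
        # Non-blocking: drop the message in the receiver's queue and return.
        network[msg.dst].inbox.append(msg)

    def poll(self):
        # Asynchronous: the receiver drains its queue whenever it likes.
        results = []
        while self.inbox:
            msg = self.inbox.popleft()
            handler = self.handlers[(msg.op, msg.dtype)]
            results.append(handler(msg.data))
        return results


network = {name: Node(name) for name in ("gpu", "disk")}
network["gpu"].register("sum", "int32", lambda d: sum(d))
network["gpu"].register("sum", "utf8", lambda d: "".join(d))

# The disk talks straight to the GPU; same op, different operand types.
network["disk"].send(network, Message("gpu", "disk", "sum", "int32", (1, 2, 3)))
network["disk"].send(network, Message("gpu", "disk", "sum", "utf8", ("a", "b")))
results = network["gpu"].poll()
```

Note that the same "sum" request ends up in two different handlers purely because of the type tag carried with the data. The sender never needs to know which one will run.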
The CPU is a bit naff, really. At least some operations can be pushed into a Processor In Memory. Meanwhile, you have a fancy maths coprocessor that you're repeatedly (and expensively) calling to build up functions that already exist, as a limited subset, in FFTW, BLAS and LAPACK. Put all three, in optimized form, along with your basic maths operations, into a larger piece of silicon. Voila, massive speed boost.
But now let's totally eliminate the barrier between graphics, sound and all other processors. Instead of limited communication channels and local memory, have distributed shared memory (DSM) and totally free communication between everything. Thus, memory can open a connection to the GPU, the GPU can talk to the disk, and Ethernet cards can write directly to buffers rather than going via software (RDMA and OpenSockets concepts, just generalized).
You now have a totally open network, closer to Ethernet than PCI or HyperTransport in architecture, but closer to C++ or Java in protocol, since the data type determines the operation.
What room, in such a design, for a CPU? Everything can be outsourced.
Now, move on to Wafer Scale Integration. We can certainly build single wafers that can take this entire design. Memory and compute elements, instead of being segregated, are mixed. Add some pipelining and you have an arrangement that could blow most computer designs out of the water.
Extrapolate this further. Instead of large chunks of silicon talking to each other, since the protocol is entirely routable, get as close to individual compute elements as you can. Have the router elements take care of heat and congestion issues, rather than compilers. Since packet headers can contain whatever label information you want, you have a notion of processes with independent storage.
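As a rough illustration of routers (rather than compilers) dealing with hot spots, and of header labels giving you per-process storage, here is a toy sketch in Python. The grid layout, the "load" counter standing in for heat and congestion, the greedy next-hop rule and the packet fields are all assumptions of mine for the sake of the example, not a description of any real fabric.

```python
from collections import defaultdict


class Element:
    """One compute element with a load counter and label-keyed storage."""

    def __init__(self):
        self.load = 0                     # stand-in for heat/congestion
        self.storage = defaultdict(dict)  # label -> private key/value store


def next_hop(pos, dst, grid):
    # Consider only neighbours that move closer to the destination,
    # then pick the one with the lowest current load.
    x, y = pos
    dx, dy = dst
    candidates = []
    if x != dx:
        candidates.append((x + (1 if dx > x else -1), y))
    if y != dy:
        candidates.append((x, y + (1 if dy > y else -1)))
    return min(candidates, key=lambda p: grid[p].load)


def route(packet, src, grid):
    pos = src
    path = [pos]
    while pos != packet["dst"]:
        pos = next_hop(pos, packet["dst"], grid)
        grid[pos].load += 1  # passing traffic warms the element up
        path.append(pos)
    # The label in the header selects which private store gets written.
    key, value = packet["payload"]
    grid[pos].storage[packet["label"]][key] = value
    return path


grid = {(x, y): Element() for x in range(3) for y in range(3)}
grid[(1, 0)].load = 5  # pretend this element is already running hot

path_a = route({"dst": (2, 2), "label": "proc-A", "payload": ("x", 1)}, (0, 0), grid)
path_b = route({"dst": (2, 2), "label": "proc-B", "payload": ("x", 2)}, (0, 0), grid)
```

Both packets write the key "x" at the same element, but under different labels, so neither sees the other's value. The first packet also steers around the hot element at (1, 0) without the sender having planned for it.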
It doesn't (or shouldn't) take long to figure out that a true network, rather than a bus, architecture will let you move chunks of the operating system (which is just a virtual machine, anyway) into the physical computer, eliminating the need for running an expensive bit of simulation.
And this is marketspeak? Marketspeak for what? Name me a market that wants to eliminate complexity and abandon planned obsolescence in favour of a schizophrenic cross between a parallel Turing machine, a vector computer and a Beowulf cluster.