We do need something to make multiple-CPU programming easier though. Threaded programming in C/C++ or similar can turn into a nightmare real quick; it's error-prone and complicated.
If you want to use C++, I suggest Threading Building Blocks (TBB), which is an open-source C++ library. It is a set of reasonable primitives, including a task scheduler and some simple parallel iterators that create tasks. The task scheduling makes it mostly independent of the specific number of cores in the system, which is key. I think it is part of most Linux distributions these days. For simple data-parallel computations, you can avoid thinking about threads and locks entirely, yet it also provides the low-level primitives to write sophisticated, highly optimized code.
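To make the "parallel iterator that creates tasks" idea a bit more concrete, here is a rough sketch using only the C++ standard library. To be clear, this is not TBB's API: `chunked_parallel_for` and `squares` are names I made up for illustration, and the real `tbb::parallel_for` sits on a work-stealing scheduler that is far smarter than this. The point is only that the caller describes the work as index ranges, and the number of worker threads is picked at run time from whatever the machine reports.

```cpp
#include <algorithm>
#include <atomic>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical helper (not part of TBB): split [0, n) into chunks of at most
// `grain` indices and call body(begin, end) once per chunk. Workers pull the
// next chunk from a shared counter, so the caller never names a thread count.
// Assumption: body does not throw (an exception in a worker would terminate).
template <typename Body>
void chunked_parallel_for(std::size_t n, std::size_t grain, Body body) {
    if (n == 0) return;
    if (grain == 0) grain = 1;
    const std::size_t chunks = (n + grain - 1) / grain;

    unsigned hw = std::thread::hardware_concurrency();
    if (hw == 0) hw = 1;  // the standard allows 0 when the count is unknown
    const std::size_t workers = std::min<std::size_t>(hw, chunks);

    std::atomic<std::size_t> next{0};
    auto worker = [&] {
        for (;;) {
            const std::size_t c = next.fetch_add(1);
            if (c >= chunks) return;  // no chunks left
            const std::size_t begin = c * grain;
            const std::size_t end = std::min(n, begin + grain);
            body(begin, end);
        }
    };

    std::vector<std::thread> pool;
    for (std::size_t i = 1; i < workers; ++i) pool.emplace_back(worker);
    worker();  // the calling thread takes chunks too
    for (auto& t : pool) t.join();
}

// Simple data-parallel use: each chunk writes a disjoint slice of `out`,
// so the caller needs no locks.
std::vector<double> squares(const std::vector<double>& in) {
    std::vector<double> out(in.size());
    chunked_parallel_for(in.size(), 1024, [&](std::size_t b, std::size_t e) {
        for (std::size_t i = b; i < e; ++i) out[i] = in[i] * in[i];
    });
    return out;
}
```

Calling `squares(v)` looks like ordinary serial code from the outside, which is roughly the experience TBB aims for in the simple cases. Build with thread support enabled (for example `-pthread` on GCC or Clang).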
P.S. I totally agree that Erlang is just *not* the right solution for multicore. Erlang's message passing is great for the application for which it was designed (telecommunications equipment with multiple independent line cards and such) or any such highly concurrent applications with high availability needs. It just isn't well suited for multicore programming (which just has an entirely different set of challenges such as data locality).
No, the above post really overstates what goes on inside today's x86 chips.
It is true that Intel and AMD internally break up x86 into simpler "micro-ops" to simplify the internals of the chip. However, the specific micro-ops used are tailored explicitly for x86 instructions, and many match up with x86 instructions one-to-one. The mapping really isn't that programmable, either. Most of the mapping is hard-coded and highly optimized. It would not be trivial to support another ISA such as PowerPC, even for just user-mode instructions. If you then consider all the privileged instructions, virtual memory, and virtualization stuff, you have a real mess. It would likely be easier to start from scratch rather than try to retrofit a current x86 to be anything other than an x86. Sure, you could reuse some of the arithmetic units and memory controllers perhaps, but the core would have to change pretty dramatically.
That said, Transmeta (RIP) did have technology that would likely make it easier to run non-x86 code on its processor, and the translation was done in software. But even its internal instructions were likely closely matched to the specifics of the x86 ISA.
May the bluebird of happiness twiddle your bits.