That's less of an issue if your throughput comes from thread-level parallelism. There are some experimental architectures floating around that get very good i-cache usage and solid performance from a stack-based ISA and a massive number of hardware threads.
The other day, I had the mildly insane idea that perhaps our ability to explore the architectural space is limited by the fact that every existing architecture has been painstakingly handcrafted. If it were somehow possible to parametrically generate an architecture, and then synthesize both a compiler code-generator backend and a suitable hardware implementation for it, we might discover some hidden gems in the largely unexplored universe of machine architectures. But it sounds like a pipe dream to me...
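The "parametrically generate and search" idea can at least be caricatured in a few lines. This is a hypothetical sketch, not anyone's actual tooling: the parameter names, the `plausible` pruning rules, and the `toy_cost` model are all invented stand-ins for "synthesize a backend and hardware, then measure".

```python
# Hypothetical sketch: describe a machine as a tuple of parameters,
# enumerate the space, prune inconsistent points, and rank the rest
# with a stand-in cost model (all names and numbers are invented).
from itertools import product
from collections import namedtuple

Arch = namedtuple("Arch", "model operands regs hw_threads")

MODELS   = ["stack", "accumulator", "register"]
OPERANDS = [0, 1, 2, 3]     # explicit operands per instruction
REGS     = [0, 8, 16, 32]   # general-purpose registers
THREADS  = [1, 8, 64]       # hardware threads per core

def plausible(a):
    # Prune obviously inconsistent points, e.g. a stack machine with
    # general-purpose registers, or a register machine without any.
    if a.model == "stack" and (a.regs > 0 or a.operands > 1):
        return False
    if a.model == "register" and a.regs == 0:
        return False
    return True

def toy_cost(a):
    # Stand-in for real evaluation: fewer explicit operands means
    # denser code (better i-cache), more threads hide memory latency.
    code_density_penalty = 1 + a.operands
    latency_hiding = 1 / a.hw_threads
    return code_density_penalty * latency_hiding

space = [Arch(*p) for p in product(MODELS, OPERANDS, REGS, THREADS)]
candidates = sorted(filter(plausible, space), key=toy_cost)
best = candidates[0]
```

Under this (deliberately crude) cost model, zero-operand machines with many hardware threads come out on top, which is exactly the stack-machine-plus-massive-threading corner mentioned above. The hard part the sketch hand-waves away is, of course, `toy_cost`: replacing it with real backend synthesis and hardware evaluation is the pipe-dream bit.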
Any force sufficient to tear two quarks apart is sufficient to generate new quarks which then bind with the "free" quarks.
Sounds like the War on Terrorism in a microscopic edition. Fractal universe confirmed!
In either case, it sounds like a hell of a challenge, as (if I understand correctly) you'd presumably need to pre-evaluate the logic flow and track how resources are accessed in order to embed the proper data cache hints. However, those sorts of access patterns can change depending on the state of the program data, even within the same code. Or, I suppose, you could "tune" the program for optimal execution through multiple evaluation passes, embedding data access hints in a second pass after seeing the results of a "profiling" run.
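The two-pass "profile, then embed hints" scheme can be sketched abstractly. This is a toy model, not a real profiler: the linked structure, the trace format, and the `("prefetch", key)` hint tuples are all invented for illustration.

```python
# Toy model of profile-guided data access hints: pass 1 runs a
# pointer-chasing traversal once and records the access order;
# pass 2 uses that trace to emit a "prefetch" hint one step ahead
# of each access (here just collected as tuples for inspection).

def profile_run(links, start):
    """Pass 1: execute the traversal, recording which keys it touches."""
    order, node = [], start
    while node is not None:
        order.append(node)
        node = links[node]
    return order

def emit_hints(trace):
    """Pass 2: for each access, hint the block the profile says is next."""
    return [("prefetch", trace[i + 1]) for i in range(len(trace) - 1)]

# A tiny linked structure: each key points at the next, None ends it.
links = {"a": "b", "b": "c", "c": None}
trace = profile_run(links, "a")   # access order: a, b, c
hints = emit_hints(trace)         # prefetch b while touching a, etc.
```

The catch raised above is visible even here: the hints are only as good as the profiling input. If the program data changes so the traversal takes a different path, the embedded hints describe the wrong access pattern.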
Sounds like LuaJIT on steroids.
and stack machines are notorious for having HORRIBLE support for languages like C
Which is what makes them so awesome. It's like a door that filters out undesirable drunken retards before they even enter your house.
From what I can tell, his design looks like it might be comparable to GPUs in FLOPS per watt, but with different memory abstractions that impose similar limitations. I suspect that if you write custom code for it, and have the right kind of problem, it will do significantly better than the available options; but in the general case, and/or with non-specialized code, it won't do much better than a GPU, though it may be competitive.
In other words, very much like Chuck Moore's Forth cores.
Hackers of the world, unite!