Next Generation Stack Computing
mymanfryday writes "It seems that stack computers might be the next big thing. Expert Eric Laforest talks about stack computers and why they are better than register-based computers. Apparently NASA uses stack computers in some of their probes. He also claims that a kernel would only be a few kilobytes in size! I wonder if Windows will be supported on a stack computer in the future?"
.NET Compatibility (Score:5, Interesting)
Of course... that would still depend on a version of Windows for it to run on.
X86 FPUs finally losing their stackness (Score:5, Interesting)
Re:X86 FPUs finally losing their stackness (Score:2, Interesting)
As someone who has written several Forth compilers for the x86, I'd like to point out that the design of the stacks on the x86 is very inefficient. The main stack is just that: a stack tied to one particular register. The FP stack was just a joke; a dead weasel could have designed something better. Anyway, I do like using Forth even under the x86 model - it's nice to know that my entire programming system fits into the on-die cache!
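For flavor, here's what definitions look like in standard Forth (a minimal sketch; the word names are my own, not from any particular compiler):
\ Each definition compiles to a handful of stack operations;
\ there are no register-allocation decisions to make.
: square  ( n -- n*n )  dup * ;
: sum-of-squares  ( a b -- a*a+b*b )  square swap square + ;
3 4 sum-of-squares .  \ prints 25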
No, they're not better (Score:4, Interesting)
But why do you need out-of-order execution? Well, misses to memory are very expensive these days - it can easily take 200 to 400 cycles to service a load that misses all the way to main memory, and this has a significant effect on performance. What out-of-order execution does is allow independent instructions that are younger than the load to execute in parallel with it. Quite often these concurrently executed instructions generate other misses to main memory, overlapping their latencies. So the latency of loads that miss is still very high, but at the very least the processor is not idle while servicing them. (For a good read, see "MLP Yes! ILP no!" by Andy Glew.)
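To make the difference concrete, here's a sketch in Forth (the word names are mine, purely illustrative): a linked-list walk serializes its misses, because each address depends on the previous load, while an array sweep exposes independent loads that an out-of-order core can keep in flight simultaneously:
\ Dependent loads: each node's first cell points at the next node
\ (0 terminates), so the misses can only be serviced one at a time.
: walk  ( node -- )  begin @ dup 0= until drop ;
\ Independent loads: every address is computable up front, so an
\ out-of-order core can overlap several outstanding misses.
: sweep  ( addr n -- )  0 ?do  dup i cells + @ drop  loop  drop ;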
Itanium and SPARC compensate for the fact that they don't execute out of order by putting sh*tloads of L2/L3 cache on-chip. The cost of a miss is still very high, but it happens much less often. The manufacturing cost of the chip is also much higher.
Note that what NASA is sending into space is "old" tech. The reason: cosmic radiation is much more intense in outer space, and the smaller the gate, the easier it is for a particle strike to flip its state.
P.S. I'm a computer architect.
Re: transputer wikipedia link (Score:2, Interesting)
Re:NASA (Score:3, Interesting)
Re:Who likes them? (Score:2, Interesting)
Probably mostly just an accident of history: register machines went superscalar first and "won" (mostly, perhaps, because stack machines were more efficient, so the need for superscalarity didn't bite as early...). But, in short: stack machines, with similar design overheads to register machines, can extract at least as much concurrency as register machines, maybe more.
MMIX uses a register stack (Score:3, Interesting)
Re:Stop Hurting My Eyes (Score:1, Interesting)
Re:Appropriate instruction set (Score:3, Interesting)
The 6809 was not only easy and fun to program; 6809 programs also tended to benchmark significantly faster than programs for comparable CPUs like the Z80, 6800, and 8080. If the industry ever decides to scrap the x86 mess -- which they won't -- going back to the 6809 as a starting point might not be a bad idea at all. I once plotted measured times for a benchmark where timings were available for a bunch of CPUs (the Sieve of Eratosthenes). When you plotted clock speed against word width, all the CPUs from the 8080 to the Cray something-or-other fell out into an untidy straight line, except for the 6809. There were, as I recall, three different published results for the sieve on the 6809, and all three were an order of magnitude faster than they had any reasonable expectation of being, based on the hardware's apparent capabilities.
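For reference, the benchmark in question is the classic BYTE-style sieve. A minimal Forth rendering (the size and word names are illustrative, not the published code):
8190 constant #flags
create flags #flags allot
: sieve  ( -- count )
  flags #flags 1 fill  0          \ mark everything prime, zero the count
  #flags 0 do
    flags i + c@ if               \ flag i still set?
      i 2* 3 +                    \ the prime this flag represents
      dup i +                     \ first multiple to strike out
      begin dup #flags < while
        0 over flags + c!         \ clear the flag for this multiple
        over +                    \ step by the prime
      repeat 2drop
      1+                          \ one more prime found
    then
  loop ;
sieve .  \ prints 1899, the classic result for 8190 flags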
Re:Stack - bad for speed, good for low power (Score:3, Interesting)
With a stack machine, running one instruction stream in parallel is very hard, while it's very easy on a register-based one. But the flip side is that running multiple instruction streams in parallel is incredibly easy on a stack machine, while *very* difficult on a register-based CPU.
For instance, take "add 1 to each element of this 30-element array" and the optimization of unrolling the loop by three:
The stack version can use parallel streams:
push array # "stack[2]"
push 30 # "stack[1]"
push 1 # "stack[0]" of stream #1
push 2 # "stack[0]" of stream #2
push 3 # "stack[0]" of stream #3
push 3 # number of parallel streams to run
fork
loop:
add 1 to mem at (stack[2] + stack[0])
stack[0] += 3
if stack[0] < stack[1] goto loop
join
One possible expansion of the loop body into stack instructions is sketched below, but basically the fork pops the number of parallel stacks to run, and then the join waits for each parallel stack to complete. Of course, in a real implementation you would also push the number of stack elements to copy, etc. Since instruction decoders for stack machines are so simple, your CPU could have literally hundreds of them on a die, each still doing useful work.
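Here's that hypothetical expansion (the mnemonics are invented for illustration: "pick n" copies the stack element at depth n to the top, "fetch" loads through the address on top, and "store" pops a value and then the address beneath it):
loop:            # stack, top first: index, 30, array
pick 2           # copy the array base to the top
over             # copy this stream's index
add              # element address = array + index
dup              # keep the address around for the store
fetch            # load the element
push 1
add              # element + 1
store            # write it back; stack is index, 30, array again
push 3
add              # stack[0] += 3
dup
pick 2           # copy the limit (30)
less             # index < limit ?
branch-if loop   # repeat while true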
The register-based machine will unroll the loop:
set r1 to 30
set r2 to 0
set r3 to array
loop:
set r4 to r3[r2]
set r5 to r3[r2+1]
set r6 to r3[r2+2]
add 1, r4 store in r4
add 1, r5 store in r5
add 1, r6 store in r6
store r4 to r3[r2]
store r5 to r3[r2+1]
store r6 to r3[r2+2]
add 3 to r2
compare r2 to r1, jump to loop if less
Now try to run that in parallel and you get a couple of memory fetches/writes overlapped, but mostly it is pretty slow. Just one hiccup in the pipeline and all of the parallelism stops. Not to mention the code needed to catch the remainder of the loop when the length is not an even multiple of three.
Joy! (Score:3, Interesting)
I've looked into it a couple of times, and it seems pretty neat. In a word: functional concatenation.
Plus, as we all know, functional languages are so much more fun than procedural.
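For anyone who hasn't tried it: the core idea is that composing two programs is literally concatenating them. Joy's syntax differs, but the flavor carries over to Forth (an illustrative sketch, not actual Joy):
: double  ( n -- 2n )  2 * ;
: increment  ( n -- n+1 )  1+ ;
\ Composition is concatenation: just write one word after the other.
: double-then-increment  ( n -- 2n+1 )  double increment ;
5 double-then-increment .  \ prints 11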
A Near Miss for Stack Computing Circa 1981 (Score:5, Interesting)
An excerpt from a somewhat longer essay I wrote [geocities.com]:
Re:Stop Hurting My Eyes (Score:2, Interesting)
As far as I know, there are no implemented second-gen stack computers that support that feature.
(There have been a few theoretical ones.)
Which ones are you talking about?
The X86 is an example of everything! (Score:3, Interesting)
The course went quickly downhill after this observation. No one could figure out how incorporating every processor architecture into one product was a good thing.
If Stack-Based Computing Is So Great... (Score:3, Interesting)
Parallelism (Score:3, Interesting)
If a stack machine is that much simpler, couldn't you have either:
- A vast number of cores for many unrelated threads
- Or: multiple pipelines and explicit division of instructions among the pipelines?
The second refers to an instruction encoding similar to VLIW, such that you parallelise the code across multiple stacks while still sharing an instruction/data cache, allowing parallelism without heavy multi-threading at the high level (and instead treating parallelism as a compiler optimization at the low level).
Re:Twelfth of Never (Score:2, Interesting)
Re:Not a good idea (Score:3, Interesting)
This has been discussed ad nauseam in the computer architecture community, and I repeat: it's not a good idea!