This is my vote. It was the first GUI I saw in person, on my C64.
There are a few big mistakes about Bulldozer here.
The FP unit is completely shared between the two integer clusters. It is 4-wide, and the two clusters compete for all of its resources.
Each Bulldozer integer cluster is 4-wide. The shared instruction fetch is also 4-wide.
Sandy Bridge can keep 168 instructions in flight; Bulldozer can keep 128 per cluster. Sandy Bridge has a combined FP/INT scheduler with 54 entries. Bulldozer has separate schedulers: 40 INT entries per cluster and 60 FP entries.
You are correct about BD's Achilles heel. Its L2 and L3 latencies are longer than SB's. I think the solution is to reduce the latencies, not to increase the in-flight window size.
This one really inspired me in high school. Is Fractint still around?
I didn't read GEB until grad school, but I think I could have appreciated it in high school.
Those two books can really change how you look at the world.