The problem here is that small, dynamically allocated objects with heavy pointer threading flush even large caches quickly, yielding very low locality and very poor use of each cache line that a miss pulls in.
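To put rough numbers on that, here is a minimal sketch of a toy direct-mapped cache fed two traces: a contiguous array walk, and a walk over small nodes scattered across a large heap. Everything in it is an assumption made for illustration (64-byte lines, a 32 KiB cache, 8-byte words, a 64 MiB heap, and random placement standing in for what an allocator does to pointer-threaded objects); it is not a model of any particular CPU.

```python
import random

LINE_BYTES = 64      # assumed cache line size
NUM_LINES = 512      # assumed capacity: 512 lines * 64 B = 32 KiB
WORD_BYTES = 8       # one pointer-sized word per reference

def simulate(addresses):
    """Run a byte-address trace through a toy direct-mapped cache.

    Returns (miss_rate, line_utilization), where line_utilization is the
    fraction of all bytes fetched from memory that the trace actually read
    before the line was evicted (or the trace ended).
    """
    tags = [None] * NUM_LINES                      # line number held by each slot
    touched = [set() for _ in range(NUM_LINES)]    # word offsets read in the resident line
    misses = 0
    used_bytes = 0
    for addr in addresses:
        line = addr // LINE_BYTES
        slot = line % NUM_LINES
        if tags[slot] != line:                     # miss: evict the old line, refill
            misses += 1
            used_bytes += len(touched[slot]) * WORD_BYTES
            tags[slot] = line
            touched[slot] = set()
        touched[slot].add((addr % LINE_BYTES) // WORD_BYTES)
    # Account for lines still resident when the trace ends.
    used_bytes += sum(len(t) for t in touched) * WORD_BYTES
    fetched_bytes = misses * LINE_BYTES
    return misses / len(addresses), used_bytes / fetched_bytes

N = 100_000

# Contiguous array: consecutive words, so every fetched line is fully used.
array_trace = [i * WORD_BYTES for i in range(N)]

# Pointer-threaded small objects: each hop reads one word of a node that
# sits on an effectively random line of a 64 MiB heap (an assumption).
rng = random.Random(0)
heap_lines = (64 * 1024 * 1024) // LINE_BYTES
list_trace = [rng.randrange(heap_lines) * LINE_BYTES for _ in range(N)]

array_miss, array_util = simulate(array_trace)
list_miss, list_util = simulate(list_trace)

print(f"array: miss rate {array_miss:.3f}, line bytes used {array_util:.3f}")
print(f"list:  miss rate {list_miss:.3f}, line bytes used {list_util:.3f}")
```

Under these assumptions the array walk misses once per eight words and uses every byte it fetches, while the scattered walk misses on nearly every reference and uses one word out of each eight-word line.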
CPU performance scales, and even L1/L2 performance scales reasonably well, but raw memory performance is two orders of magnitude slower than processors and quickly becoming three orders of magnitude slower. It's nearly impossible to constrain the page-level working sets of OO programs tightly enough for performance to scale.
When more than 20% of your memory references miss the cache and fault out to raw memory, performance just crawls, and a faster machine with lots more raw memory doesn't help.
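The arithmetic behind that is simple enough to sketch. The figures below are assumptions picked to match the rough ratios in this post (a 1-cycle cache hit, 100 ns to raw memory, a 20% miss fraction), and the model is deliberately crude: a miss is charged the flat memory latency and nothing overlaps.

```python
def avg_access_ns(cpu_ghz, miss_fraction, cache_hit_cycles=1, memory_ns=100.0):
    """Average time per memory reference, in nanoseconds.

    Cache hits cost a fixed number of CPU cycles, so they get faster with
    the clock. Misses cost a fixed wall-clock latency to raw memory, so
    they do not.
    """
    cycle_ns = 1.0 / cpu_ghz
    hit_ns = cache_hit_cycles * cycle_ns
    return (1.0 - miss_fraction) * hit_ns + miss_fraction * memory_ns

# Same program, same 20% miss fraction, clock raised from 1 GHz to 4 GHz.
slow = avg_access_ns(cpu_ghz=1.0, miss_fraction=0.20)   # 0.8 * 1.00 + 0.2 * 100
fast = avg_access_ns(cpu_ghz=4.0, miss_fraction=0.20)   # 0.8 * 0.25 + 0.2 * 100
speedup_bad_locality = slow / fast

# For contrast: the same clock bump when only 1% of references miss.
speedup_good_locality = (avg_access_ns(1.0, 0.01) /
                         avg_access_ns(4.0, 0.01))

print(f"20% misses: {slow:.1f} ns -> {fast:.1f} ns, speedup {speedup_bad_locality:.2f}x")
print(f" 1% misses: speedup {speedup_good_locality:.2f}x")
```

With these made-up but plausible figures, quadrupling the clock buys about 3% when a fifth of the references go to raw memory, because the miss term dominates and does not shrink with the clock.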
Direct-mapped caches and small set-associative caches do not work well with today's OO programming.
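A small sketch of why low associativity hurts: when a few hot objects happen to land on addresses that map to the same cache set, they evict each other on every visit. The cache model here (LRU replacement, 64-byte lines, 512 sets) and the three colliding addresses are assumptions chosen to show the effect, not measurements from a real program. Note that holding the set count fixed while raising the ways also raises total capacity.

```python
from collections import OrderedDict

def miss_count(trace, num_sets, ways, line_bytes=64):
    """Count misses for a byte-address trace in a set-associative LRU cache.

    ways=1 gives a direct-mapped cache.
    """
    sets = [OrderedDict() for _ in range(num_sets)]   # per set: resident lines, LRU first
    misses = 0
    for addr in trace:
        line = addr // line_bytes
        resident = sets[line % num_sets]
        if line in resident:
            resident.move_to_end(line)                # hit: mark most recently used
        else:
            misses += 1
            if len(resident) == ways:
                resident.popitem(last=False)          # set full: evict least recently used
            resident[line] = True
    return misses

LINE = 64
SETS = 512
STRIDE = SETS * LINE      # addresses this far apart map to the same set

# Three small objects that happen to collide, visited round-robin.
objects = [0, STRIDE, 2 * STRIDE]
trace = [objects[i % 3] for i in range(3000)]

direct_mapped = miss_count(trace, num_sets=SETS, ways=1)
two_way = miss_count(trace, num_sets=SETS, ways=2)
four_way = miss_count(trace, num_sets=SETS, ways=4)

print(direct_mapped, two_way, four_way)
```

In this toy trace the direct-mapped and 2-way caches miss on all 3000 references even though the working set is only three lines, while the 4-way cache misses three times and then hits from there on.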
It's time to realize that for the last 5-10 years memory performance has hardly improved at all (in comparison).
What good is a 4GHz processor that, when it's all said and done, runs at 100MHz memory speeds because each cache line it fetches gets used for only a word or two?