Comment Re:Latency != Frequency (Score 2, Informative) 217
The extra bandwidth does indeed allow more in-flight memory accesses, but several factors limit how much this actually helps.
First of all, there is the question of how much memory-level parallelism applications actually have: how many of their memory accesses are independent of each other? For example, code that manipulates hash tables or linked lists does not profit from additional bandwidth because each memory access depends on the previous one. Such code normally does things like:
LD R1, 0(R1)    ; load the next pointer
do something with R1
LD R1, 0(R1)    ; cannot issue until the previous load returns
etc.

Note the dependency on R1: each load needs the result of the one before it, so the accesses serialize no matter how much bandwidth is available.
Second, there are limits in the microarchitecture. Microprocessors contain two structures needed to handle off-chip memory accesses: the load/store queues (LSQs) and a table that tracks outstanding cache misses (usually called the Miss Status Holding Registers, or MSHRs). The LSQs track all in-flight loads and stores and are used to check dependencies between loads and stores; the largest implementation I remember can track a total of 48 loads (the Pentium 4, I believe). The MSHRs hold references to all outstanding off-chip accesses, and that table is usually smaller still. So even if all the memory accesses are independent and could be issued in parallel, you cannot really take advantage of all that bandwidth. In theory you could issue hundreds of memory accesses in the time one off-chip access takes to complete; in practice you end up with 20 or 30 in flight (not counting prefetches and the like).
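One common way to expose more independent misses to the hardware, up to whatever the LSQ/MSHR capacity allows, is to interleave several pointer chases by hand. A hedged sketch (this is my own illustration, and the names are made up):

```c
#include <stddef.h>

struct node { struct node *next; long value; };

/* Walk k independent lists in lockstep, so up to k loads can be
   outstanding at once instead of one. This only pays off up to the
   number of misses the hardware can actually track. */
long chase_sum_interleaved(struct node *heads[], int k) {
    long sum = 0;
    int live = k;
    while (live > 0) {
        live = 0;
        for (int i = 0; i < k; i++) {
            if (heads[i]) {
                sum += heads[i]->value;     /* k independent loads per round */
                heads[i] = heads[i]->next;
                live++;
            }
        }
    }
    return sum;
}
```

The k loads inside one round do not depend on each other, so an out-of-order core can issue them in parallel; the serialization is now only within each individual list.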
sorry for not including references