Comment Re:Overpriced G5s (Score 1) 419
The benchmark this machine is going to be rated against is LINPACK, which involves large matrix calculations and is heavily dependent on floating-point throughput. An ideal computer running LINPACK would reach its theoretical peak FP performance, R_peak, in the limit of an infinitely large matrix. This is different from SPEC FP, which is a suite of floating-point-intensive benchmarks that test various aspects of FP performance for the CPU, the compiler, and the overall system.
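To see why LINPACK rewards multiply-add throughput, here is a toy LU factorization (the core of what LINPACK does) that counts the operations in its inner loop. It's a minimal sketch, assuming no pivoting and a diagonally dominant input matrix; the function name `lu_in_place` is made up for illustration and this is not the actual HPL/LINPACK code. The point is that nearly all the work is one multiply paired with one subtract, and the total grows like 2/3 n^3.

```python
def lu_in_place(a):
    """Toy Doolittle LU factorization WITHOUT pivoting (hypothetical helper).

    Overwrites `a` (a list of lists) with L below the diagonal (unit
    diagonal implied) and U on and above the diagonal. Returns
    (muls, adds) counted in the trailing-submatrix update, which is
    where almost all of the work happens.
    """
    n = len(a)
    muls = adds = 0
    for k in range(n):
        pivot = a[k][k]
        for i in range(k + 1, n):
            a[i][k] /= pivot  # multiplier l_ik (a division, not counted)
            lik = a[i][k]
            for j in range(k + 1, n):
                # One multiply and one subtract per element: exactly the
                # pattern a fused multiply-add does in a single instruction.
                a[i][j] -= lik * a[k][j]
                muls += 1
                adds += 1
    return muls, adds


if __name__ == "__main__":
    n = 100
    # Strictly diagonally dominant, so skipping pivoting is safe here.
    a = [[(float(n) if i == j else 1.0) for j in range(n)] for i in range(n)]
    muls, adds = lu_in_place(a)
    print(muls == adds)                      # multiplies and adds come in pairs
    print((muls + adds) / (2 * n ** 3 / 3))  # approaches 1 as n grows
```

Because every multiply has a matching add, a chip that can retire both in one FMA per FPU per cycle gets full credit for its hardware on this workload.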
AFAIK, the Opteron has 3 asymmetric FPUs: each one handles only a subset of the FP instructions, and some instructions need more than one unit acting in concert. More to the point, AFAIK, the Opteron can do 1 floating-point multiply and 1 floating-point add per cycle.
The PPC 970 has 2 fully symmetric FPUs, each of which can execute any FP instruction in the architecture. One of the key features of the POWER ISA family and its descendants is the pipelined fused multiply-add (FMA) instruction, which lets each FPU complete 1 floating-point multiply and 1 floating-point add per cycle. As a result, for what LINPACK measures, the PPC 970 has twice the theoretical throughput per clock cycle of an Opteron (4 flops vs. 2). Since Opterons and PPC 970s run at comparable clock speeds, it's no surprise that it takes twice as many Opterons to equal the performance of a cluster of 970s.
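The back-of-the-envelope arithmetic behind "twice as many Opterons" is just R_peak = processors x clock x flops per cycle. A minimal sketch, using the per-cycle figures from the paragraphs above; the 2.0 GHz clock and the 1000-CPU cluster size are hypothetical numbers picked for illustration, not the specs of any real machine.

```python
import math

# Flops per clock cycle per processor, as described above.
OPTERON_FLOPS_PER_CYCLE = 1 + 1       # 1 FP multiply + 1 FP add
PPC970_FLOPS_PER_CYCLE = 2 * (1 + 1)  # 2 FPUs, each doing 1 FMA (multiply + add)


def r_peak_gflops(processors, clock_ghz, flops_per_cycle):
    """Theoretical peak (R_peak) in GFLOPS: processors x clock x flops/cycle."""
    return processors * clock_ghz * flops_per_cycle


def processors_needed(target_gflops, clock_ghz, flops_per_cycle):
    """Smallest processor count whose R_peak reaches target_gflops."""
    per_cpu = clock_ghz * flops_per_cycle
    return math.ceil(target_gflops / per_cpu)


if __name__ == "__main__":
    clock = 2.0  # GHz, hypothetical and the same for both chips
    g5_peak = r_peak_gflops(1000, clock, PPC970_FLOPS_PER_CYCLE)
    print(g5_peak)  # 8000.0 GFLOPS for a hypothetical 1000-CPU 970 cluster
    print(processors_needed(g5_peak, clock, OPTERON_FLOPS_PER_CYCLE))  # 2000
```

At equal clocks the ratio is simply 4/2, which is why the Opteron count doubles; real clock speeds shift that ratio somewhat but not the per-cycle argument.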
SPEC FP is irrelevant in this case.
-Bruce.