AFAIK, the Opteron has 3 asymmetric FPUs, each of which handles a subset of the floating-point functionality, and more than one of them may need to act in concert to perform some FPU instructions. More to the point, AFAIK the Opteron can do at most 1 floating-point multiply and 1 floating-point add per cycle, i.e. a theoretical peak of 2 floating-point operations per cycle.
The PPC 970 has 2 fully symmetric FPUs, each of which can perform any FPU instruction available to the architecture. One of the key features of the POWER ISA family and its descendants is the fused multiply-add (FMA) instruction, which is pipelined so that each FPU can complete 1 floating-point multiply and 1 floating-point add per cycle. That gives the 970 a theoretical peak of 4 floating-point operations per cycle, versus 2 for the Opteron. As a result, for what LINPACK measures (dense linear algebra dominated by multiply-adds), the PPC 970 is capable of twice the theoretical throughput per clock cycle when compared to an Opteron. Since Opterons and PPC 970s run at comparable clock speeds, it's no surprise that it takes roughly twice as many Opterons to equal the performance of a cluster of 970s.
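To put numbers on that, here's a rough back-of-the-envelope sketch in Python. The function names and the 2.0 GHz clock are my own placeholders for illustration, not measured or vendor figures; only the per-cycle counts come from the description above, and real LINPACK results will land below these theoretical peaks.

```python
def peak_flops_per_cycle(units, flops_per_unit):
    """Theoretical peak floating-point operations per clock cycle."""
    return units * flops_per_unit


def peak_gflops(clock_ghz, flops_per_cycle, cpus=1):
    """Theoretical peak GFLOPS for `cpus` processors at `clock_ghz`."""
    return clock_ghz * flops_per_cycle * cpus


# Opteron: one add pipe plus one multiply pipe, 1 FLOP each per cycle.
opteron_fpc = peak_flops_per_cycle(units=2, flops_per_unit=1)

# PPC 970: two FPUs, each doing an FMA (multiply + add = 2 FLOPs) per cycle.
ppc970_fpc = peak_flops_per_cycle(units=2, flops_per_unit=2)

# Per-clock advantage of the 970 on multiply-add heavy code.
ratio = ppc970_fpc / opteron_fpc

# Hypothetical equal clock speed, chosen only to make the comparison concrete.
CLOCK_GHZ = 2.0
one_970 = peak_gflops(CLOCK_GHZ, ppc970_fpc, cpus=1)
two_opterons = peak_gflops(CLOCK_GHZ, opteron_fpc, cpus=2)

print(f"Opteron peak FLOPs/cycle: {opteron_fpc}")
print(f"PPC 970 peak FLOPs/cycle: {ppc970_fpc}")
print(f"970-to-Opteron per-clock ratio: {ratio}")
print(f"One 970 vs two Opterons at {CLOCK_GHZ} GHz: {one_970} vs {two_opterons} GFLOPS")
```

At equal clocks, one 970 and two Opterons come out to the same theoretical peak, which is the 2:1 ratio described above.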
SPEC FP is irrelevant in this case.