Let's look at the actual setup used in this benchmark: AMD A10 7800B
4 Steamroller CPU cores (2 modules):
2x128-bit FMAC per module = 2x4 single precision FMAC = 8 FMAC per module, 16 in total
8 GCN compute units:
4x16 single precision FMAC per compute unit = 64 FMAC per CU, 512 in total
CPU: 3500MHz x 16 = 56GFlops
GPU: 750MHz x 512 = 384GFlops
(counting each FMAC as one operation)
So we get roughly x7 the (single precision) throughput using the GPU.
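To make the arithmetic easy to check, here is a minimal sketch that multiplies out the clock speeds and FMAC counts listed above. It assumes the same convention as the figures in this post (one FMAC counted as one operation), and the helper name peak_gops is only there for illustration.

```python
# Peak single precision throughput, counting one FMAC as one operation
# (the same convention as the figures in the post).

def peak_gops(clock_mhz: float, fmac_lanes: int) -> float:
    """Return peak throughput in billions of FMAC operations per second."""
    return clock_mhz * 1e6 * fmac_lanes / 1e9

cpu_lanes = 2 * 8    # 2 modules x 8 FMAC per module
gpu_lanes = 8 * 64   # 8 compute units x 64 FMAC per CU

cpu = peak_gops(3500, cpu_lanes)
gpu = peak_gops(750, gpu_lanes)

print(f"CPU: {cpu:.0f} G FMAC/s")
print(f"GPU: {gpu:.0f} G FMAC/s")
print(f"GPU/CPU ratio: {gpu / cpu:.1f}")
```

These are theoretical peaks only; they say nothing about how close real code gets to them.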
But that ignores the fact that GPUs are designed to tolerate long average memory access times while CPUs aren't. If the access pattern of the data isn't optimal (easily cacheable), the CPU will be stalled most of the time; the GPU will not. GPUs also have other resources (texture samplers and so on) that can be used to increase performance IF the code can use them.
But (as I pointed out in the earlier post) it isn't likely that there would be such a huge difference if the CPU didn't run crappy code. Most likely the CPU uses double precision floats while the GPU uses single precision. IIRC the GPU in question runs double precision floats at 1/16 the throughput of single precision - which would make the CPU superior in raw number crunching.
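A rough sketch of that double precision comparison follows. It rests on two assumptions: the GPU's 1/16 rate is the figure recalled (IIRC) above, not a verified spec, and the CPU is assumed to run doubles at half its single precision rate because a 128-bit FMAC unit holds 2 doubles instead of 4 singles.

```python
# Rough double precision comparison, counting one FMAC as one operation.
CPU_CLOCK_GHZ = 3.5
GPU_CLOCK_GHZ = 0.75

cpu_sp_lanes = 16    # 2 modules x 8 single precision FMAC
gpu_sp_lanes = 512   # 8 compute units x 64 single precision FMAC

# Assumption: a 128-bit unit holds 2 doubles instead of 4 singles.
cpu_dp = CPU_CLOCK_GHZ * cpu_sp_lanes / 2

# Assumption: 1/16 rate, as recalled in the post (not a verified spec).
gpu_dp = GPU_CLOCK_GHZ * gpu_sp_lanes / 16

print(f"CPU double precision: {cpu_dp:.0f} G FMAC/s")
print(f"GPU double precision: {gpu_dp:.0f} G FMAC/s")
```

Under those assumptions the CPU comes out slightly ahead, which is all the argument needs.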