Yep, this is totally right. The main thing keeping an algorithm from running well on GCN cores is heavy branching. I haven't kept up with the exact terminology, but the gist is that a group of inputs (a "wavefront" of 64 work-items on GCN) shares a single instruction stream. So only one instruction at a time can execute across the whole group.
The following code is easy to follow and quick on a CPU, but on a GPU its performance suffers.
if (unitOnFire) {
    flailAround();
} else {
    doFightRoutine();
}
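So, if you have 128 baddies and 2 are on fire, the GCN first evaluates the if (with no branch prediction, afaik). Then it executes flailAround() for the 2 that are on fire while the lanes for the other 126 sit idle. Only then does it execute doFightRoutine() for the remaining 126 (simultaneously), with the 2 flailing lanes masked off. You end up paying for both paths even though each unit only needed one of them.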
Interestingly, ternary expressions, where you can use them, can frequently be compiled into a branch-free select (each lane just picks one of two values) that avoids this large performance penalty*. Something like the following:
health_change = (unitOnFire) ? -100 : 0;
unitHealth += health_change;
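Here's a quick C sketch of why the ternary helps. The two functions below compute the same value, but the second contains no branch at all. That is roughly what a per-lane select instruction (v_cndmask_b32 on GCN) does. health_change_masked and apply_fire_damage are hypothetical helpers for illustration, and the mask trick assumes two's-complement ints.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Ternary form from the post: simple enough that a compiler can often
 * emit a select instead of a branch. */
static int health_change_ternary(bool unitOnFire) {
    return unitOnFire ? -100 : 0;
}

/* Hand-written branch-free equivalent: -(int)true == -1 (all bits set on
 * two's-complement machines) and -(int)false == 0, so the AND either
 * keeps -100 or clears it to 0. */
static int health_change_masked(bool unitOnFire) {
    int mask = -(int)unitOnFire;
    return -100 & mask;
}

/* Every unit runs the same instructions regardless of unitOnFire,
 * so there is nothing for a SIMD group to diverge on. */
static void apply_fire_damage(int *unitHealth, const bool *unitOnFire, size_t n) {
    assert(n == 0 || (unitHealth != NULL && unitOnFire != NULL));
    for (size_t i = 0; i < n; ++i) {
        unitHealth[i] += health_change_masked(unitOnFire[i]);
    }
}
```

On a CPU the compiler may well turn either form into a conditional move anyway. The masked version just spells out what "no branch" means.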
So yeah, SIMD code is tricky to optimize.
* Some compilers will also turn simple if statements that are equivalent to a select into branch-free code, so they run just as quickly. I have no idea if that's the case for the PS4 APU's shader compiler, though.