Sticking with IBM for Cell would have made very little sense. The Cell processor is very similar to how NVIDIA's CUDA presents the graphics card to you: limited cache (shared memory), lots of very simple hardware threads, almost no branch prediction, etc. So both CUDA and Cell crank out great numbers on things like a particle simulator, MPM, image processing, and the like, but neither is equipped to do some useful things like running a scheduler or a word processor. Basically, anything that's very difficult to multi-thread would be very hard or impossible to adapt to a Cell-like architecture.
And some of the applications that would be useful on Cell most likely work in CUDA. So instead of having to have a regular processor + Cell + CUDA, why not just have a regular processor + CUDA?
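
To make the "easy to multi-thread" vs. "hard to multi-thread" distinction a bit more concrete, here is a rough sketch in plain Python. It is not Cell or CUDA code, and the function names (`step_particles`, `follow_chain`) are made up for illustration. The first function is the particle-simulator kind of work, where every element can be updated on its own; the second is a dependent chain, where each step needs the result of the previous one.

```python
# Hypothetical illustration only; not taken from any real simulator or SDK.

def step_particles(positions, velocities, dt):
    """Advance every particle by one time step.

    Each new position depends only on that particle's own state, so
    every iteration could in principle run on its own simple hardware
    thread with no coordination between them.
    """
    return [p + v * dt for p, v in zip(positions, velocities)]


def follow_chain(next_index, start, steps):
    """Follow a chain of indices for a fixed number of steps.

    Step k cannot begin until step k-1 has produced its index, so the
    loop is inherently sequential: adding more threads does not help.
    """
    i = start
    for _ in range(steps):
        i = next_index[i]
    return i


if __name__ == "__main__":
    print(step_particles([0.0, 1.0], [1.0, 2.0], 0.5))
    print(follow_chain([1, 2, 0], start=0, steps=4))
```

The first pattern is what a wide, simple-core design is built for. The second, along with branch-heavy control code like a scheduler, is where a regular processor still earns its place.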