Disclaimer: I used to teach Cell programming classes for people who were looking to do HPC on the blades.
Cell failed. But the reasons behind the failure are more interesting.
The obvious answer is that it was hard to program. On a single chip you had the PowerPC processor and 8 SPUs. Communication was through mailboxes for small messages and DMA transfers for larger messages. To get the most out of a chip you had to juggle all 9 processor elements at the same time, try to vectorize all of your ops, and keep the memory moving while you were doing computation. That is the recipe for success for most architectures - keeping everything as utilized as possible. But it is also hard to do on most architectures, and the embedded nature of Cell made it that much more difficult.
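For anyone who never had to do it, "keep the memory moving while you were doing computation" usually meant double buffering: start the transfer for the next chunk, crunch the chunk you already have, then wait on the transfer. Here is a rough sketch of that pattern, not real Cell SDK code. It is plain Python, with a one-thread worker pool standing in for the DMA engine; `dma_get`, `compute`, `CHUNK`, and `double_buffered_sum_squares` are all made-up names for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

CHUNK = 4  # elements per simulated transfer (hypothetical size)


def dma_get(main_memory, start, size):
    """Stand-in for an asynchronous DMA read into a local buffer."""
    return list(main_memory[start:start + size])


def compute(buf):
    """Stand-in for the vectorized kernel: sum of squares of one chunk."""
    return sum(x * x for x in buf)


def double_buffered_sum_squares(main_memory, chunk=CHUNK):
    """Process main_memory chunk by chunk, overlapping fetch and compute."""
    total = 0
    # A single worker thread plays the role of the DMA engine (an assumption
    # made for this sketch; real hardware queues transfers differently).
    with ThreadPoolExecutor(max_workers=1) as dma_engine:
        pending = None  # transfer issued on the previous iteration
        for start in range(0, len(main_memory), chunk):
            # Kick off the transfer for the next chunk first...
            incoming = dma_engine.submit(dma_get, main_memory, start, chunk)
            # ...then work on the chunk requested one iteration earlier.
            if pending is not None:
                total += compute(pending.result())
            pending = incoming
        # Drain the last outstanding transfer.
        if pending is not None:
            total += compute(pending.result())
    return total
```

Calling `double_buffered_sum_squares(list(range(10)))` gives the same answer as a plain loop over the data; the only point is the ordering of "issue transfer, compute, wait". On the real hardware you had to do this by hand on every SPU, on top of the vectorizing and the mailbox traffic.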
There were better software tools in the works for people who didn't want to drop down to the SPU intrinsic level to program. There were better chips in the works too: more SPUs, stronger PowerPC cores, and better communication with main memory. Those things did not come to fruition because IBM was looking to cut expenses to keep profits high (instead of boosting revenue). The Cell project was killed when a new VP known for cost cutting came in. We finally had a good Cell blade to sell (QS22 - two chips, 32GB RAM, fully pipelined double precision, etc.) and that lasted four months before the project got whacked. And we lost a lot of good people as a result. (That VP, Bob Moffat, was part of the Galleon insider trading scandal.)
So yes, Cell failed. But not necessarily for the obvious reasons. IBM has been on a great cost-cutting binge for the past few years - it lets them meet their earnings per share targets. But it causes collateral damage.