Larrabee is expected to at least be competitive with nVidia/AMD's stuff, although it might not be until the second generation product before they're on equal footing.
Competitiveness is not a function of generation number. Still: what statistics have you seen that compare Larrabee with something people use right now (ATI/nVidia)? There is this presentation (PDF) they made at SIGGRAPH, which shows that performance increases as you add more Larrabee cores. Here's a graph which may mean something. The y-axis is "scaled performance." What might that mean?
Graphs show how many 1 GHz Larrabee cores are required to maintain 60 FPS at 1600x1200 resolution in several popular games. Roughly 25 cores are required for Gears of War with no antialiasing, 25 cores for F.E.A.R with 4x antialiasing, and 10 cores for Half-Life 2: Episode 2 with 4x antialiasing.
Sounds neat. I guess that's why they're going to promote the 32-core Larrabee. How much will something that can run these cost, and how much power will it consume? They're still developing this thing, so why do I keep hearing that it will BLOW MY MIND? I have no doubt that Intel has an army of capable engineers who could build something that renders graphics well, but if it costs more than the consumer can possibly pay, there's no real point. Intel is gunning for 2 TFLOPS. I'm pretty sure the Radeon HD 4870 passes that mark already (and you can purchase it for less than $500). Sure, it's a cool technology, but I'd like to see some more facts and figures.
What have I heard? Power usage/heat: 300W TDP. That's pretty horrific. Cost: 12-layer PCB. That's twice as many layers as the typical graphics card and four more than the high-end Radeon and GeForce cards. That doesn't directly translate into cost, but generally more complicated equals more expensive.
But back to the PS4 -- Sony's real mistake with the PS3 was expecting the Cell processor to be the most incredible computing device ever. Original plans for the PS3 included 2 Cell processors, but they changed to the RSX when they realized the Cell wasn't capable of rendering graphics like they wanted it to (whereas the Xbox 360's architecture was designed with the GPU and CPU co-existing from the start). You can't build a bunch of fast parts and stick them together; you have to build a fast system. Perhaps Sony has learned their lesson.
Not to throw down, but do we even have a definition for "perfect" software? Because I could quite easily argue that not only does the definition not exist, but it is impossible to create a consistent definition for all potential users. The field of MP3 players is a good example: The iPod dominates the market for some strange reason, but many people view other MP3 players as "worse." More buttons would break the "perfect simplicity" of the system, but the inability to arbitrarily build and shuffle playlists is quite obnoxious.
Internet browsers are another interesting domain. My company uses a web-based MIS designed with Internet Explorer 6 in mind. One could blame the system's design for relying on proprietary ActiveX controls, but we're stuck with it. That said, IE6 is the "best" browser we can offer to our employees, because Firefox, Opera and Chrome will not function. We're not even touching "perfect" yet.
But I haven't really addressed the question of the cost of developing "perfect" software, so let's make some assumptions. Let's view "perfect" from a security standpoint (availability, confidentiality and integrity) and use Common Criteria as the metric of software goodness.
The first EALs of CC are pretty easy to come by, since the basic definition is that the software works and doesn't crash. The next levels require discretionary (optional) protection, followed by mandatory object protection. That's nothing too interesting, as software of any importance should be doing this stuff anyway. Top-tier perfect software development involves formal verification of the software - as in, developing a mathematical description and proving that it will always work. More work is involved in proving that your software actually matches that mathematical description, which costs money. If you've ever tried to formally verify software, you will quickly realize that it takes AT LEAST as much time to verify the software as it did to develop it in the first place. There are shortcuts such as Automated Theorem Proving, but it's a computationally hard problem (undecidable in general) and you're going to need to pay (at least) a person who understands it.
So what if we don't want to formally verify our software, but at least check it until it's "good enough"? You could hire some hackers to independently test your software, but that costs money. You could internally check it, but that takes time (read: money).
What we really need is a testing facility that integrates well into development and speeds it up, which is pretty much what unit testing is for. I can say a lot of good things about unit tests, but they're certainly far from perfect. They take time to develop, but if they help catch glitches, then you're actually speeding development along. Honestly, I couldn't tell you if the net result is time savings, but I can tell you that you can only catch the errors that you are looking for.
In conclusion: I disagree with your statement that developing "zero defect" software costs the same as developing and shipping software with defects. Formal verification is nice, but completely unreasonable to ask for. Independent testing will always cost more money and internal unit testing lacks the independent thought that really finds errors. I would love for software to be perfect, but it is simply too much to ask for out of developers today.
since CUDA is roughly C
Not quite. CUDA looks a lot like C in that it has C-family syntax, but its biggest limitation is that there is no application stack, which means no recursion. CUDA also lacks the idea of a pointer, although you can work around this by doing number-to-address translation (as in, the number 78 means look up tex2D(tex, 0.7, 0.8)). The GPU also has other shortcomings, in that most architectures like to have all their shaders running the same instruction at the same time. For this code
if (pixel.r < pixel.g) {
    /* first branch */
} else if (pixel.g < pixel.b) {
    /* second branch */
} else {
    /* third branch */
}
The GPU will slow down a ton if the pixel color causes different pixels to branch in different directions. Basically, the three sets of shaders following different branches of that code will be inactive 2/3 of the time.
In the Cell, you really do just program in C with a number of extensions added onto it like the SPE SIMD intrinsics and the DMA transfer commands (check it out). The Cell really is 9 (10 logical) processors all working together in a single chip (except in the PS3, where there are only 7 working SPEs). Furthermore, your 8 SPEs can be running completely different programs -- they're just little processors. Granted, you have to be smart when you program them to deal with race conditions and all the other crap you have to deal with for multithreaded programming. The Cell also takes about 14 times longer to perform a double-precision floating-point calculation than a single-precision one (and there aren't SPE commands to do four at once like you can with singles).
So which is more powerful? It really depends on what you're doing. If your task is ridiculously parallelizable and doesn't require the use of recursion, pointers or multiple branches, the GPU is most likely your best bet. If your program falls into any of those categories, use a Cell.
"Procedural" refers to the fact that content is being generated on-the-fly, rather than stored in giant texture files. It is derived from the procedural scripts used to define the parameters for the object to be created (although with the power of scripting languages today, one could argue that this no longer applies).
One of the coolest procedurally-generated demonstrations I have seen is
Personally, I'm working on a procedural map generation algorithm for a real-time strategy game.
Numeric stability is probably not all that important when you're guessing.