I did; you're still being silly. It's easy to run non-branching code on a branching processor; it's almost impossible to do the opposite.
True, although you end up keeping a large amount of the die powered for no benefit at all. Similarly, you can run sequential code on a GPU by just leaving most of the threads powered but doing nothing.
It's easy to run code with weak locality on a processor with strong locality; it's almost impossible to do the opposite.
Not true. This is one of the reasons why GPUs and DSPs are significantly faster on this kind of code. There are large categories of algorithms with predictable access patterns that can leave a CPU with a conventional cache hierarchy (even with prefetch instructions) completely data starved. To load a value into a conventional CPU, you have to take a miss in two or three layers of cache, each of which then pulls in a complete cache line (typically 64 bytes). Meanwhile, a DSP can be sending memory requests at word granularity to the DRAM. Even with a quarter of the memory bandwidth, it can often achieve higher effective throughput than a commodity CPU.
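To make the granularity point concrete, here's a rough sketch (the 64-byte line size, the stride, and the array size are assumptions for illustration, not a claim about any particular part): a loop that touches one word per cache line forces a conventional CPU to move roughly sixteen times as much data from DRAM as it actually uses.

    #include <stdio.h>
    #include <stdlib.h>

    /* Touch one 4-byte float per 64-byte cache line. A conventional CPU
     * fetches the whole line on every miss, so roughly 15/16 of the DRAM
     * traffic is data the loop never uses. A memory system that can issue
     * word-granularity requests (as many DSPs can) would move only the
     * words it needs. */
    int main(void)
    {
        const size_t n = 1 << 24;                  /* 64 MiB of floats */
        const size_t stride = 64 / sizeof(float);  /* one element per line */
        float *data = (float *)malloc(n * sizeof(float));
        float sum = 0.0f;

        for (size_t i = 0; i < n; ++i)
            data[i] = 1.0f;                        /* touch every page */

        for (size_t i = 0; i < n; i += stride)
            sum += data[i];        /* 4 useful bytes, 64 bytes fetched */

        printf("sum = %f\n", sum);
        free(data);
        return 0;
    }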
They won't run them well; for example, software rendering on CPUs is horribly slow, but it's still orders of magnitude better than trying to use your GPU as a CPU.
Your GPU is also Turing complete, so aside from the memory protection aspects (which actually are present on some modern GPUs), your argument applies in reverse too. You can run sequential code on a GPU: only use one thread. You can run code that branches a lot; you'll just take a load of pipeline flushes as a penalty. You can run code that exhibits locality of reference; you'll just end up fighting the memory controller. So does that mean that your GPU is a general-purpose processor? In both directions, your performance overhead for using the wrong processor is a couple of orders of magnitude.
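For concreteness, a minimal sketch of the "only use one thread" case, written against CUDA (the kernel, the sizes, and the use of managed memory are all just assumptions for the example):

    #include <cstdio>

    // A purely sequential loop, run on the GPU by launching exactly one
    // thread in one block. The hardware is Turing complete enough to do
    // it; it just leaves the rest of the chip idle.
    __global__ void serial_sum(const float *data, int n, float *out)
    {
        float sum = 0.0f;
        for (int i = 0; i < n; ++i)   // no parallelism exposed at all
            sum += data[i];
        *out = sum;
    }

    int main()
    {
        const int n = 1 << 20;
        float *data, *out;
        cudaMallocManaged(&data, n * sizeof(float));
        cudaMallocManaged(&out, sizeof(float));
        for (int i = 0; i < n; ++i)
            data[i] = 1.0f;

        serial_sum<<<1, 1>>>(data, n, out);   // one block, one thread
        cudaDeviceSynchronize();
        printf("sum = %f\n", *out);

        cudaFree(data);
        cudaFree(out);
        return 0;
    }

The <<<1, 1>>> launch configuration is the whole point: the code runs correctly, but the rest of the machine sits idle while it does.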
This is why we at one time had Lisp Machines with specialized hardware optimized for running Lisp efficiently. Message-based machines were tried for Smalltalk.
The main reason that Lisp machines lost out was that they were stack-based. Stack-based instruction sets don't (easily) expose any instruction-level parallelism, which means that you can't easily extend them to take advantage of pipelining. That wouldn't have been such a problem if Lisp had been parallel (a barrel-scheduled multithreaded stack-based CPU can be very simple to design, have very good instruction cache usage, and get good power / performance ratios), but Lisp machines ran single-threaded environments.

I don't know of any machines (other than the Mushroom project from Manchester) that were designed for Smalltalk - it originally ran with custom microcode on the Alto - but the most successful message-passing machine was the Transputer, optimised for Occam code. Erlang has a similar abstract model and runs in telecoms systems, but on CPUs that are very poorly optimised for it.
To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.
The court has to decide whether copyright protection for APIs would promote the progress of science and useful arts. The EFF and others are arguing that it would hamper the progress of science.
But what sets general-purpose processors apart is that they assume the worst and try to make all code perform, no matter how ugly. They optimize for flexibility, with an emphasis on minimizing the worst cases.
Read TFA. They optimise for a specific category of algorithm: one that is branch-heavy (although comparatively light in computed branches), has strong locality of reference, is either single-threaded or has shared-everything parallelism, and meets a few other constraints. That's not a general-purpose processor; that's something optimised for a specific workload, and, because such chips have been the cheapest way of buying processing power for a few decades, people have put a lot of effort into trying to shoehorn algorithms to have those characteristics.

As GPUs became cheaper per FLOPS, people tried to shoehorn algorithms to fit on processors that are optimised for code with almost no branches, little locality of reference, explicit parallelism and synchronisation, and highly predictable memory accesses. These are also not general-purpose processors. They are two points in a design space, and we're going to see a lot more of them as it becomes increasingly cheap to put rarely-used processors on die. If you can only power 5% of the chip at any time, then you can afford to have a load of different pipelines optimised for very different classes of algorithm on the same die, even if they have the same (or mostly the same) instruction set and some of them can run code intended for any of them (albeit slowly and inefficiently).
A modern standard ARMv7 instead of the odd ARMv6 would be greatly appreciated too.
Connecting up the performance counter interrupt line would, but ARMv7 wouldn't. Having to have different OS images for different models of RPi makes them a lot less interesting. If you want an ARMv7 board, then go and buy one - there are loads of them.
There's a lot of overlap between those constraints. Cheap doesn't just mean cheap to buy; it means cheap to replace. And that means that when you break one, if the exact model doesn't exist anymore, you need to be able to run everything that was working on the old one on a newer model. The advantage of the RPi over more powerful ARM boards is that it comes with that guarantee: the A+ will run everything (including the same OS image) that the A and B do.
The hypothetical 700MHz vs 1GHz issue that the grandparent talks about isn't that much of a problem. More importantly, a new SoC would likely be dual (or quad or octo) core and would be ARMv7, not ARMv6. That's a big change. I expect that the RPi will skip ARMv7 entirely and that eventually there will be an ARMv8 model (possibly ARMv8.1 / ARMv8.2), but the jump to 64-bit gives a good excuse for needing a new OS image.
Disclaimer: I work a couple of floors below several of the RPi Foundation people, but the only thing that they've told me about their future plans is that they have some. Everything in this post is uninformed guesswork.
Anyone who has written a nontrivial library can tell you that the answer to the first is a definite yes. Designing a good API is hard and requires a lot of thought (designing a bad API is pretty trivial). The second question is more subtle. If a good API is hard to design, then the company that designs one does deserve some advantage. In general, they get this advantage by being the first mover in their market. If they had the added advantage that no one could create a compatible implementation, would that be a significant advantage for them and would it hurt the industry as a whole? I suspect that the answer to that is that it wouldn't be a massive advantage (there aren't very many cases of people producing identical implementations, and where there are it's often of mutual benefit because their customers like having a second source), but it would be a big disadvantage to the industry because it would mean that you'd always have lock-in for every piece of software.
Don't compare floating point numbers solely for equality.
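For what it's worth, the usual reading of that advice is to compare against a tolerance rather than with ==. A minimal sketch (the epsilon values are placeholders; sensible ones depend on the computation):

    #include <math.h>
    #include <stdbool.h>
    #include <stdio.h>

    /* Treat two doubles as equal if they are within an absolute epsilon
     * (for values near zero) or a relative epsilon (for everything else).
     * The thresholds are illustrative, not universal. */
    static bool nearly_equal(double a, double b, double rel_eps, double abs_eps)
    {
        double diff = fabs(a - b);
        if (diff <= abs_eps)
            return true;
        return diff <= rel_eps * fmax(fabs(a), fabs(b));
    }

    int main(void)
    {
        double x = 0.1 + 0.2;
        printf("%d\n", x == 0.3);                          /* 0: exact compare fails */
        printf("%d\n", nearly_equal(x, 0.3, 1e-9, 1e-12)); /* 1 */
        return 0;
    }

Using both an absolute and a relative threshold covers values near zero, where a purely relative test breaks down.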