Comment Re:Down side (Score 1) 141

First, 'just producing an SoC' is not trivial. The interrupt controllers designed for ARM11 are not going to work well with a Cortex A9, for example. DMA units will have different interfaces. The amount of effort required to shoehorn older on-chip peripherals into working with a new core is pretty large. Imagine trying to fit a Pentium into a 486 motherboard - it was possible, but only with a custom Pentium chip designed for the purpose (and only on motherboards built for the P24T 'Pentium Overdrive'), and the result was a really slow Pentium. The coupling between parts on an SoC is far tighter than between components on an x86 motherboard.

Comment Re:Hyperbolic headlines strike again (Score 1) 181

I did, you're still being silly. It's easy to run non-branching code on a branching processor, it's almost impossible to do the opposite.

True, although you end up keeping a large amount of the die powered for no benefit at all. Similarly, you can run sequential code on a GPU by just leaving most of the threads powered and doing nothing.

It's easy to run code with weak locality on a processor with strong locality, it's almost impossible to do the opposite.

Not true. This is one of the reasons why GPUs and DSPs are significantly faster. There are large categories of algorithms with predictable access patterns that can leave a CPU with a conventional cache hierarchy (even with prefetch instructions) completely data starved. To load a value into a conventional CPU, you have to hit two or three layers of cache miss, each of which then has to pull in a complete cache line (typically 64 bytes). Meanwhile, a DSP can be sending memory requests at word granularity to the DRAM. Even with a quarter of the memory bandwidth, it can often achieve more throughput than a commodity CPU.
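To put rough numbers on that, here's a minimal sketch of the kind of access pattern I mean, assuming a 64-byte cache line and 4-byte floats (typical values, not guarantees). Every iteration misses and drags in a whole line to use one word of it, so the CPU moves 16 times more data than it consumes; a DSP issuing word-granularity requests only moves the words it needs.

```c
#include <stddef.h>

#define CACHE_LINE 64
#define STRIDE (CACHE_LINE / sizeof(float))   /* one element per cache line */

/* Strided gather: on a cached CPU each access misses and pulls in a full
 * 64-byte line, of which only 4 bytes are ever used. */
float strided_sum(const float *data, size_t n)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; i += STRIDE)
        sum += data[i];
    return sum;
}
```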

They won't run them well, for example software rendering on CPUs is horribly slow but it's still orders of magnitude better than trying to use your GPU as a CPU.

Your GPU is also Turing complete, so aside from the memory protection aspects (which actually are present on some modern GPUs), your argument applies in reverse too. You can run sequential code on a GPU: only use one thread. You can run code that branches a lot, you'll just take a load of pipeline flushes as a penalty. You can run code that exhibits locality of reference, you'll just end up fighting the memory controller. So does that mean that your GPU is a general-purpose processor? In both directions, your performance overhead for using the wrong processor is a couple of orders of magnitude.
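To make the branching penalty concrete, here's a rough sketch of how a SIMT-style machine handles a data-dependent branch (shown here as lane masking; some hardware uses other mechanisms, and the lane width is made up). Every lane steps through both sides of the if/else, so the work done is the sum of both paths:

```c
#include <stdint.h>

#define LANES 8   /* illustrative warp/wavefront width */

void simt_branch(const int32_t x[LANES], int32_t out[LANES])
{
    uint8_t take[LANES];

    /* evaluate the condition in every lane */
    for (int i = 0; i < LANES; i++)
        take[i] = (x[i] > 0);

    /* "then" side: all lanes step through it, masked-off lanes do no useful work */
    for (int i = 0; i < LANES; i++)
        if (take[i])
            out[i] = x[i] * 2;

    /* "else" side: all lanes step through this too */
    for (int i = 0; i < LANES; i++)
        if (!take[i])
            out[i] = -x[i];
}
```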

Comment Re:Will it have the same garbage CPU? (Score 1) 141

The A53 cores (the 64-bit equivalent to the A7) are already starting to get pretty cheap. I wouldn't be surprised if they become cheap enough for the RPi Master (or whatever they call it) some time next year. Especially as they can guarantee that whichever chip goes in the RPi is going to have a lot of third-party software effort spent on it. That is very valuable to SoC makers: knowing that anyone who wants to do some embedded project can already get well-tested and well-supported off-the-shelf toolchains and operating systems for their part.

Comment Re:Hyperbolic headlines strike again (Score 1) 181

This is why we at one time had Lisp Machines with specialized hardware optimized for running Lisp efficiently. Message based machines were tried for Smalltalk

The main reason that Lisp machines lost out was that they were stack based. Stack-based instruction sets don't (easily) expose any instruction-level parallelism, which means that you can't easily extend them to take advantage of pipelining. That wouldn't have been such a problem if Lisp had been parallel (a barrel-scheduled multithreaded stack-based CPU can be very simple to design, have very good instruction cache usage, and get good power / performance ratios), but Lisp machines ran single-threaded environments. I don't know of any machines (other than the Mushroom project from Manchester) that were designed for Smalltalk - it originally ran with custom microcode on the Alto - but the most successful message-passing machine was the Transputer, optimised for Occam code. Erlang has a similar abstract model and it runs in telecoms systems, but on CPUs that are very poorly optimised for it.
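To illustrate the stack-ISA point (a toy sketch, not modelled on any real Lisp machine): every arithmetic instruction reads and writes the top of the same stack, so each one depends on the one before it and there is nothing for a pipeline to overlap. A register ISA evaluating (a+b)*(c+d) could compute the two additions in parallel; the stack version below cannot.

```c
enum op { PUSH, ADD, MUL };

struct insn { enum op op; int imm; };

/* Toy stack-machine interpreter: note how ADD and MUL both read and write
 * the top of the stack, forming one long serial dependency chain. */
int run(const struct insn *code, int len)
{
    int stack[64];
    int sp = 0;

    for (int pc = 0; pc < len; pc++) {
        switch (code[pc].op) {
        case PUSH: stack[sp++] = code[pc].imm;        break;
        case ADD:  sp--; stack[sp - 1] += stack[sp];  break;
        case MUL:  sp--; stack[sp - 1] *= stack[sp];  break;
        }
    }
    return stack[0];
}
```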

Comment Re:Down side (Score 1) 141

They are backwards compatible. The problem is not ARM - it's the rest of the SoC. If you can find exactly the same other components on an SoC with a newer ARM core, then it will be fine. The problem is that ARMv6 didn't standardise things like the interrupt controller (ARMv7 does, but only in a later revision) or the bootloader interface. You'll need a different kernel to support a newer SoC, because most of the other components will be different.
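A hypothetical sketch of what that means in practice (every name and address below is invented): two SoCs with the same ARM core can still put the interrupt controller and peripherals at different addresses with different register layouts, so the low-level parts of the kernel have to be built or configured per SoC.

```c
#if defined(SOC_BOARD_A)            /* older ARM11-era SoC */
#  define INTC_BASE   0x10004000u   /* vendor-specific interrupt controller */
#  define UART0_BASE  0x10009000u
#elif defined(SOC_BOARD_B)          /* newer Cortex-era SoC */
#  define INTC_BASE   0x2F000000u   /* different controller, different layout */
#  define UART0_BASE  0x2F201000u
#else
#  error "no SoC selected: one kernel image cannot cover both boards this way"
#endif

static volatile unsigned int *const intc =
    (volatile unsigned int *)INTC_BASE;
```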

Comment Re:POSIX open, named by Stallman, predates SCO (Score 1) 260

Why can't that decide your thinking or, more specifically, the court's thinking? The court has to decide whether copyright protection should cover APIs. The justification for this is right there in the constitution:

To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.

The court has to decide whether copyright protection for APIs would promote the progress of science and useful arts. The EFF and others are arguing that it would hamper the progress of science.

Comment Re:Hyperbolic headlines strike again (Score 1) 181

But what sets general purpose processors apart is that they assume the worst and tries to make all code perform, no matter how ugly. They optimize for flexibility, with an emphasis on minimizing the worst cases

Read TFA. They optimise for a specific category of algorithm: one that is branch-heavy (although comparatively light in computed branches), has strong locality of reference, is either single-threaded or has shared-everything parallelism, and meets a few other constraints. That's not a general purpose processor, that's something optimised for a specific workload, and because these chips have been the cheapest way of buying processing power for a few decades, people have put a lot of effort into shoehorning algorithms to have those characteristics.

As GPUs became cheaper per FLOPS, people tried to shoehorn algorithms to fit on processors that are optimised for code with almost no branches, little locality of reference, explicit parallelism and synchronisation, and highly predictable memory accesses. These are also not general purpose processors. They are two points in a design space, and we're going to see a lot more as it becomes increasingly cheap to put rarely-used processors on die. If you can only power 5% of the chip at any time, then you can afford a load of different pipelines optimised for very different classes of algorithm on the same die, even if they share the same (or mostly the same) instruction set and some of them can run code intended for any of them (albeit slowly and inefficiently).

Comment Re:Who is that? (Score 2) 112

And it's worth noting that Amazon's low profits are largely due to the fact that they heavily invest their revenue in entering new markets (cloud hosting, music downloads, eBook readers, video streaming, tablets, and so on). Most of these are long-term investments. A company with a lot of diverse product lines can handle changes in the market much more easily than one that is just an online book retailer. If Amazon just sold books online, they'd probably have much higher profits relative to their size, but they'd be in a very fragile position.

Comment Re:Down side (Score 1) 141

A modern standard ARMv7 instead of the odd ARMv6 would be greatly appreciated too.

Connecting up the performance counter interrupt line would, but ARMv7 wouldn't. Having to have different OS images for different models of RPi makes them a lot less interesting. If you want an ARMv7 board, then go and buy one - there are loads of them.

Comment Re:Will it have the same garbage CPU? (Score 4, Insightful) 141

There's a lot of overlap between those constraints. Cheap doesn't just mean cheap to buy, it means cheap to replace. And that means that when you break one, if the exact model doesn't exist anymore then you need to be able to run everything that was working on the old one on a newer model. The advantage of the RPi over more powerful ARM boards is that it comes with that guarantee - the A+ will run everything (including the same OS image) as the A and B.

The hypothetical 700MHz vs 1GHz issue that the grandparent talks about isn't that much of a problem. More importantly, a new SoC would likely be dual (or quad or octo) core and would be ARMv7, not ARMv6. That's a big change. I expect that the RPi will skip ARMv7 entirely and that eventually there will be an ARMv8 model (possibly ARMv8.1 / ARMv8.2), but the jump to 64-bit gives a good excuse for needing a new OS image.

Disclaimer: I work a couple of floors below several of the RPi Foundation people, but the only thing that they've told me about their future plans is that they have some. Everything in this post is uninformed guesswork.

Comment Re:I mean, aren't (Score 1) 260

There are two questions and it's important not to conflate them:
  • Are APIs creative works?
  • Is the industry better served by protecting them or not?

Anyone who has written a nontrivial library can tell you that the answer to the first is a definite yes. Designing a good API is hard and requires a lot of thought (designing a bad API is pretty trivial). The second question is more subtle. If a good API is hard to design, then the company that designs one does deserve some advantage. In general, they get this advantage by being the first mover in their market. If they had the added advantage that no one could create a compatible implementation, would that be a significant advantage for them and would it hurt the industry as a whole? I suspect that the answer to that is that it wouldn't be a massive advantage (there aren't very many cases of people producing identical implementations, and where there are it's often of mutual benefit because their customers like having a second source), but it would be a big disadvantage to the industry because it would mean that you'd always have lock-in for every piece of software.

Comment Re:POSIX open, named by Stallman, predates SCO (Score 1) 260

POSIX is a superset of the UNIX Release 7 APIs, which (if APIs could be copyrighted) would be owned by AT&T. And POSIX is not as open as you think it is - until quite recently you had to pay for a copy of the spec, you still have to pay for certification, and you can only use the associated trademarks if you do. Imagine if AT&T had been able to block the creation of POSIX by claiming copyright on those APIs. Imagine if The Open Group had been able to enforce the rule that you couldn't ship anything that implements any of the POSIX APIs unless you implemented the full set and paid them for certification.

Comment Re:Hyperbolic headlines strike again (Score 3, Informative) 181

I'm the author of TFA. There's a big difference between a general purpose processor and a general purpose computer. A lot of current research in computer architecture is focussed on the idea that you have a sharp divide between accelerators and general purpose CPUs. The point of the article is that different CPU microarchitectures are specialised for different workloads (one of the cited results was that in a big.LITTLE arrangement, the A7 core runs one of the SPEC benchmarks faster than the A15 because of its lower cache access time, for example) and that there are a lot of assumptions about the kind of code that the general purpose core will run. Many of these are true for C code, but a lot less true for code written in other languages. The communication patterns that mainstream multicore processors are optimised for are heavily tied to C, to the extent that if you have a language with a shared-nothing abstraction and message passing then the only way of implementing it is horrendously inefficient at the hardware level.
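As a rough sketch of that mismatch (names are hypothetical, and real runtimes are cleverer than this): a shared-nothing language has to give the receiver its own copy of every message, even though the hardware underneath is a single cache-coherent memory where handing over a pointer would have been enough.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

struct mailbox {
    pthread_mutex_t lock;
    void  *msg;
    size_t len;
};

/* send() under shared-nothing semantics: deep-copy the message so the sender
 * and receiver never share memory (error handling and freeing any previous
 * message are omitted for brevity). */
void mb_send(struct mailbox *mb, const void *data, size_t len)
{
    void *copy = malloc(len);
    memcpy(copy, data, len);        /* a copy the hardware never needed */

    pthread_mutex_lock(&mb->lock);  /* plus coherence traffic for the lock */
    mb->msg = copy;
    mb->len = len;
    pthread_mutex_unlock(&mb->lock);
}
```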

Comment Re:An interesting paragraph from the article (Score 1) 181

It would be interesting, but it's also a question of encoding density. Having a fixed number of architectural registers (and a much larger number of microarchitectural registers) is a technique that works reasonably well. Adding more architectural registers makes your operand size very large. You could imagine something like Dalvik bytecode, with 2^32 SSA registers and a CPU able to interpret it by either using internal registers or spilling to RAM, but you'd likely end up needing huge instruction caches and not getting much (if any) speedup.
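A back-of-the-envelope version of the encoding-density argument (the 8-bit opcode and three-operand format are assumptions for illustration, not any real ISA):

```c
#include <stdio.h>

int main(void)
{
    const int opcode_bits = 8;
    const int operands    = 3;

    /* 16 architectural registers need 4 bits per operand field;
     * 2^32 SSA registers would need 32 bits per operand field. */
    int regs16 = opcode_bits + operands * 4;    /* 20-bit instruction  */
    int regs4g = opcode_bits + operands * 32;   /* 104-bit instruction */

    printf("16 registers:   %d bits per instruction\n", regs16);
    printf("2^32 registers: %d bits per instruction\n", regs4g);
    /* roughly 5x the code size for the same work, hence roughly 5x the
     * instruction-cache footprint */
    return 0;
}
```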

Comment Re:The "paid Microsoft tax" bit, apparently (Score 2) 96

I don't know specifically, but for some GPUs there's a flag to disable strict IEEE floating point compliance. This can make things go a lot faster, at the expense of sometimes giving the wrong result. For graphics workloads, a few more floating point rounding errors are normally invisible to the human eye, but for scientific (GPGPU) computing they can be problematic.
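For a feel of the kind of error involved (a tiny self-contained example, not tied to any particular GPU flag): relaxed modes typically allow reassociating floating point operations, and floating point addition isn't associative, so the answer can change.

```c
#include <stdio.h>

int main(void)
{
    float big = 1.0e8f, small = 1.0f;

    float a = (big + small) - big;   /* small is lost to rounding: 0.0 */
    float b = (big - big) + small;   /* reassociated order keeps it: 1.0 */

    printf("%g vs %g\n", a, b);      /* prints "0 vs 1" */
    return 0;
}
```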
