
Comment Re:Some notes on branch prediction vs conditional (Score 1) 161

However, if you are aiming for high-end systems conditional move may not be that big of a deal. See for example the following analysis from Linus Torvalds regarding cmov

The problem with Torvalds' analysis (which is otherwise pretty good and worth reading) is that it only looks at local effects. The problem with branches is not that they're individually expensive, it's that each one makes all of them slightly more expensive. A toy branch predictor is basically a record of what happened at each branch, to hint what to do next time. Modern predictors use a variety of different strategies (normally in parallel) with local state stored in something like a hash table and global state shared with all branch locations. If the branch that you've added happens to hit the same table entry as another, then it may cause mispredictions elsewhere. This is horrible to try to model, because you have a performance cliff that moves depending on code layout (which is part of the reason why randomising the order of functions in a program can have around a 20% impact on performance).
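To make the aliasing point concrete, here is roughly what one table of such a predictor looks like: a toy 2-bit saturating-counter table indexed by a hash of the branch address. This is purely illustrative (real predictors run several such tables plus global history in parallel), but it shows why two branches that hash to the same slot corrupt each other's state.

    #include <stdint.h>

    /* Toy branch predictor: one table of 2-bit saturating counters indexed by
     * a hash of the branch address.  Two branches whose addresses hash to the
     * same slot share (and corrupt) each other's state, which is the aliasing
     * effect described above.  Illustrative only. */
    enum { TABLE_BITS = 12, TABLE_SIZE = 1 << TABLE_BITS };
    static uint8_t counters[TABLE_SIZE];  /* 0-1 predict not-taken, 2-3 predict taken */

    static unsigned slot(uintptr_t pc) { return (unsigned)(pc >> 2) & (TABLE_SIZE - 1); }

    int predict_taken(uintptr_t pc)
    {
        return counters[slot(pc)] >= 2;
    }

    void train(uintptr_t pc, int was_taken)
    {
        uint8_t *c = &counters[slot(pc)];
        if (was_taken) { if (*c < 3) (*c)++; }
        else           { if (*c > 0) (*c)--; }
    }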

The probability of misses increases with branch density. Short branches have another issue, which is that typical superscalar processors don't make per-instruction predictions; they make predictions per fetch granule. If you have a 4-way superscalar processor, then you get one prediction for each 4 instructions. If you have two branches in there, then they'll be predicted together. This means that you really don't want a short branch immediately followed by another branch (or following a not-taken branch) unless you're really careful about alignment.

Note that some processors have spectacularly bad implementations of either branch prediction or conditional moves. PowerPC is notorious for this, where the performance difference between the two representations varies hugely between different iterations by the same vendor (and more between vendors), so compiler tuning is almost impossible.

If the ARM instruction set was designed today however it is likely that the designers would not go crazy with conditional execution since the bits could be better used for something else

You can see this with ARMv8. Most of the predication is gone; there are basically just conditional moves (conditional select) and conditional branches. There was apparently a lot of argument about whether to allow conditional loads. These are very useful because, while most other conditional patterns can be reduced to a conditional move (calculate both sides and select the one that you wanted), if the condition is 'pointer is not null' then you can't speculatively load and then decide not to take the trap. Another proposed alternative is a non-trapping load (i.e. one that returns zero if the load would trap), though this can be difficult in the presence of swapping (the processor doesn't know the difference between a not-present page and a not-present-now-but-can-be-swapped-in page).
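A tiny generic example of the pattern in question (plain C, not taken from the ARMv8 discussion itself):

    /* With only a conditional move you would compute both sides and select,
     * but here one side may fault: this needs a conditional (or non-trapping)
     * load, otherwise the compiler has to keep the branch. */
    int deref_or_zero(const int *p)
    {
        return (p != 0) ? *p : 0;
    }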

Comment Re:Let me know when Android gets 41MP Pureview. (Score 1) 135

My partner just replaced her Nokia 1020. It was quite depressing how much better the 1020 was than the replacement in almost all regards. She'd probably have kept it if not for the fact that it didn't speak a modern version of TLS and so was breaking with an increasing number of web sites. It's a shame no one managed a decent Android port to the 1020 - I'd love to put LineageOS on it.

Comment Re:They did it to themselves (Score 2) 135

Microsoft is in an unfortunate position because the thing that they don't like about their platform is the thing that everyone else likes about it. A load of the old Win32 APIs are horrible designs for security and make it very difficult to impose sensible sandboxing policies post-hoc. If you want to make a secure Windows system, then the best thing to do is throw a load of that away and move to a more modern set of APIs that are designed with security in mind from the start. There's only one problem with this: most people who run Windows do so because they like their Win32 apps and want to keep using them.

Comment Re:A burn Nokia (Score 1) 135

Nokia killed themselves long before Microsoft got involved. Going with Windows Phone was a long shot that might have worked. Keeping their existing strategy of 6 independent teams all trying to sabotage each other and ignoring the fact that there was actual competition in the smartphone market was an even worse alternative.

Nokia had a well-designed kernel (Symbian EKA2) with a horribly dated set of userspace APIs designed for a time when 2MB of RAM was a lot. Their solution? Replace the kernel with Linux and then fight internally about what the userspace should look like, so each line of phones needed apps specially written for it and had a shelf life of about 2 years before it was replaced by an entirely incompatible set of APIs that required large parts of the apps to be rewritten.

Outsourcing OS design to Microsoft was no worse than this, and if Microsoft had managed to persuade anyone to write Windows Phone apps it might have worked. There were a lot more third-party apps for Windows Phone than for the last couple of Nokia's attempts to build its own ecosystem, after a decade of pissing off everyone who used to like their stuff.

Comment Re:Great, so you fucked up our UI for nothing? (Score 2) 135

I'd disagree with step 1. Windows Phone had a pretty good UI (though using the same UI on laptop / desktop devices without touchscreens was horrible, and using a Frankenstein mix of that and the traditional Windows UI was even worse). The problem with Windows Mobile was always the lack of third-party apps, not the core functionality.

Comment Re:What about Universal Windows Platform (UWP) app (Score 1) 135

Microsoft has been trying to make it easy to use Visual Studio to write iOS and Android apps. I can see one possibility being to make it easy to use UWP to write apps that work well on iOS, Android, and Windows, then come back to mobile devices once there are enough apps that use it and can become mobile Windows apps with a simple recompile.

Comment Re:For those of us that don't know (Score 4, Informative) 161

Exactly how do you expect conditional moves to be executed at the renaming stage?

The conventional way is to enqueue the operation just as you do any other operation that has not-yet-ready dependencies. When the condition is known, the rename logic collapses the two candidate rename registers into a single one and forwards this to the pipeline. Variations of this technique are used in most mainstream superscalar cores. The rename engine is already one of the most complex bits of logic in your CPU; supporting conditional moves adds very little extra complexity and gives a huge boost to code density.
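As a rough sketch of what 'collapsing' means here (a toy model of a rename table, not any real core's logic):

    /* Architectural register -> physical register mapping. */
    enum { NUM_ARCH_REGS = 32 };
    static int rename_table[NUM_ARCH_REGS];

    /* Once the condition of a `cmov rd, rs` is known, executing it is just a
     * rename-table update: rd either takes rs's physical register or keeps
     * its old one.  No value ever passes through the ALU. */
    void resolve_cmov(int rd, int rs, int cond_was_true)
    {
        if (cond_was_true)
            rename_table[rd] = rename_table[rs];
        /* else: nothing to do, rd keeps its existing mapping */
    }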

This is a disadvantage if one expect that all processors are the same and expect the code optimized for one ISA (and likely microarchitecture) should run well on other ISAs. Really bad.

If you come along with a new ISA and say 'oh, so you've spent the last 30 years working out how to optimise this category of languages? That's nice, but those techniques won't work with our ISA' then you'd better have a compelling alternative.

That isn't the only way to solve that problem, in fact that sounds like a very bad design.

It is on RISC-V. For the J extension, we'll probably mandate coherent i-caches, because that's the only sane way of solving this problem. Lazy updates or indirection don't help here, unless you want to add a conditional i-cache flush on every branch, and even that would break on-stack replacement (deoptimisation), where there is not always a branch in the old code but there is one in the new code, and it is essential for correctness that you run the new code and not the old.

MIPS was killed?

Yes. It's still hanging on a bit at the low end, mostly in routers, where some vendors have ancient licenses and don't care that power and performance both suck in comparison to newer cores. It's dead at the high end - Cavium was the last vendor doing decent new designs and they've moved entirely to ARMv8. ImgTec tried to get people interested in MIPSr6, but the only thing that MIPS had going for it was the ability to run legacy MIPS code, and MIPSr6 wasn't backwards compatible.

Custom instruction support is a requirement for a subset of the market and it doesn't cause any problem

Really? ARM seems to be doing very well without it. And ARM partners seem to do very well putting their own specialised cores in SoCs, but with a common ARM ISA driving them. ARM was just bought by Softbank for $32bn; meanwhile, all of the surviving bits of MIPS were just sold off by a dying ImgTec for $65m. Which strategy do you think worked better?

Can't run the code from a microcontroller interfacing a custom LIDAR on the desktop computer? Who the fuck cares? Really?

How much does it cost you to validate the toolchain for that custom LIDAR? If it's the same toolchain that every other vendor's chip uses, not much. If it's a custom one that handles your special magic instructions, that cost goes up. And now your vendor can't upstream the changes to the compiler, because they break other implementations (as happened with almost all of the MIPS vendor GCC forks), so how much is it going to cost you if you have to back-port the library that you want to use in your LIDAR system from C++20 or C21 to the dialect supported by your vendor's compiler? All of these are the reasons that people abandoned MIPS.

Comment Re:Should have added this on your SN post :) (Score 3, Interesting) 161

How do the J2/3/4 open source SuperH designs compare?

I've not looked at SuperH in detail, so I can't really compare.

I seem to remember there were other pitfalls to their architecture, but getting a processor that is Management Engine (Aka Clipper+Palladium+TPM) free is a huge boon to the future of computer security

I disagree. A TPM, secure enclave, or equivalent is increasingly vital for computer security. It is absolutely essential that you have some write-only storage for encryption keys in a coprocessor that will perform signing / signature verification / encryption / decryption, but which does not allow the keys to be exfiltrated. Anything less than this and a single OS-level compromise means that you need to reset every password and revoke every key that you've used on that machine.

Having said all this: Is it perhaps time for a different CPU project, or a fork of RISC-V with these missing features added, at the risk of binary incompatibility, but to the benefit of performance and perhaps security?

There are lots of extensions to RISC-V, but the problem there is fragmentation. You need the A extension if you want to run a real OS. You probably need the C extension, because compilers are starting to default to using it. The M extension is useful, so people will probably start using it soon. Hardware floating point is expensive on low-end parts, so you're going to end up with some having F, some having D, and some having neither (this was a pain for OS support for ARM until recently - now ARM basically mandates floating point on anything that is likely to run an OS), and a few will support Q. L is unlikely to be used outside of COBOL and Java, so isn't too much of an issue (one is niche, the other is typically JIT'd so it doesn't matter too much if only some targets support it). And that's before there's any widely deployed silicon. Expect vendors to add their own special RISC-V instructions, making their own versions of toolchains and operating systems incompatible.

RISC-V isn't the first project to try this. OpenRISC has been around for a lot longer, but RISC-V managed to get a lot more momentum. I don't think that a competing project would find it easy to get any of this. It remains to be seen whether this momentum can translate to a viable ecosystem.

Comment Re:Can this CPU be implemented on FPGA? (Score 1) 161

Not sure about this one, but the RISC-V Rocket cores can run in various FPGAs (as can the Sodor cores, which are more useful if you want to learn about computer architecture, but are a bit out of date in terms of conformance to the RISC-V spec). The lowRISC SoC includes the Rocket core and can also run in FPGA. The FreeBSD RISC-V bringup was done in a mixture of software emulator and lowRISC in FPGA.

Comment Re:The bottom line (Score 2) 161

No they can't and the DARPA SSITH program (yes, DARPA sometimes names projects with Star Wars references) is explicitly intended to try to address this problem. At present, unless you not only run your own fab, but also build your own equipment and don't license things from the likes of Cadence, you have no guarantees that the thing that you get back doesn't have secret vulnerabilities introduced. Trying to verify that the chip you get back corresponds to the RTL that you sent to the fab is a very hard problem.

Comment Re:For those of us that don't know (Score 5, Informative) 161

Less instruction sets makes assemblers and compilers easier to implement

I'll give you assemblers (though assemblers are so trivial that there's little benefit from this), but not compilers. A big motivation for the original RISC revolution was that compilers were only using a tiny fraction of the microcoded instructions added to CISC chips and you could make the hardware a lot faster by throwing away all of the decoder logic required to support them. Compilers can always restrict themselves to a Turing-complete subset of any ISA.

RISC-V is very simple, but that's not always a good thing. For example, most modern architectures have a way of checking the carry flag for integer addition, which is important for things like constant-time crypto (or anything that uses big integer arithmetic) and also for automatic boxing for dynamic languages. RISC-V doesn't, which makes these operations a lot harder to implement. On x86 or ARM, you have direct access to the carry bit as a condition code.
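As a rough illustration of the cost, this is the pattern a compiler has to emit for one limb of a big-integer add on a flag-less ISA like RISC-V: the carry is recovered with an unsigned compare (sltu) instead of being read from a flags register. This is a generic sketch, not vendor code.

    #include <stdint.h>

    uint64_t add_limb(uint64_t a, uint64_t b, unsigned *carry_out)
    {
        uint64_t sum = a + b;    /* wraps modulo 2^64 */
        *carry_out = (sum < a);  /* carry occurred iff the result wrapped */
        return sum;
    }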

Similarly, RISC-V lacks a conditional move / select instruction. Krste and I have had some very long arguments about this. Two years ago, I had a student add a conditional move to RISC-V and demonstrate that, for an in-order pipeline, you get around a 20% speedup from an area overhead of under 1%. You can get the same speedup by (roughly) quadrupling the amount of branch predictor state. Krste's objection to conditional move comes from the Alpha, where the conditional move was the only instruction requiring three read ports on the register file. On in-order systems, this is very cheap. On superscalar out-of-order implementations, you effectively get it for free from your register rename engine (executing a conditional move is a register rename operation). On in-order superscalar designs without register renaming, it's a bit painful, but that's a weird space (no ARM chips are in this window anymore, for example). Krste's counter argument is that you can do micro-op fusion on the high-end parts to spot the conditional-branch-move sequence, but that complicates decoder logic (ARM avoids micro-op fusion because of the power cost).
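For reference, this is the kind of code where the difference shows up (a generic example, not taken from the student project mentioned above): both sides are cheap, so a conditional select removes the branch entirely.

    int clamp_to_zero(int x)
    {
        /* csel on ARMv8, cmov on x86; on base RISC-V this needs a
         * conditional branch (or a branch-free bit trick). */
        return (x < 0) ? 0 : x;
    }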

Most of the other instructions in modern ISAs are there for a reason. For example, ARMv7 and ARMv8 have a rich set of bitfield insert and extract instructions. These are rarely used, but they are used in a few critical paths that have a big impact on overall performance. The scaled addressing modes on RISC-V initially look like a good way of saving opcode space, but unfortunately they preclude a common optimisation in dynamic languages, where you use the low bit to differentiate pointers from integers. If you set the low bit in valid pointers, then you can fold the -1 into your conventional loads. For example, if you want to load the field at offset 8 in an object, you do a load with an immediate offset 7. In RISC-V, a 32-bit load must have an immediate that's a multiple of 4, so this is not possible and you end up requiring an extra arithmetic instruction (and, often, an extra register) for each object / method pair.
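A generic sketch of the tagging trick described above (the names are illustrative, not from any particular runtime): the object pointer is stored as 'real address + 1', and with an unscaled load immediate the untagging folds into the displacement.

    #include <stdint.h>

    typedef uintptr_t value_t;   /* low bit set => heap pointer, clear => integer */

    static inline int is_pointer(value_t v) { return v & 1; }

    /* Field at byte offset 8 of a tagged object: one load with displacement 7.
     * With a scaled load immediate, the pointer must first be untagged with a
     * separate arithmetic instruction (and often an extra register). */
    int32_t load_field_8(value_t obj)
    {
        return *(const int32_t *)(obj + 8 - 1);
    }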

At a higher level, the lack of instruction cache coherency between cores makes JITs very inefficient on multicore RISC-V. Every time you generate code, you must do a system call, the OS must send an IPI to every core, and then run the i-cache invalidate instruction. All other modern instruction sets require this to be piggybacked on the normal cache coherency logic (where it's a few orders of magnitude cheaper). SPARC was the last holdout, but Java running far faster on x86 than SPARC put pressure on them to change.
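Concretely, after emitting code a JIT has to do something like the following (using the standard GCC/Clang builtin; the exact kernel mechanism is an OS detail, but on RISC-V it typically means a system call that runs fence.i locally and IPIs the other cores):

    /* Make freshly written instructions in [start, end) visible to execution.
     * On RISC-V this is expensive because fence.i only affects the local
     * core; on architectures whose i-caches participate in normal cache
     * coherency, no cross-core system call is needed. */
    void publish_jitted_code(char *start, char *end)
    {
        __builtin___clear_cache(start, end);
    }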

Licensing also matters a lot

This is true, but not in the way that you think. Companies don't pay an ARM license because they like giving ARM money, they pay an ARM license because it buys them entry into the ARM ecosystem. Apple spends a lot of money developing ARM compilers, but they spend a lot less money developing ARM compilers than the rest of the ARM ecosystem combined. Lots of companies benefit from sharing Linux and FreeBSD ARM development costs. This works because ARM severely restricts what you can do with an ARM core. You can't add custom instructions and you're increasingly required to use the same PIC interface and so on. This means that code written for vendor A's ARM cores will work with vendor Q's ARM cores. One of the big things that killed MIPS was their failure to enforce this. Every MIPS core had custom extensions and porting Linux + GCC from one MIPS vendor's core to another's was a fairly painful experience. Not only does RISC-V make this kind of fragmentation easier, the current structure of the ecosystem makes it inevitable.
