Comment No, it does not compete with Skylake; those are GPU benchmarks (Score 4, Informative) 84

The "deep learning" benchmark is a GPGPU workload which does practically nothing on CPU.

Nvidia has just made an SoC whose integrated GPU is about as fast as Intel's, at lower power consumption.

But in CPU performance, Skylake is MUCH faster.

Comment RAM latency is not getting much faster (Score 4, Informative) 92

The latency of RAM is improving very slowly: only something like a 2x-4x improvement in the last 20 years.

Only memory bandwidth is growing quickly, and that's just because they keep putting more DRAM cells in parallel, doing ever bigger data transfers, and running a faster memory bus.

The same is true for hard disk speed: the rotation speed dictates the random access latency, and the rotation speed of the average hard disk has only gone up from 4200 or 5400 to 7200 rpm in the last 20 years, meaning only a 1.7x or 1.33x improvement in random access latency.

Though replacing hard disks with flash-based SSD storage has improved latency by a huge margin.
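
As a sanity check on those ratios: the average rotational latency is half a revolution, so it scales directly with rpm. A quick back-of-the-envelope calculation in Python (illustrative only):

    # On average the head waits half a revolution for the data to come around.
    def avg_rotational_latency_ms(rpm):
        seconds_per_revolution = 60.0 / rpm
        return 0.5 * seconds_per_revolution * 1000.0

    for rpm in (4200, 5400, 7200):
        print("%d rpm -> %.2f ms" % (rpm, avg_rotational_latency_ms(rpm)))
    # 4200 rpm -> 7.14 ms, 5400 rpm -> 5.56 ms, 7200 rpm -> 4.17 ms
    # 7.14/4.17 ~ 1.7x and 5.56/4.17 ~ 1.33x, matching the ratios above.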

Comment Jets are much slower than A-10 bullets (Score 5, Informative) 502

> The A-10 flies at about 420 MPH. Even 1980s fighter jets fly at mach 2, about the same speed as the bullets from the A-10 gun. An A-10 going after a fighter is literally the same ratio as a scooter going after a Ferrari.

> Don't misunderstand, scooters are good. They are useless for chasing down sports cars, and an A-10 is just as useless for engaging enemy fighters. The fighters would (and do) fly by as if the A-10 is standing still.

Actually, even fighters from the 1950s can fly at mach 2, BUT:
Even those 1980s fighters won't be flying at mach 2 95% of the time. They can only fly at mach 2 at high altitude, in a straight line, on full afterburner, wasting a huge amount of fuel.

Practically all dogfights happen at subsonic velocities. When you start doing high-g maneuvers, the velocity drops to subsonic very quickly.

>> no known aircraft can survive the A-10's gun. It is the most powerful dogfight cannon

> The bullets from the A-10's gun go about the same speed as the fighter. So if somehow, magically, the A-10 got on the fighter's tail and fired, the bullet probably couldn't catch up to the fighter. If it was fired off angle, it might hit the fighter at 30 MPH relative speed - not enough to dent the sheetmetal.

> Survive the A-10's gun? No jet fighter in the last 40 years can be HIT by the A-10 gun unless the fighter is either a) parked or b) intentionally flying toward the A-10 without shooting it down.

This part is so incorrect....

The bullets from the GAU-8 leave the barrel at about 1070 m/s.
The top speed of the world's fastest jet fighter (the MiG-25) is ~890 m/s in a straight line at high altitude with afterburner, but only ~333 m/s at low altitude.
The top speed of most modern jet fighters is in the 700 m/s class (high altitude, straight line, full afterburner).

A common speed for modern jet fighters during a dogfight is about 250-350 m/s, 3-4 times slower than the bullets from the GAU-8.
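
To put rough numbers on the "bullet can't catch up" claim, here is a toy closure-speed calculation in Python (illustrative only; it ignores bullet drag and all real ballistics):

    GAU8_MUZZLE_VELOCITY = 1070.0   # m/s, approximate
    A10_SPEED = 190.0               # m/s, roughly 420 mph
    FIGHTER_DOGFIGHT_SPEED = 300.0  # m/s, typical subsonic combat speed

    # Best case for the target: it flies directly away from the shooter.
    # The bullet leaves the gun at muzzle velocity plus the A-10's own speed.
    closure = (GAU8_MUZZLE_VELOCITY + A10_SPEED) - FIGHTER_DOGFIGHT_SPEED
    print("closure speed on a fleeing fighter: %.0f m/s" % closure)  # ~960 m/s

Even in that best case for the fighter, the bullet arrives nearly a kilometer per second faster than the target, nothing like "30 MPH relative speed".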

The A-10 is actually quite a good plane for shooting down slow, low-flying aircraft such as helicopters. It can use AIM-9 missiles from slightly longer range, and at close range the GAU-8 is very deadly. And because it can fly lower and slower, it can hit those slow, low-flying targets more easily than faster, higher-flying aircraft can.

Comment Re:Unfair T/W ratio and wing loading comparison (Score 5, Insightful) 732

> So what you're saying is that the plane is incapable of dogfighting unless you throw away one of the design requirements?


Here is an example. The numbers are off the top of my head, not actual ones.

You go to fight 600 miles away. You load the F-35 to its full internal fuel load. When you arrive at the fighting location, you have 50% fuel in your tanks, and you have the same T/W and wing loading ratios as in the report.

You also go to fight 600 miles away in an F-16. You load its internal tanks full AND add two drop tanks. When you arrive at the location, your external tanks are empty, and you drop them. You are then fighting with a full internal fuel load. Now your real-world performance is WORSE than the numbers in the report, because you are fighting with full fuel tanks while the report used tanks that were only half full.

Or, you go to fight 300 miles away. You load the F-35 to half of its internal fuel load. When you arrive at the fighting location, you have 25% fuel in your tanks, and you have much better T/W and wing loading ratios than in the report, as the report used tanks that were half full.

Or, you load the F-16 to its full internal fuel load, and when you arrive at the fighting location, you have 50% fuel left in your tanks, and you have the same T/W and wing loading ratios as in the report.

In all real-world cases, you have a smaller relative amount of fuel in your tanks in the F-35 than in the F-16, and the numbers shift in favour of the F-35.
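
Putting toy numbers on this (a minimal Python sketch; the airframe figures are rough public ballpark values with no weapons load, so treat them as hypothetical):

    G = 9.81

    def t_w(thrust_n, empty_kg, internal_fuel_kg, fuel_fraction):
        mass_kg = empty_kg + internal_fuel_kg * fuel_fraction
        return thrust_n / (mass_kg * G)

    # "F-35-ish": big internal tanks. "F-16-ish": small tanks plus drop tanks.
    f35 = dict(thrust_n=191000, empty_kg=13300, internal_fuel_kg=8300)
    f16 = dict(thrust_n=129000, empty_kg=8600, internal_fuel_kg=3200)

    # The report's comparison: both at 50% internal fuel.
    print("report:  F-35 %.2f  F-16 %.2f" %
          (t_w(**f35, fuel_fraction=0.5), t_w(**f16, fuel_fraction=0.5)))

    # Equal-range comparison: the F-35 arrives at 50%, the F-16 burned its
    # drop tanks on the way in and arrives with full internal tanks.
    print("mission: F-35 %.2f  F-16 %.2f" %
          (t_w(**f35, fuel_fraction=0.5), t_w(**f16, fuel_fraction=1.0)))

With these made-up numbers, the report's method shows the F-16 well ahead (about 1.29 vs 1.12), while the equal-range comparison puts them nearly even (about 1.11 vs 1.12).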

The design requirements say that the F-35 has to fly a long distance on internal fuel, and that's there just to keep it stealthy by not needing external fuel tanks.

Comment Unfair T/W ratio and wing loading comparison (Score 5, Interesting) 732

So they did their thrust/weight and wing loading comparison by loading all the jets with 50% of their internal fuel.

This comparison favours planes with small internal fuel tanks.

The F-35 has huge internal fuel tanks; it can fly much farther on internal fuel than most other jet fighters, which need external fuel tanks (NOT counted in these numbers) to fly as far.

Load all the jets with an amount of fuel that makes them fly about equally far, and the numbers shift considerably, in favour of the F-35.

Comment This article is ignoring Micron Automata (Score 1) 112

The Micron Automata can solve NP-hard problems very quickly, and it's not a quantum computer.

It abandons the von Neumann model we have been using for the last 60 years and can achieve very high parallelism.
And it requires a very different style of programming.

But it's not a quantum computer. And it actually works: it's running in Micron's labs and very soon coming to market.

Quantum computers are hype that isn't really working; the Micron Automata is the real thing, achieving mostly the same benefits.
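
For a flavor of that different programming style: the Automata Processor executes non-deterministic finite automata directly in hardware, with every active state advancing in parallel on each input symbol. A crude software model of the idea in Python (illustrative only, nothing like the real programming interface):

    # All active NFA states advance in parallel on each input symbol.
    # The real hardware does this for thousands of states at memory speed.
    def step(state, symbol):
        if state == 's0':  # s0 self-loops on everything, branches on 'a'
            return {'s0', 's1'} if symbol == 'a' else {'s0'}
        if state == 's1' and symbol == 'b':
            return {'s2'}
        return set()

    def run_nfa(data, start='s0', accepting=('s2',)):
        active = {start}
        for symbol in data:
            active = set().union(*[step(st, symbol) for st in active])
            if active & set(accepting):
                return True
        return False

    print(run_nfa("xxabyy"))  # True: the substring "ab" was seen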

Comment Wirth's law protects us from singularity (Score 3, Interesting) 181

There will never be enough processing power to create an AI powerful enough for the singularity to happen.

Wirth's law states that software gets slower due to bloat faster than Moore's law makes hardware faster.

We have moved from hand-coded assembly and simple binary data formats to JavaScript, which is either interpreted very slowly or JIT-compiled into slightly faster code that is still roughly 10 times slower than assembly, and to XML- and JSON-based data formats (which require a LOT of parsing). Now other languages are being compiled to JavaScript, which adds another layer of slowness on top.

So if we invented a super-powerful AI that was capable of creating truly smart code, it would spend its time creating even more bloated abstraction layers on top of each other, instead of creating anything truly more intelligent.

Comment Re:Microcode switching (Score 2) 161

>> This same myth keeps being repeated by people who don't really understand the details of how processors internally work.

> Actually, YOU are wrong.

>> You cannot just change the decoder, the instruction set affects the internals a lot:

> All the reasons you list could all be "fixed in software".

No, they cannot. Or the software will be terribly slow, like a 2-10x slowdown.

> The fact that silicon designed by Intel handles opcodes in a way a little bit better optimized toward being fed from an x86-compatible frontend is just a specific optimisation.

Opcodes are irrelevant. They are easy to translate. What matters are the differences in the semantics of the instructions.
X86 instructions update flags. This adds dependencies between instructions. Most RISC processors do not have flags at all.
This is instruction semantics, and it differs between ISAs.
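
A toy Python illustration of why this matters (assumed semantics, just to show the extra work): on a flagless RISC, translated x86 code has to materialize the flag side effects as explicit extra operations.

    MASK64 = (1 << 64) - 1

    # One x86 "ADD r1, r2" implicitly updates ZF, CF (and more) as side effects.
    def emulate_x86_add(r1, r2):
        result = (r1 + r2) & MASK64
        zf = 1 if result == 0 else 0          # extra op: compute zero flag
        cf = 1 if (r1 + r2) > MASK64 else 0   # extra op: compute carry flag
        return result, zf, cf                 # three live outputs, not one

    # A later "JZ label" reads ZF, so these side-effect computations must be
    # kept even when nothing ends up consuming them.
    print(emulate_x86_add((1 << 64) - 1, 1))  # (0, 1, 1): wrapped to zero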

> Simply doing the same stuff with another RISCy back-end, i.e. interpreting the same ISA fed to the front-end, will simply require each x86 instruction being executed as a different set of micro-instructions (some that are handled as a single ALU opcode on Intel's silicon might require a few more instructions, but that's about the difference).

The backend micro-instructions in x86 CPUs are different from the instructions in RISC CPUs. They differ in the small details I tried to explain.

> You could switch the frontend and speak a completely different instruction set. Simply if the two ISAs are radically different, the result wouldn't be as efficient as a chip designed with that ISA in mind. (You would need a much bigger and less efficient microcode, because of all the reasons you list.) They won't STOP Intel from making a chip that speaks something else.

Intel did this: they added an x86 decoder to their first Itanium chips. And they did not only add the frontend, they added some small pieces to the backend so that it could handle those strange x86 semantic cases nicely.
But the performance was still so terrible that nobody ever used it to run x86 code, and then they created a software translator that translated x86 code into Itanium code, and that was faster, though still too slow.

> Not only is this possible, but this was INDEED done.

> There was an entire company called "Transmeta" whose business was centered around exactly that:
> Their chip, the "Crusoe", was compatible with x86.
> - But their chip was actually a VLIW chip, with the front-end being 100% pure software. Absolutely as remote from a pure x86 core as possible.

The backend of the Crusoe was designed completely with x86 in mind: all the execution units contained the small quirks that made it easy to emulate x86 with them. The backend of the Crusoe contained things like:

* an 80-bit FPU,
* an x86-compatible virtual memory page table format (one very important thing I forgot from my original list a couple of posts ago; memory accesses get VERY SLOW if you have to emulate virtual memory),
* support for partial register writes, to emulate 8- and 16-bit subregisters like al, ah, ax (see the sketch below).

All these were made to make binary translation from x86 easy and reasonably fast.
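
To make the partial-register point concrete, here is a toy Python version (assumed 64-bit registers; not Crusoe's actual mechanism): without hardware support, every write to an 8-bit subregister becomes a read-modify-write of the full register, i.e. an extra merge dependency on the old value.

    MASK64 = (1 << 64) - 1

    # Writing AL (bits 0-7) must preserve the other 56 bits of RAX.
    def write_al(rax, value):
        return (rax & ~0xFF & MASK64) | (value & 0xFF)

    # Writing AH (bits 8-15) needs a shift as well.
    def write_ah(rax, value):
        return (rax & ~0xFF00 & MASK64) | ((value & 0xFF) << 8)

    rax = 0x1122334455667788
    rax = write_al(rax, 0xAA)
    rax = write_ah(rax, 0xBB)
    print(hex(rax))  # 0x112233445566bbaa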

Comment Re:This is a myth that is not true (Score 4, Informative) 161

> Some of what you said is legitimate. Most of it is irrelevant, since it does not speak to the postulate. You're speaking of issues which will affect performance. So what? You'd have a less-performant processor in some cases, and it would be faster in others.


1) If the condition codes work totally differently, they don't just run slower, they don't work at all.

2) The data paths needed for separate vs. combined FP and integer registers are so different that it makes absolutely NO sense to have them combined in a chip that runs the x86 ISA, even though it's possible.

3) If you don't have those x86-compatible address calculation units, you have to break most memory ops into more micro-ops OR even run them from microcode. Both are slow. And if you have a RISC chip, you want only the address calculation units you need for your simple base+offset addressing. (See the sketch after this list.)

4) In the basic RISC pipeline there are two operands and one output per instruction. There are no data paths for two results, so you cannot execute operations with multiple outputs, such as the x86 multiply which produces 2 values (the low and high parts of the result), unless you do something VERY SLOW.

6) If your RISC instruction set says memory operations are aligned, you design your LSUs to handle only those, as it makes the LSUs much smaller, simpler, and faster. But you need unaligned accesses for x86.

9) If your FPU calculates with a different bit width, it calculates wrong results.
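
To illustrate point 3: a single x86 load like mov rax, [rbx + rsi*8 + 0x40] does the whole address calculation as part of one operation, while a base+offset-only RISC needs a dependent chain of instructions. A toy Python sketch (hypothetical decomposition):

    # One x86 memory operand: base + index*scale + displacement, one AGU op.
    def x86_load(mem, base, index, scale, disp):
        return mem[base + index * scale + disp]

    # The same access on a base+offset-only RISC: a dependent chain.
    def risc_load(mem, base, index, scale, disp):
        t0 = index * scale   # shift (scale is 1, 2, 4 or 8)
        t1 = base + t0       # add
        t2 = t1 + disp       # add (or folded into the load's offset field)
        return mem[t2]       # and finally the load itself

    mem = {0x1000 + 3 * 8 + 0x40: 42}
    print(x86_load(mem, 0x1000, 3, 8, 0x40), risc_load(mem, 0x1000, 3, 8, 0x40))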


Comment Re:isn't x86 RISC by now? (Score 1) 161

> After AMD lost the license to manufacture Intel i486 processors, together with other people, they were forced to design their own chip from the ground up. So they basically used one of the 29k RISC processors and put an x86 frontend on it.

That was their plan, but it ended up being much harder than they originally thought, and the K5 came out much later, much different, and much slower than planned. There are quite a lot of things that have to be done differently (some of them are explained in another post of mine).

Comment This is a myth that is not true (Score 5, Informative) 161

> That is correct. Every time this comes up I like to spark a debate over what I perceive as the uselessness of referring to an "instruction set architecture" because that is a bullshit, meaningless term and has been ever since we started making CPUs whose external instructions are decomposed into RISC micro-ops. You could switch out the decoder, leave the internal core completely unchanged, and have a CPU which speaks a different instruction set. It is not an instruction set architecture. That's why the architectures themselves have names. For example, K5 and up can all run x86 code, but none of them actually have logic for each x86 instruction. All of them are internally RISCy. Are they x86-compatible? Obviously. Are they internally x86? No, nothing is any more.

This same myth keeps being repeated by people who don't really understand the details of how processors internally work.

You cannot just change the decoder; the instruction set affects the internals a lot:

1) Condition handling is totally different in different instruction sets. This affects the backend a lot. X86 has a flags register; many other architectures have predicate registers instead, some with different sets of conditions.

2) There are totally different numbers of general purpose and floating point registers. The register renamer makes this a smaller difference, but then there is the fact that some RISCs use the same registers for both FPU and integer, while X86 has separate registers for each. And this separates them totally: the internal buses between the register files and the functional units in the processor are done very differently.

3) Memory addressing modes are very different. X86 still does relatively complex address calculations in a single micro-operation, so it has more complex address calculation units.

4) Whether there are operations with more than 2 inputs or more than 1 output has quite a big impact on what kind of internal buses are needed and how many register read and write ports are needed.

5) There are a LOT of more complex instructions in the X86 ISA which are not split into micro-ops but handled via microcode. The microcode interpreter is totally missing on pure RISCs (but exists on some not-so-pure RISCs like Power/PowerPC).

6) The instruction set dictates the memory alignment rules. Architectures with stricter alignment rules can have simpler load-store units.

7) The instruction set dictates the multicore memory ordering rules. This may affect the load-store units, caches, and buses.

8) Some instructions have different bit widths in different architectures. For example, x86 has N x N -> 2N wide multiply operations which most RISCs don't have, so x86 needs a bigger/different multiplier than most RISCs. (See the sketch below.)

9) X87 FPU values are 80 bits wide (truncated to 64 bits when storing/loading). Practically all other CPUs have at most 64-bit wide FPU values (though some versions of Power also support 128-bit FP numbers).
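
To illustrate point 8: one x86 MUL produces both halves of the double-width result at once (in RDX:RAX), while a RISC without such an instruction needs separate low and high multiplies. A small Python sketch of the semantics:

    MASK64 = (1 << 64) - 1

    # x86 "MUL r64": one instruction, two outputs (RDX:RAX).
    def x86_mul(a, b):
        product = a * b
        return product & MASK64, (product >> 64) & MASK64  # (low, high)

    # Typical RISC: two separate single-output instructions.
    def risc_mullo(a, b):
        return (a * b) & MASK64

    def risc_mulhi(a, b):
        return ((a * b) >> 64) & MASK64

    a, b = (1 << 63) + 12345, 98765
    assert x86_mul(a, b) == (risc_mullo(a, b), risc_mulhi(a, b))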
