Alpha 21364 EV7 Specs Released 174
Jon Carroll writes " HP has revealed its Alpha roadmap today at RDF, and the schedule goes as previously planned. The Alpha 21364 (EV7), based on a 0.18 micron process, is due to ship by the end of this year, with the EV79, based on a 0.13 micron SOI process, up next. EV7 will run at 1.2GHz while EV79 will run at 1.6GHz. The Alpha 21364 EV7 chip will have 152M transistors, 1.75MB of integrated on-die L2 cache, 32GB/s of network bandwidth, and an integrated RDRAM memory controller with 8 channels providing up to 12.8GB/s of memory bandwidth. "
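The summary's 12.8GB/s figure can be sanity-checked with a little arithmetic. The sketch below assumes the 8 channels are PC800 RDRAM, 16 bits wide at 800 mega-transfers per second each; those per-channel figures are my assumption, not something stated in the story.

```python
# Sanity check on the story's memory-bandwidth figure, assuming (not stated
# in the story) that the 8 channels are PC800 RDRAM: 16 bits wide, 800 MT/s.
channels = 8
bits_per_channel = 16
transfers_per_sec = 800e6

bytes_per_sec = channels * (bits_per_channel / 8) * transfers_per_sec
print(bytes_per_sec / 1e9)   # -> 12.8 (GB/s), matching the story's figure
```

Under those assumptions the 12.8GB/s number falls out exactly: eight 1.6GB/s channels.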
Yay (Score:2, Funny)
Re:Q.E.D. (Score:1, Offtopic)
alpha still lives? (Score:3, Interesting)
Re:alpha still lives? (Score:4, Informative)
Re:alpha still lives? (Score:1)
Re:alpha still lives? (Score:1)
Will they continue the Alpha line or will this be where it ends?
You forget Samsung (Score:2)
Re:alpha still lives? (Score:2)
THE ALPHA IS DEAD! LONG LIVE THE ALPHA!
barf, RDRAM (Score:1)
Re:barf, RDRAM (Score:3, Informative)
Not really Re:barf, RDRAM (Score:3, Informative)
Peter
Re:barf, RDRAM (Score:1)
I'm no fan of RDRAM though. Not that I necessarily dislike the technology, but the tactics.
Re:barf, RDRAM (Score:5, Informative)
It does in a PC, where they only put in two 16-bit channels, so you need two accesses to each bank to fetch the full 64-bit bus width (it's serialization).
In Alpha, there's no serialization. You've got an eight-channel configuration (16 bits each, unless they use the newer 32-bit-wide parts?), which means the channels are 128 bits wide in aggregate. To get the same performance from DDR, you'd need a bus that's 1024 bits wide or something like that, which is not practical...
I don't like RAMBUS at all, but the industry has to come up with something faster because it's clearly the fastest on platforms where it's used correctly (I don't include the current PC in that category).
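The parent's "1024-bit wide or something like that" can be checked with a back-of-envelope calculation. The figures below are assumptions for illustration: PC800 RDRAM at 800 MT/s per 16-bit channel versus a single DDR-266 (PC2100) bus.

```python
# Back-of-envelope bandwidth comparison. All figures are illustrative
# assumptions: PC800 RDRAM (16-bit channels, 800 MT/s) vs. DDR-266.
RDRAM_CHANNEL_BITS = 16
RDRAM_MTS = 800           # mega-transfers per second (PC800)
CHANNELS = 8
DDR_MTS = 266             # DDR-266 / PC2100

per_channel_mb = RDRAM_CHANNEL_BITS / 8 * RDRAM_MTS   # 1600 MB/s per channel
aggregate_mb = per_channel_mb * CHANNELS              # 12800 MB/s = 12.8 GB/s

# Bus width (in bits) a single DDR-266 bus would need to match that aggregate
ddr_bits_needed = aggregate_mb / DDR_MTS * 8
print(round(ddr_bits_needed))                         # -> 385
```

Under these assumptions a matching DDR bus comes out closer to 400 bits than 1024, but the underlying point stands either way: matching eight narrow, fast RDRAM channels with one parallel DDR bus means an impractically wide bus.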
Re:barf, RDRAM (Score:3, Informative)
And back to your point about the economics of RDRAM: there is money out there that will pay a premium for performance scalability (at least when combined with reliability). About 11 percent of all servers command as much as 60 percent of all server revenue. [eetimes.com]
I just wonder how it'll stack up performance-wise on this chart [theinquirer.net] versus Power4 and Itanium2.
But the main reason I suspect one would buy one of these is because you want binary compatibility with all your old high-performance Alpha code that you invested so many man-years in.
--LP
Re:barf, RDRAM (Score:2)
How sad... (Score:5, Insightful)
Alpha is brilliant; too bad it didn't receive the development and marketing dollars it deserved. Compaq should be ashamed.
Thank goodness AMD is here to take up the slack with Hammer! =)
Re:How sad... (Score:1)
Re:How sad... (Score:1)
Re:How sad... (Score:1)
Itanium is going to take years to reach critical mass. x86-64 is going to be eating its lunch for quite a while, especially if Intel doesn't hurry up and make a version suitable for desktop/workstation use.
Re:How sad... (Score:2, Insightful)
To take up the slack? How could a glorified x86 chip with a broken/inefficient instruction set possibly be better than a chip with a new, from-scratch architecture?
Re:How sad... (Score:5, Insightful)
Well, you have the x86 with basically all the market forces behind it driving huge R&D budgets... that's how the x86 managed to slam MIPS, SPARC, POWER, and pretty much all the other RISC chips. It doesn't matter that you are basically sticking solid rockets onto a large, not-so-aerodynamic brick. It flies.
That's the past. Now, in the present, we have the same market forces behind the x86, and a stunningly bizarre new creature called IA64, which may not be the poster child for "broken/inefficient", but is clearly a great one for "will drive compiler writers over the brink into the spinning abyss of madness". It is definitely stunningly hard to write things for, that's for sure. More so, in most cases, than figuring out what "RISCops" your x86 instructions are broken into, where they are shoved, how long that takes, and what a better set would be.
Intel will send you the IA64 instruction set manuals for free. Go take a peek... if your mind is strong. Or if you don't mind a bit of gibbering.
The Hammer is NOT a good thing... (Score:2, Interesting)
So the x86 architecture/instruction set still has a great deal of commonality with the Altairs running CP/M.
The 'x86' architecture was only intended to be used for a few years. IBM first extended it from the Altair (8085, 64k) to the PC (8086/8, 1M). The popularity of the PC led to the decision to extend the PC to the AT (80286, 16M). After that, IBM decided that the architecture needed replacement and tried to kill it. IBM created an entirely new, superior architecture, complete with a new, superior OS (the PS/2 and OS/2).
This failed miserably. (In no small part due to the fact that it was a 'closed' architecture -- just like the Macintosh.)
Instead, most of the world chose to stay with the 'x86' architecture (and the more economical clones), maintain backwards compatibility, and deal with its limitations. (I won't say flaws, because the original architecture was never meant to be extended this far to begin with. Of course, that was back with the 8080 and 8085, 64k (max) memory, the Altair, and CP/M.)
And now, the x86 architecture is one extension upon another, finally arriving at the monstrosity we know today.
The Hammer (and Intel's 64-bit extension to the Pentium... NOT the Itanium) will be yet another generation of an architecture originally intended to handle no more than 64k of memory.
It's sick; the best comparison I can think of: if the 'x86' architecture is bare hands, the only tools we have are gloved hands with speed/power assist. No wheel, no lever -- just hands.
The sooner we kill the x86 architecture, the better. It was ancient 15 years ago. Humanity gave up horses and slaves in favor of automobiles and machinery. We can give up the old x86 architecture for something better. Maintaining it is inhumane.
But getting Intel, AMD, and others to cooperate (and share valuable, patented technologies with each other) is like asking Microsoft to GPL the source for Windows.
Re:The Hammer is NOT a good thing... (Score:3, Insightful)
Now, the problems with the x86 instruction set that have been embellished out of proportion have basically all been taken care of over the years, but the bashing still continues. It continues mostly from people who aren't aware the problems have been fixed, because they are simply bashers for the sake of bashing.
One deficiency of the x86 was its integer register set: it was too small, only 8 general-purpose integer registers, and in some of the more complex instructions, only specific GPRs could be used. This has been taken care of by the x86-64 instruction set: they doubled the register set to 16, and those registers are truly general-purpose. Then there is the weirdness of the stack-based floating-point unit: again, this has been taken care of, because floating point now goes through SSE, which uses random-access floating-point registers rather than a stack. Still, there were some advantages to the stack-based FPU, such as some of the complex floating-point instructions you got with it: tangents, sines, cosines, logarithms, etc.
Now, your knowledge of PC history is woefully inaccurate. When IBM tried to make its power grab with the PS/2 hardware and OS/2 software architectures, it wasn't trying to get rid of the x86 instruction set. On the contrary, it was getting much deeper into x86 than at any point in its past.
With the PS/2, it tried to move away from the ISA bus towards a new-generation bus called MCA, without accommodating the existing ISA bus; the shift away from ISA wouldn't successfully take place until many years later with the introduction of the PCI bus, which maintained backwards compatibility with ISA. PCI was successful because it allowed a gradual transition away from ISA; MCA, on the other hand, tried to force everyone to switch away completely, all at once.
The OS/2 operating system was a similar story: it actually tried to use the x86 architecture in greater depth than any OS before it, by using the 286's new "Protected" operating mode, which gave access to much larger amounts of memory. The only problem was that the 286's Protected mode was not yet full-featured; it was more of a running experiment, and it wouldn't become truly useful until the 386 came along and added all kinds of features to Protected mode that allowed for greater flexibility and backwards compatibility at the same time.
Re:The Hammer is NOT a good thing... (Score:1)
PCI is not compatible in any sense with ISA. I think you're thinking of EISA, which was the industry response to IBM's attempt to corner the market by patenting various aspects of MCA so that no one else could make compatible devices or systems.
Another thing he got wrong was that the 8086 chip was not backwards compatible with the 8080 line. It was similar in architecture (limited non-orthogonal register set, awkward instruction set), and there were 8080 -> 8086 cross-assemblers (sometimes producing more than one 8086 instruction for each 8080 instruction), but it wasn't backwards compatible in the same way as 8086 -> 80186/286/386/486/Pentium were.
Wow, a whole 16 general-purpose registers, my heart flutters. Bah, might as well use a 64-bit extension to the 6502.
Re:The Hammer is NOT a good thing... (Score:1)
Re:The Hammer is NOT a good thing... (Score:2)
Funny... I seem to remember most RISC processors I've known (or designed) to have at least 32 GPR's.
Besides... The point of moving away from CISC is so a processor doesn't use over 1/2 its transistors just to decode the instruction. The instruction decode section of the pipeline shouldn't be the single most complex part; unfortunately on a CISC processor, that's where ~50% of the transistors are.
I'm also fully aware of the 'evolution' of PC architecture. I've been programming x86 asm for quite a while as well. Many of the x86's ways of doing things (even the modern ones) are just... inelegant (or ugly).
Re:The Hammer is NOT a good thing... (Score:4, Insightful)
The sooner we kill the x86 architecture, the better. It was ancient 15 years ago. Humanity gave up horses and slaves in favor of automobiles and machinery. We can give up the old x86 architecture for something better. Maintaining it is inhumane.
This is a silly argument, for two reasons.
First, almost all programmers can (thankfully) ignore the underlying instruction set and program in a higher level language - therefore it is irrelevant. x86-64 is actually quite an improvement over IA32 regardless.
Second, if an instruction set is sufficiently efficient to allow the processor to be the fastest microprocessor in the world, it can't be so bad - can it? If my information is correct, Hammer and Opteron will debut with absolutely world-class performance. This isn't so surprising, given that many ex-Alpha engineers are working on it.
Backwards compatibility is simply a nice bonus, which will be crucial in Hammer attaining critical mass quickly.
Time to pick up some AMD stock!!! =)
Re:The Hammer is NOT a good thing... (Score:1)
The main thing wrong with x86 backwards compatibility isn't that the machine code is awkward; you're absolutely right that if you can make it run fast, who cares? One problem is that it makes it difficult to run fast, so it would run faster without the cruft. However, the biggest problem is that it encourages manufacturers to continue producing machines that are basically the same crap as we've always had. IDE, lousy serial ports, the parallel port for gosh sake, the same lousy BIOS architecture, ISA ports and IRQs. The PC world really needs to take the plunge the way Apple did - use a decent boot architecture (hey, maybe they could use Open Firmware!), drop serial ports, go to FireWire/USB. Apple's only mistake was justifiable, going to IDE (due to the ridiculous price differential between SCSI/IDE drives, which was due to a self-perpetuating cycle of being more expensive because it wasn't as widely used).
Re:The Hammer is NOT a good thing... (Score:1)
Hey before you get too high on your anti-establishment high-horse, check some of those facts first.
IDE is now in use by high-end server/workstation makers like Sun Microsystems and HP (PA-RISC), who use it in their personal workstation lines, because it makes absolute sense, both in terms of economics and performance, to do so. A workstation just requires a boot disk and maybe some personal storage space, and IDE does this extremely well. Most large-scale data storage can and should be handled by network-attached or SAN-attached storage devices.
I don't know what you're complaining about with those IRQs; all systems in the world have something similar in concept to IRQs. And in fact, most systems throughout the world are now standardized on PCI, so they use the same IRQ mechanism as PCs.
And what about them "lousy serial ports"? That's absolutely essential in maintaining control over large groups of Unix servers. Their consoles are invariably serial-port based. They do have nice modern GUI consoles, but when it comes to stacking them into a server room and controlling them all from a single input/output source, nothing beats the simplicity of a serial console tty device. And since they're X Window or Java based, you can simply do all of your graphical stuff from the comfort of your own PC logging in remotely, but the local administration can be done over non-graphical serial ports.
Re:The Hammer is NOT a good thing... (Score:2)
Exactly true. Although the number and arrangement of the interrupts may be different. I would prefer not to think of how dog slow computers would be if they had to actively poll system devices (from video cards to keyboards). It's sooo much nicer to use an interrupt system.
And what about them "lousy serial ports"? That's absolutely essential in maintaining control over large groups of Unix servers. Their consoles are invariably serial-port based. They do have nice modern GUI consoles, but when it comes to stacking them into a server room and controlling them all from a single input/output source, nothing beats the simplicity of a serial console tty device. And since they're X Window or Java based, you can simply do all of your graphical stuff from the comfort of your own PC logging in remotely, but the local administration can be done over non-graphical serial ports.
While not arguing this point in the least, I will say one thing: The way the serial ports are set up on the x86 is a bit messy. The Unix boxen I've worked with had a more elegant system for serial ports. (Although most of them also didn't have the same backwards-compatibility problems x86 has).
Re:The Hammer is NOT a good thing... (Score:2)
Oooh! A higher level language!!!
So is BASIC! And you can get it for any platform and your code will run.
Whoopee! It's still dog slow and takes up more resources than is necessary to get the job done. Even compiled (C) code usually runs several times slower and requires more memory than assembler.
Second, if an instruction set is sufficiently efficient to allow the processor to be the fastest microprocessor in the world,
First, an instruction set has little to do with the speed of the processor. The whole CISC vs. RISC thing has more than shown that. An instruction set has more to do with the difficulty and/or complexity of the processor's design. The CISC instruction set requires more (electrical) power, and more transistors to do the same job.
Second, it's to be the fastest in the world? By what method is this measured? Clock speed? Size of the pipeline? Number of pipelines? Clocks per (integer, float, or instruction)?
The hammer isn't even meant to compete with workstation processors in terms of speed. I'll take a SPARC or Itanium any day. (It's a sad thing that so many seem to forget that the Itanium is an HP design, the successor to its PA-RISC, and that newer versions of the Itanium will include many of the Alpha's technologies).
Re:The Hammer is NOT a good thing... (Score:2)
Oooh! A higher level language!!!
So is BASIC! And you can get it for any platform and your code will run.
Whoopee! It's still dog slow and takes up more resources than is necessary to get the job done. Even compiled (C) code usually runs several times slower and requires more memory than assembler.
You've just proven you have no practical knowledge of software development. Far less than 1% of desktop/workstation/server software is programmed in assembler. Perhaps the inner loop of some game engines might be, but I doubt even that in most cases.
One of the main points of developing faster processors with large amounts of memory was to enable the use of more programmer-friendly languages. It is simply not worth the cost to develop systems of any size in assembly.
Finally, if you think C code "usually" runs several times slower than assembler, you're just plain out to lunch.
Second, if an instruction set is sufficiently efficient to allow the processor to be the fastest microprocessor in the world,
First, an instruction set has little to do with the speed of the processor. The whole CISC vs. RISC thing has more than shown that. An instruction set has more to do with the difficulty and/or complexity of the processor's design. The CISC instruction set requires more (electrical) power, and more transistors to do the same job.
The instruction set (and associated issues like register count) certainly does have an effect on speed. Next!
Second, it's to be the fastest in the world? By what method is this measured? Clock speed? Size of the pipeline? Number of pipelines? Clocks per (integer, float, or instruction)?
I'll settle for SPEC2000 benchmarks. You know, real world codes optimized to the hilt for the target processor.
The first Hammer is supposed to debut at a PR 3400, and the first Opteron with a PR 4000. Multiply the current Athlon SPEC scores by the ratios of the PR numbers...that should give you a good idea of what's to come.
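The extrapolation the parent suggests is simple to sketch. The baseline Athlon score and PR rating below are hypothetical placeholders for illustration, not measured SPEC results.

```python
# Rough SPEC extrapolation from AMD "PR" model-number ratios, as the parent
# suggests. The baseline numbers are hypothetical, not real SPEC submissions.
current_pr = 2200        # assumed current Athlon XP rating
current_specint = 700    # assumed SPECint2000 score, for illustration only

targets = (3400, 4000)   # first Hammer / first Opteron PR targets per the parent
estimates = {pr: round(current_specint * pr / current_pr) for pr in targets}
print(estimates)         # -> {3400: 1082, 4000: 1273}
```

The estimate scales the known score linearly by the PR ratio; whether PR ratings actually track SPEC linearly is, of course, part of what's being argued.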
The hammer isn't even meant to compete with workstation processors in terms of speed. I'll take a SPARC or Itanium any day.
You are absolutely incorrect. First off, Athlon MP and Xeon are already workstation solutions, albeit 32-bit.
Secondly, the Opteron versions of Hammer (with dual memory controllers, more than 2-way capability and large cache) are squarely aimed at high-end workstation and server applications, up to at least 8-way. Do some homework and you'll see this to be the case. Dell recently announced that it's skipping Itanium 2, and evaluating Hammer/Opteron.
(It's a sad thing that so many seem to forget that the Itanium is an HP design, the successor to its PA-RISC, and that newer versions of the Itanium will include many of the Alpha's technologies).
It is a dual HP + Intel design, and so far it has been a colossal dud by anyone's measure. With poor backwards compatibility and anemic performance, it is very vulnerable to Hammer, if AMD can pull it off. So far Hammer is looking great! Working silicon has been demoed, and things look on track for a 4Q release of the first Athlon-64 (the desktop Hammer). Opteron will follow in 1Q 2003.
(BTW, it'll be interesting to see how Intel spins the low clock speeds of Itanium. THAT will require some chutzpah! I hope AMD nails Intel on that score.)
Re:The Hammer is NOT a good thing... (Score:2)
If desktop computers accounted for more than a tiny fraction of the whole computer market, I might actually care about that statement. Fortunately, the vast majority of computers are embedded systems, and a substantial portion of embedded code is pure asm.
One of the main points of developing faster processors with large amounts of memory was to enable the use of more programmer-friendly languages. It is simply not worth the cost to develop systems of any size in assembly.
No, that's the software designer's point of view. The hardware designer's point of view is to maintain the performance of software written by overworked programmers who don't have the time to do it right.
Finally, if you think C code "usually" runs several times slower than assembler, you're just plain out to lunch.
First off, C code does execute several times slower than assembler. On the order of 5-10x is typical. Compilers really aren't that wonderful.
Re:The Hammer is NOT a good thing... (Score:2)
Er, wait a sec. This discussion was about Hammer vis-à-vis Itanium, SPARC, etc. Remember?
None of them are aimed at the embedded market.
No, that's the software designer's point of view. The hardware designer's point of view is to maintain the performance of software written by overworked programmers who don't have the time to do it right.
Don't worry. Even with the widespread use of high level languages, computers can do far more now than a few years ago - or are you claiming you could run Quake 3 on a 386, if it were written in assembly? ;-)
First off, C code does execute several times slower than assembler. On the order of 5-10x is typical. Compilers really aren't that wonderful.
Except for pathological cases, C code will run a few percent slower than hand-tuned assembler - if that. As I said, out to lunch...
Re:The Hammer is NOT a good thing... (Score:2)
But all are (or will be used) in embedded system design anyway, so that's where my train of thought was leading. The Hammer mainly has 'momentum' going for it. Just about everything else is against it.
The Hammer is a design a few orders of magnitude more complex than anything else ever attempted. The engineers at DEC dropped the VAX processors and designed the Alpha to avoid the same complexity issues the Hammer is trying to tackle.
First, it uses the x86 set, which has both more instructions, and more complexity (some would say features) per instruction than a pure RISC processor. About half the Athlon's design is just to decode the instructions it's given. After decode from x86 into its internal RISC structure, it then schedules the pipeline, and finally actually sends the data into the appropriate pipeline for execution. There is a huge amount of overhead just to decode what needs to be done.
Pure RISC designs use about 15% of the chip's transistors for decode, and that's if you include pipeline scheduling.
This is the crux of the problem for AMD's hammer. The hammer will be forced to use a much larger transistor count than its RISC competitors. The higher transistor count results in several problems: It's far more complex and expensive to design. It takes a more complicated and expensive process to fab. The die is larger, which results in a slower processor. And it uses more power.
Which means that while AMD may have some momentum going for it, the Hammer is far more costly to design and produce than its competition. This will make things very hard for AMD, especially if Intel is able to use its considerably greater resources to get computer makers to move from x86 to IA-64 at the same time they move from 32-bit to 64-bit.
And since HP/Compaq, Dell, Gateway, Micron, and IBM have all thrown in with IA-64... Things look grim for the Hammer.
The good thing is that I would bet that the RISC back end to the hammer is designed so it can be mated with an IA-64 interface should the x86-interface core not take off.
or are you claiming you could run Quake 3 on a 386, if it were written in assembly?
Not exactly a fair comparison, given that 3D acceleration was a rather expensive solution back in the days of the '386, mostly used for military flight sims. The accelerator was about the size of a refrigerator and connected to the 'host' computer, which was usually a SPARCstation @ 33 MHz. (At least in the case of sims made by Evans & Sutherland, which had the market for them pretty much cornered.)
However, I wouldn't be too surprised if the non-graphics portions of Q3A ran fairly well (but not great) on a '386 (if it had a '387 FPU as well).
I'll say this: without a dedicated 3D card, it would take a Power4 module to tackle Q3A at max settings. (Of course, a Power4 module isn't a single processor; it's 8 processor cores, each roughly analogous to a PowerPC, and just one of the current Power4 cores outruns AMD's 'best-case' specs for the Hammer, which is still in development.)
Except for pathological cases, C code will run a few percent slower than hand-tuned assembler
More the opposite; except for pathological cases, C code runs a few hundred percent slower than assembler. (Although on the IA-64 architecture this is not necessarily true, as it relies entirely on the compiler to explicitly state operation order. The IA-64 does not re-order operations, or do any pipeline scheduling, at all, which is one of the primary reasons the Itanium runs x86 code so slowly.)
And the IA-64 arch. is about the only one out there where a C-compiled program stands a fair chance against pure asm, since it requires the pipeline scheduling to be explicitly stated by the programmer, which is an extremely difficult task for mere mortals.
Frankly, I'm not about to argue any further. The asm vs. C/compiled fight is older than vi vs. emacs, except that 'vi vs. emacs' doesn't have much impact on the speed of the programs written. I know from my own experience how much faster assembler is than compiled languages. I stand by my numbers. So do all the hardware engineers I know, including a couple who have Ph.D.s in compiler design.
C is great because it compiles well and is cross-platform. Asm doesn't require THAT much more development time than C does... but asm is so device-specific that unless you're writing a driver or embedded software, the advantage of C's more portable nature outweighs asm's speed.
I mean... think of what Carmack would give up if he wrote his graphics engines in asm: NO cross-platform capability, a nightmare interfacing with graphics card drivers, and almost no flexibility in the graphics engine.
And since the advantages of id's graphics engines have always been broad platform and hardware support and extreme engine flexibility, he'd lose a significant part of his market if he wrote them in asm. Plus, other companies would have graphics engines that, while somewhat slower, would be available far sooner than an asm implementation.
Re:The Hammer is NOT a good thing... (Score:2)
They are years (possibly decades) away from widespread use as embedded processors. Given their capability and memory sizes, again, far less than 1% will be programmed in assembler. Most likely, Java will be the dominant embedded language by then.
I don't have time to rehash this entire argument, but I will touch on one point:
This is the crux of the problem for AMD's hammer. The hammer will be forced to use a much larger transistor count than its RISC competitors. The higher transistor count results in several problems: It's far more complex and expensive to design. It takes a more complicated and expensive process to fab. The die is larger, which results in a slower processor. And it uses more power.
Hammer has a substantially smaller die than the P4, its main competitor. Itanic, er, Itanium, not only has a large die, but is priced with extreme margins. It's an easy target for Hammer. There is the issue of OEM support, but if Hammer meets spec it will be in high demand.
More the opposite; except for pathological cases, C code runs a few hundred percent slower than assembler.
Why don't you go to the Usenet group comp.compilers and state that "Except for pathological cases, C code runs a few hundred percent slower than assembler".
The resulting blood bath should be amusing. ;-)
Let me know if you do it, I want to watch...
Re:The Hammer is NOT a good thing... (Score:2)
Again, that's comparing a fab tech that is in the near future to one that's been in use for over a year. Not a fair comparison by any means. It's like saying that the Athlon has a smaller die than the K6. Completely different chip generations.
And since both Intel and AMD are working together (with about every other semiconductor maker) on researching new fab techs, you can bet Intel will have the same fab tech as the Hammer. (I do know the Itanium II uses a 0.09 micron fab tech, which is unprecedented for the scale.)
Supposedly the Itanium was (more or less) a rushed release (similar to the PowerPC G4). The Itanium II seems to have improved by a few orders of magnitude in efficiency, as well as speed. For that matter, the PowerPC G5 (which is not being rushed out) specs about 2x faster than IBM's Power4 core.
And, remember, as I said before, the Itanium is currently targeted at the workstation/high-end server market, NOT the PC market. When I say workstation, I mean the "ultra-high performance, ultra-high stability (and typically ultra-high cost)" market. The Itanium is priced similarly to its primary competitors in that arena: UltraSPARC, Power, Alpha, and PA-RISC. The first-gen Itanium is not (and was never intended to be) anywhere near your local consumer electronics store (or your local system builder, for that matter).
The Athlon MP is not real workstation-class by any stretch of the imagination. No competent engineer even trusts the architecture with critical tasks. I have yet to see anybody design computer hardware (or vehicles, or perform complex simulations, scientific calculations, or true enterprise-level work) on x86 hardware. The hardware, while cheap, still crashes far, far too often... it doesn't have anywhere near as good a memory (and system) architecture... the list goes on and on.
The reason PCs are used for 'render farms' is that they're so cheap. If a computer crashes, they just have to reboot it and re-render the current frame (losing only a few hours' work at most, and even then in a relatively non-critical task).
To be short: Sun dominates the workstation market, followed by HP, IBM, and SGI. None of their workstations (with the exception of SGI's lowest-cost graphics workstations) run x86. That's over 95% of the workstation market.
There is the issue of OEM support, but if Hammer meets spec it will be in high demand.
Unquestionably. However, that doesn't mean it will be successful. AMD once made the world's most popular RISC processor (hands down). It literally blew everything else away in terms of sales. AMD discontinued production because, in spite of very high demand for the hardware, they couldn't come close to competing with the other architectures (or, to be more specific, although hardware makers loved it, nobody wrote software for it).
If the Hammer isn't compatible with IA-64 compiled binaries, then AMD will have to fund the development of Hammer-compiled versions, as software developers, following the money, will support IA-64 first. AMD has done this in the past already, but had to give up because it wasn't profitable. (Not coincidentally, it's the same RISC processor that was in such high demand that was the source of this headache).
Why don't you go to the Usenet group comp.compilers and state that "Except for pathological cases, C code runs a few hundred percent slower than assembler".
The resulting blood bath should be amusing.
As I said, the assembler vs. compiler fight is quite long-running. Stating the asm side of the argument in a compiler newsgroup would get a similar response to a Windows user extolling the virtues of WinXP in a Mac (or Linux) group.
And, unsurprisingly, stating that C is anywhere near as efficient as pure asm in an assembly newsgroup would be a bloodbath as well.
The main argument for using C is that it is generally faster software development, and generates code that is 'acceptable'.
Pure asm takes more time to develop, but results in significantly tighter/faster code. The L4 microkernel is a great example of this: the C implementation is much slower than the asm implementation.
But, unsurprisingly, the C implementation is a bit easier to work with.
HP did some research a while back (2-3 years) with software optimisation. They discovered a few interesting things: They could 'emulate' (using full architecture emulation) compiled programs with ~5-15% greater performance than running the same binary natively. (The emulator was emulating the PA-RISC architecture, and ran on top of PA-RISC hardware, so the test was conducted on the same machine) The emulator was capable of making up for inefficiencies the compiler added into the code. In spite of the (large) overhead of the emulation, the program still ran faster while emulated.
While it has yet to really see more than tech demo releases, the Amiga OS4 technologies are quite similar: They are able to run the exact same binary on multiple platforms (PowerPC, x86, IA-64, SPARC, and MIPS) with no drop in performance (compared to natively-compiled versions of the same code). Again, this is due to (current) compiler problems. (PS- I'm not an Amiga fan per se, but I do admire how well-engineered they were for their day)
Another good example is the speed difference between different compilers on the same platform. If they compiled to anything remotely close to the speed of asm, then there wouldn't be a 15-20% speed difference between a highly specialized compiler (such as Intel's) versus a more generic (cross-platform) compiler (such as gcc).
Don't get me wrong: There's nothing really wrong with using a compiled (or interpreted) language. There are very definite benefits to their use (development and maintenance time being primary considerations). Compiled languages are acceptably fast, and compilers are getting steadily better.
But I doubt we'll ever see more than a fraction of embedded devices use a more high-level language. A price difference of $0.01 adds up to real money (and reduced cost) in commercial production runs. In addition, even where compiled languages are used, the resulting code is still de-compiled and the results scrutinized closely. (Which isn't much different than just writing the whole thing in asm anyway.)
Re:The Hammer is NOT a good thing... (Score:2)
That's not too surprising... I'd say the figure is about right. With as large an instruction decode stage as an x86 (or any CISC) has, changing from 32 to 64 bits isn't going to change the size of the chip much. (The 64-bit extensions, from what I understand, do not add more than a couple of instructions; it simply reuses the ones it already has. Hence the decode stage won't grow much.)
The thing is, the decode stage takes up so much of the overall die (and number of transistors, etc.) in any CISC processor that even sweeping changes in the remainder of the chip will result in a nearly identical die size.
That being said, the actual RISC processing core (of the Hammer) is significantly larger than the K7's RISC core. (On the order of 20-30%). It's just that the decode stage is so huge that it hardly makes any difference.
Why do you think people are so excited about Hammer?
A couple of things: First, there is a significantly large anti-Intel crowd. (Not surprisingly, they're also anti-Microsoft). So any upcoming non-Intel chip is exciting to them.
My feelings as to 'why AMD?' comes down to a simple factor: Price. AMD chips are loved by so many because they're cheap x86-compatibles (games being a key factor). If Apple hardware were similarly priced, and had the game market that x86 offers, Apple (and PowerPC) would be a favorite.
Processors can be related to cars fairly well, as long as you forget about being compatible with Windows for a moment; and frankly, as far as I'm concerned, the programs that run on it don't make a difference to the actual hardware.
The Hammer is akin to a pickup truck: A fairly inexpensive, medium-quality vehicle. It's loved because it does its job at a bargain price. It's utilitarian. It's the 'people's truck', and is affordable to most of the population.
Workstation processors (such as Power, SPARC, Alpha, PA-RISC, Itanium) are compared to a semi-truck (Kenworth, International, Caterpillar): they don't necessarily go any faster, but they can tow huge cargos; the corresponding rise in cost, however, is far from linear.
And Apple (PowerPC) processors are BMWs or Audis: they don't really run any better (or worse) than a pickup truck-- but they're higher-quality 'luxury' cars, and give a better ride. You pay for the quality and experience, though.
And, basically, there are a lot of people who are perfectly happy with their pickup truck. They're not about to pay more (at a very uneven scale) for more performance of a semi-truck, nor do they care for the luxury of a BMW.
(And, the Itanium isn't as great as the other workstation processors, but it's also the only 1st gen chip in the bunch; The 1st gen SPARC, Power, and PA-RISC processors weren't wonderful either.)
The Itanium also has one major problem with regard to die size: it's binary compatible with both x86 and PA-RISC processors; meaning that while the pure IA-64 architecture part of the chip is smaller than the Hammer, it then has the circuitry to decode x86 (which is a huge number of transistors, and hence, huge die area), PA-RISC (a much simpler/smaller addition to the x86 decode), and the IA-64's own VLIW decode.
If the Hammer had three separate instruction decoders (one CISC, one RISC, one VLIW), then it would have a huge die area too. But the Hammer has one (CISC). And even the Athlons would be half their current size if they were pure RISC rather than CISC. (Of course, they wouldn't be x86 compatible then, but that's markets for ya.)
The 64-bit extensions don't comprise an entirely new instruction set, primarily because they're just that: extensions. The Hammer's mechanism to extend from 32 to 64 bits is identical to the way the '386 extended from 16 to 32 bits. (This is from AMD's data). The '386 also added a couple more instructions (and registers) to the '286 design. That doesn't make an entirely different instruction set and/or decode.
Re:The Hammer is NOT a good thing... (Score:2)
Actually, it's primarily because Intel pushed better fab processes into production earlier than the RISC crowd, of whom only Motorola & IBM fab their own.
The Alpha was making a run for this crown, and for the longest time it was the only horse in the race; then, seemingly out of nowhere, Intel and AMD both overhauled the Alpha as if it weren't there.
Never underestimate the damaging effects of a corporate sale. When DEC was split between Intel and Compaq (well before the 1 GHz barrier), it was the death knell for the Alpha-- there was simply too much disruption in the shift of companies. (Not to mention the fact that many of Alpha's engineers wanted nothing to do with Intel or Compaq, so they left.) Neither AMD nor Intel was bought out, as DEC was. And AMD even ended up with some of Alpha's engineers!
That leaves the whole category of heavy-haul trucks unanswered by x86 at the moment. But what distinguishes a heavy-haul truck from a pickup? The ability to pull large loads. Is that all achieved by the truck's engine? No! Large trucks have incredible 18-speed transmissions, and stiff chassis, etc. In other words it's the overall package that distinguishes a heavy-hauler from a pickup... [it] describes a similar approach to how you distinguish a RISC processor-based (heavy haul) server from a PC (pickup) processor-based one.
So how's this got anything to do about Hammer?
Easy... Architecture. As you say, the engine is only a small (but significant) part of the entire package that makes the distinction. The rest is the architecture around which the engine is built. Frankly, even though there have been many improvements to the x86 design (primarily by eliminating ISA and replacing it with PCI/AGP), it still has its problems, which is why it will never be a true replacement for high-end workstations and servers.
Well, what it leads to is that Hammer has been designed right from the start to be everything from a car engine, to a pickup engine, to a heavy haul engine. That's because of its various features, such as Hypertransport, and onboard DRAM controller.
If it were designed from the ground up, it wouldn't be x86 compatible; not, at least, if the designers wanted a truly great processor. Rather, AMD hopes to ride the x86-compatibility market and is therefore adapting a phenomenal RISC core to the pre-existing x86 set. It's like bolting a jet engine on a farm tractor.
Hypertransport (as well as a built-in DRAM controller) is only useful on multiprocessor systems. (I'm not downplaying their usefulness at all.) The onboard DRAM controller allows each processor to have its own separate memory (whereas many, including the IA-64, share the same memory through the system bus). Combined with the increased multiprocessing efficiency Hypertransport offers, the Hammer processor line seems to be clearly designed for multiprocessor systems. (Hypertransport and onboard DRAM don't provide any real benefit to a single-processor system.)
It will be great for companies that want to upgrade their x86 server hardware, but want to keep their old software. It'll do great in the 3D animation and rendering studios, many of whom use a Unix-like OS anyway. But for the general desktop machine, there will be only one CPU, robbing the user of the benefits Hypertransport and the onboard DRAM module give.
One key here is that Hypertransport is not unique to the Hammer; SUN, HP, Motorola, SGI and Apple are all members of Hypertransport consortium, and intend to incorporate it into their processor designs.
The primary benefit of an onboard DRAM controller per chip (no longer sharing the same memory pool via a bus) is already implemented on other architectures by using multiple DRAM controllers.
My argument all along was that the Hammer isn't a good thing because it:
Keeps the paleolithic x86 architecture.
Could operate far faster if its RISC core didn't adapt itself to x86
We would be better off junking the x86 architecture sooner than later.
The Hammer, while an excellent x86 design, seeks to make the transition 'later', if at all.
Most of the responses I've seen are remarkably similar to a PC fan's reasons why they don't want to switch to a better machine than x86 can provide: they're cheap (the machines, although the same could be said of a few users). The actual reasons given for the Hammer's 'superiority' are in no way particular to the Hammer, and are found on many of its competitors' drawing boards as well.
And outside the Free software world, where the software typically only requires a recompile, the Hammer faces some serious, possibly fatal obstacles once 64-bit compiled commercial packages begin to replace the older 32-bit code. The commercial reality is that to be successful, the Hammer has to have natively-compiled 64-bit code (in Windows). To do this, it has to have developers who will support Hammer/64 in addition to IA-64. They'll have to either sell two different versions (somewhat similar to the sales of Mac vs. PC, or Win32 vs. x86 Linux games), or have both binaries in one package. Both are expensive propositions, and with Intel's virtually guaranteed market share, it may not be worth the effort to support Hammer.
For a brief history on AMD and binary incompatibility-- Jim Turley, a CPU/Architecture analyst, said the following: "Backing Intel's newest and heavily promoted next-generation architecture is a foregone conclusion for vendors that want to stay in business. Supporting AMD becomes more problematic. Will the added market share be worth the effort? Suddenly AMD finds itself in the same boat as Apple with a different, yet competitive, product that requires dedicated software support to survive.
Grimly, AMD itself lived through this tragedy not so many years ago, and the wound was self-inflicted. AMD unceremoniously axed its entire 29000 family, one of the most popular RISC processors of the early 1990s, due to the cost of software support. The company decommissioned the second-best-selling RISC in the world because subsidizing the independent software developers was sapping all the profits from 29K chip sales. As "successful" as it was, AMD had to abandon the 29K, the only original CPU architecture it ever created. " (emphasis added)
I'm not saying that the Hammer isn't a good processor.
I'm saying that it's putting a jet engine in a 1940's John Deere tractor. I'm saying the mechanic should dump the tractor, and put a jet engine in an aircraft-- not an ancient, over-extended farm tool. The tractor could still do its job, but it's just such a waste of the engine's potential.
I'm sorry, but the x86 instruction set is old and inefficient; it doesn't allow compilers or programmers to access a modern CPU's (including the Hammer) features-- So the Hammer has to deal with the limits inherited from the x86 set.
IA-64 allows explicit branch/pipeline ordering and load optimization; this allows the compiler's larger view to create code that keeps all the pipelines busy.
As all branch/pipeline and load optimization is done in the compiler, there is much more time to find the most optimal instruction order and path. (Fractions of nanoseconds vs. seconds/minutes/hours)
An instruction set (such as IA-64) capable of direct access to branch ordering, or a greater number of registers is more powerful, in that it allows for developers (directly, or via a compiler) to 'take the time' and resources to find the most optimal/efficient way to use the processor's full capabilities.
x86/Hammer does not allow explicit branch/pipeline ordering or load optimization, as x86 was purely single-pipeline until the first Pentium. (Although technically x87 is another pipeline, it served an entirely different purpose... the branching I speak of is of two or more identical pipelines)
As a result, the (Pentium, Athlon, K6, Hammer) must look at its instruction cache, and from that (very limited) amount of information, attempt to optimize the branch/pipelines and provide load-balancing. Time is extremely limited (to fractions of nanoseconds), as are resources to perform any re-ordering. But as time is limited, it frequently executes a suboptimal route and/or order.
Even though the Hammer has all kinds of ultra-modern features and resources, nearly all of them are inaccessible to the programmer/compiler; while the built-in management of these features/resources is quite good, it is also far from perfect (having a far more limited scope than a compiler does, after all) Cycles that could have been put to good use end up being wasted.
Lastly, I'll say that I'm not so much a fan of the IA-64 as I am of the VLIW concept; Non-VLIW processors (Sparc, Power, Alpha) have the same pipeline scheduling concerns as the Hammer. But at least they offer greater access to the processor's resources (such as double or more the accessible GP registers of 64-bit Hammer).
Re:The Hammer is NOT a good thing... (Score:2)
Interesting side note: One reason the Alpha does so well is that the physical design is very closely tuned to its fab process.
And a question: Do you mean a greater number of pipelines, or more pipeline stages?
I ask because more pipeline stages doesn't really increase speed very much (ie. there can be one instruction in each pipeline stage, but as each instruction takes one clock to move to the next stage, there isn't any improvement in speed.) In fact, shorter pipelines are often faster, as they don't have as much potential for stage bubbles.
A stage conflict is when, for example, you have a 5-stage pipeline. Instruction A comes immediately before B. However, instruction B requires that A finish the entire pipeline before it can begin executing. So instruction B has to wait 4 more cycles before it can execute (instruction A must finish, which essentially clears out the pipeline). A 10-stage pipeline would leave B waiting 9 extra cycles.
Out-of-order execution can help keep the pipeline busy with other tasks while B is waiting to be executed; but it doesn't always work out.
Additional pipelines (which is what I think you meant) is adding a second (or third, fourth...) identical pipeline, so that tasks unrelated to the A,B instructions (above) can be executed as well. Again, out-of-order execution helps keep things busy, but not always.
Which comes to the nice thing about VLIW design: the compiler (or, in the case of VLIW, the masochistic asm coder) is able to take a larger look at the program than is possible in a non-VLIW design (which, AFAIK for the mass-produced chips, is everything except the Crusoe and Itanium). And that results in a more efficient run than having the hardware attempt to do it.
Of course, as far as design complexity goes, I'm not entirely sure which is easier to design: the out-of-order prediction chip, or a VLIW chip. I tend to believe the VLIW chip is more complex in design.
Re:The Hammer is NOT a good thing... (Score:2)
This argument seems to be more a Rambus vs. DDR thing; and even then on commodity boxen. But I digress. In both cases there is currently an off-chip memory controller. The big reason for the difference in latency is not the controller itself, but the (completely different) methods of transferring data. Rambus uses a serial data transfer, which is easy to scale up (in terms of speed and bandwidth), but has higher latency. DDR is an older, parallel technology. DDR has lower latency, but has lower bandwidth and is much harder to scale up. This is primarily because of electromagnetic crosstalk (and other E&M interference problems) within DDR's (parallel) data paths.
There is a point of diminishing returns with the low latencies DDR offers; that point is frequently reached on high-performance computers (workstations, scientific processing, and high-end servers) where bandwidth is the key factor. When you're transferring a few GB of memory, who cares that it takes a few microseconds longer to start receiving data-- overall, the entire transfer (from request to completion) takes much less time. Even Wintel boxen are beginning to reach this point.
Personally, I wonder how RAMBUS even got a patent. I don't see how a serial memory bus is 'non-obvious to the trade's practitioners'. But, that's the USPTO for you.
Another major problem is the physical distance to (as well as speed of) DRAM. Silicon technology has already reached the point where a signal often travels faster through logic gates (such as an off-CPU controller) than it does through wire. So long as the memory controller is physically located between the DRAM and the CPU, there is little chance there will be any performance drop. At current CPU speeds, it takes 2-3 clock cycles for any signal to even reach the DRAM (even light-speed is slow at 1 GHz). Then it takes several more before the DRAM addresses and returns data. Then another 2-3 clock cycles before it gets back to the CPU. An off-CPU DRAM controller may or may not take an additional cycle. For large (sequentially addressed) memory transfers, this one cycle is a one-shot deal. Even with millions of tiny, single-byte (randomly selected) transfers, there are only a million or so extra clock cycles 'burned up'. This would result in a performance drop of 0.05% on a 2GHz CPU. (And less as speeds increase.)
As for Hypertransport, the idea behind that is not just absolute performance increases, but also design flexibility. So the same chipset that serves as a PC chipset, may also be able to serve as an 8-way server chipset, with few design changes (perhaps by adding or subtracting a few more HTT channels).
This is true; but as I said, it only really makes things better for the multiprocessing crowd. Chip makers don't usually pass the costs of a higher-complexity/performance chip to the buyers of a lower-complexity chip. The SP chipset would be the hands-down highest-volume seller. An MP chipset based on the SP design would cost less than a wholly-redesigned MP chipset. This suits the MP buyers fine... but it doesn't give any benefit to the SP buyers. The benefit is to MP alone.
Even within a desktop environment, you can easily separate out shared PCI/AGP buses, into multiple switched PCI/AGP buses with Hypertransport underlying them.
You can, but why? For all intents and purposes, the PCI/AGP bus is essentially idle 100% of the time. (The times when it is used are more of a statistical anomaly than fact; a figment of the deranged observer's imagination.) Even in applications where there actually is heavy bus activity, the PCI/AGP bus is far from being saturated. There are cases (such as multiport gigabit ethernet cards) where any single PCI slot is unable to handle the load -- but the PCI bus itself still has massive amounts of idle bandwidth; it's just that it's not possible to transfer the data between the network card and the PCI bus fast enough. (Which is a limitation of PCI's component interface, but not of its bus.)
I've seen many servers that have multiple network interfaces, where each NIC saturates the PCI card slot. The actual PCI bus, however, is not saturated, and handles the full load of multiple saturated interfaces quite well.
In other words, it doesn't matter how wide the freeway is; the tollbooth (AKA the PCI Slot interface) is the bottleneck, and is the real limiter of performance. A HyperTransport-switched PCI bus would be like adding more lanes to a highway that has nearly no traffic on it. It doesn't change how fast you can drive. It's the long wait at the toll-booth at the on and off-ramps that is the speed problem.
Especially as on many motherboards, AGP and PCI are on entirely different buses, so heavy AGP usage (such as DoomIII, or 3D animation) doesn't even affect the PCI bus. For the desktop user, there is no benefit to such a scheme. Even a power-hungry gamer, using his AGP 8X card to its fullest potential, compiling XFree86, and hosting multiple P2P file transfers couldn't do much to dent the PCI bus's capabilities. It's other x86 problems that are most likely to cause speed drops; not PCI or AGP.
Only in ultra-high-end applications would there be a benefit.
But it's not all of the other players it has to worry about, just one player: Intel. Intel may be allowed to use HTT, but it's absolutely certain they would rather die than use their great competitor's designs.
That's completely untrue. In several aspects. First, the NIH (Not Invented Here) syndrome has burned just about everybody. No company that is too proud to use a technology that was NIH lasts long. The managers at Intel are not that stupid. But they aren't going to jump on the bandwagon and spend any money just yet; they'll wait until they see how the results fare on the market before they invest anything in HyperTransport. If it's in Intel's best interest, they'll use it. If not, they'll design an alternative. To call AMD their 'great competitor' is rather short-sighted as well. They're only the most major competitor in the x86 arena, and one with a minority of the market. That's the reality, whether you like it or not. And I like (and have recently bought) AMD processors.
All of the other players are small-fry in terms of volume compared to the x86 camp.
That is an entirely baseless statement. The x86 camp is extremely small in terms of the 'other players'. Or weren't you aware that approximately 0% of all computers use an x86 chip? AMD has a very small production volume; so small they don't even fab their own chips. The only major competitor that is fab'd in such small volumes is SPARC. But Power & PowerPC, Itanium, and even ARM processors are all fab'd in greater volumes than AMD's. Intel plans on abandoning x86 entirely; their Yamhill (Hammer-like) processor is a contingency plan, to 'steal the Hammer's thunder.'
HP has no need to use HTT in its processors, simply because it has no processors anymore
Patently false. HP's processor is the Itanium. (more below)
all of them (PA-RISC and Alpha) have been EOL'ed according their own roadmaps, so what are they going to use them for, Itanium?
Their roadmap EOLs the PA-RISC, but points straight to Itanium. The Itanium is 100% PA-RISC compatible (in addition to supporting x86 and its own architecture). It is the next-gen PA-RISC. They are only supporting the next couple of releases of PA-RISC to appease people who already have PA-RISC hardware and wish to upgrade the processors in their pre-existing machines. Alpha was acquired well after the Itanium was complete; a white elephant of sorts. It was never part of the plan. It's entirely likely that HP will include Alpha technologies in next-gen IA-64 chips. If there is customer demand (especially if it's from Itanium's co-designers at HP), HyperTransport will be included as well.
Anyways, the only RISC player that is likely to use HTT is Sun, and they will likely use it in their upcoming Opteron servers. It's likely that IBM and HP, in addition to Sun, all have Opteron plans secretly devised already.
Opteron is the Hammer's new brand name, and Sun will definitely not be using it.
Sun is 100% SPARC, has been for more than a decade, and they have no plans to abandon it. There is no such thing as an 'Opteron server' from Sun. Sun only sells SPARC boxen.
I already covered HP -- they're Itanium. Their roadmaps still point to it.
SGI's roadmap leads to Itanium for their workstations and servers. They will use Intel's answer to HyperTransport (whether it is HyperTransport or not)
IBM is all about their own Power and PowerPC processors, which have better SPECint and SPECfp scores than anything else to begin with.
It's likely that IBM has an Opteron-based PC and Windows.net server, but the Opteron won't be used in their high-end servers or workstations. IBM already scales well past the point where HyperTransport would be beneficial; and IBM is in the same boat as Intel: If it's worth their while, they'll either use or design an alternative for HyperTransport. But for IBM, it may be completely unnecessary to begin with.
Apple is likely to use HyperTransport, as they have a great deal of flexibility in what technologies are used in their machines. Apple is also a member of the HyperTransport consortium. Apple's market is definitely not a trivial one.
Which goes to show my point: Just because AMD's Opteron has great features, they are in no way unique to the Opteron. And its competitors have a better system architecture than x86 to boot.
Re:The Hammer is NOT a good thing... (Score:1)
From reading the AMD manuals, it looks like somebody wrote up a list of what sucks about x86 from an OS perspective and the design engineers did a damn good job at getting rid of just about everything on the list. There is still some nastiness, but it is a damn sight better than plain x86.
The x86-64 application view is dramatically cleaned up too.
And about damn time!
Re:The Hammer is NOT a good thing... (Score:1)
Those segments could have been put to some extremely good use in protected mode. They basically allowed you to have completely separate code and data segments which never overwrote each other, allowing some unprecedented levels of memory protection for applications-- not only from other apps, but from themselves. It also would have made the task of writing OSes easier, because the hardware itself could be employed to enforce protection.
In fact, the original Linux was written this way. Linus's original intention for Linux was to design an operating system to see how much of the Intel 386 architecture's features could be used. Obviously considering how quickly he got the kernel designed and running, the Intel architecture made his life very easy. This was in the pre-1.0 days, as of 1.0 and later they shifted to a more generic kernel that could be ported across platforms. But those early pre-1.0 kernels were extremely small and fast.
But I think any 64-bit OS can switch easily between "long" and "legacy" mode, right? So if there is a requirement to use VM86 mode, they can still do so by putting it into a legacy segment?
Too little to late (Score:3, Informative)
Re:Too little to late (Score:1, Interesting)
Re:Too little to late (Score:1)
Viva la PA-RISC!
Re:Too little to late (Score:1)
Re:Too little to late (Score:1)
Re:Too little to late (Score:3, Informative)
You're forgeting Samsung (Score:2)
I remeber when (Score:1)
Re:I remeber when (Score:2, Interesting)
I remember because I almost bought one with the special version of NT for the Alpha. They only cost a small amount more and ran like scalded dogs.
The only problem was that there was very little peripheral support and huge driver issues. But most NT stuff ran on them and ran real fast.
AMD is the bastard child of the Alpha.
Puto
Re:I remeber when (Score:1)
The 21064A was released in October 1993 at 275MHz, according to Bhandarkar, and went into the 3000/900 and 7000/700 systems in mid 1994. DEC didn't reach 300MHz until the release of the 21164 in Sept 1994, which reached systems in 1995. 500+MHz Alphas, such as the one I'm sitting at currently (which has not been booted into NT since about a week after I bought it, thanks to RedHat and more recently Debian), only came later.
The original Pentiums (60/66MHz) were released in 1993. They were at P6 by 1995.
So, timewise,
Intel 60 MHz vs. DEC 200 MHz
Intel 166 MHz vs. DEC 300 MHz
Phil
No relevance since HP admitted it will kill it (Score:5, Informative)
http://www.hp.com/hpinfo/newsroom/press/07may02
They are dropping Alpha and PA-RISC for Itanium... baaadddd move!!
Re:No relevance since HP admitted it will kill it (Score:3, Insightful)
AlphaServer systems will be focused on the Alpha installed base. - from the press release cited above.
But this also means that of the existing customers, probably only those who can't find another alternative soon will buy the new Alpha. Seems like kind of a harsh thing to do to the Alpha. If they (Compaq) had released this chip and then said that they were stopping the line, that would be one thing; but in this case, they're stopping the line before releasing the chip! This is certainly a bizarre move.
Re:No relevance since HP admitted it will kill it (Score:4, Informative)
Furthermore, they've got customers on Tru64 and VMS who have nowhere to move at the moment, but may need more grunt; they'll buy upgrades until they've ported VMS to Itanic and the Tru64 customers have migrated to HP-UX (or give up on the Digital->Compaq->HP fiasco in disgust and move to AIX or Solaris).
Bear in mind that until fairly recently Digital/Compaq were selling new VAX systems to customers who had VAX/VMS setups that worked just fine and no particular desire to upgrade.
Re:No relevance since HP admitted it will kill it (Score:2)
One thing you seem to have omitted. The Itanium is a joint HP-Intel processor. HP was intimately involved with the design of the Itanium, and intended it as a replacement for the PA-RISC from the beginning of the design. HP had better know-how in 64-bit RISC, and Intel had the fab facilities to produce the Itanium on a large (and more inexpensive) scale. The Itanium was never intended to be an x86 competitor. It was designed to replace the PA-RISC and to compete with MIPS & SPARC, among others. In fact, originally, the Itanium was supposed to be backwards/binary-compatible with the PA-RISC. (I'm not sure whether the final product actually IS, but I lost interest in the Itanium several years ago...)
It was merely hoped that one day the architecture the Itanium uses would finally replace the x86 architecture.
The Alpha, I suspect, is somewhat of a white elephant in HP's acquisition of Compaq, and I suspect we can expect to see many of the Alpha's technologies rolled into next-generation Itaniums.
The last Alpha? (Score:4, Funny)
*mniam* (Score:1)
Up to 256 GB of ECC memory
Over 51 GB/s aggregate internal bandwidth
4 MB or 8 MB of onboard ECC cache per CPU
Up to 224 PCI slots on 64 PCI buses
(the image in the linked news announcement links to this page [compaq.com] (www.compaq.com/alphaserver/index.html)).
Re:*mniam* (Score:1)
I put in those 224 PCI slots?
alphas and optimisation (Score:5, Interesting)
I was writing code for a simple matrix transform using the algorithm as follows:
for (a=0; a<100; a++) { for (b=0; b<100; b++) {
txarray[a][b] =
Using the Alpha libraries to do the transform instead got me a 10x boost in speed.
This was weird, as I didn't see how the above algorithm could be optimized... tearing apart the assembly, I saw:
for (a1=0; a1<100; a1=a1+10) { for b1.. { for (a=0; a<10; a++) { for (b...
Evidently they had optimised it so that reads and writes would occur from closely spaced regions of memory and less time would be spent writing.
The result? A 10x boost on a simple algorithm and a neat hack at the same time.
Just an example of how awesome the engineering of the Alpha was.
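The transformation described above is loop tiling. A minimal sketch in C, using a 100x100 transpose as the stand-in transform (the function names, the `double` element type, and the tile size of 10 are my assumptions for illustration, not the poster's actual code):

```c
#include <stddef.h>

#define N 100
#define T 10  /* tile size, assumed; the disassembly above steps by 10 */

/* Naive transpose: successive writes to dst[b][a] land N elements
 * apart, so nearly every write touches a different cache line. */
void transpose_naive(const double src[N][N], double dst[N][N]) {
    for (int a = 0; a < N; a++)
        for (int b = 0; b < N; b++)
            dst[b][a] = src[a][b];
}

/* Tiled transpose: process T x T blocks so reads and writes stay in
 * closely spaced regions of memory, as the Alpha library version did. */
void transpose_tiled(const double src[N][N], double dst[N][N]) {
    for (int a1 = 0; a1 < N; a1 += T)
        for (int b1 = 0; b1 < N; b1 += T)
            for (int a = a1; a < a1 + T; a++)
                for (int b = b1; b < b1 + T; b++)
                    dst[b][a] = src[a][b];
}
```

Both versions compute exactly the same result; the win is purely in cache behaviour, so the actual speedup depends on the array size and the cache line and cache sizes of the machine.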
Re:alphas and optimisation (Score:2, Insightful)
Re:alphas and optimisation (Score:5, Informative)
This principle can be seen in how the GIMP stores image data in tiles for rapid processing, in matrix math libraries, in the design of FFTW (The Fastest Fourier Transform in the West, www.fftw.org), and in many other systems.
Re:alphas and optimisation (Score:2)
tiling, not loop unrolling (Score:1, Informative)
Re:tiling, not loop unrolling (Score:2)
nice 64bit (Score:2)
I bet it could cane IA64 in SPECint, but the real test would be floating point, and to do IEEE 754 properly you need 64 bits, otherwise you end up emulating it.
Now we have the true 64-bit microprocessors:
Sun Microsystems - Processors [sun.com] which are a Sparc [sparc.org]
PA-RISC, which is MIPS-like,
and MIPS64 [mips.com], which I like a lot.
Of the ports of Linux to 64-bit there's Linux HPPA [debian.org], the oldie-but-goodie Linux Alpha [debian.org], and Linux sparc64 [debian.org], of course not forgetting Linux for IA64 [debian.org]. Unfortunately the Linux for MIPS [debian.org] port is not 64-bit, so if ever there was a challenge, as Linux is mostly 64-bit clean, it's to do a MIPS64 port.
Oh, and Intel won't like me mentioning Linux for Hammer [x86-64.org], which is not real 64-bit, it just has some 64-bit registers tacked on (but hey, you can do FP right).
Re:nice 64bit (Score:1, Interesting)
The really interesting thing is the parallels between IA64 and Alpha: they both sucked hard in their early variants. It is a little-known fact that the Alpha 21064 cost more than all other 64-bit CPUs at the time and performed far worse.
Oh, I should finish off by mentioning that the current IA64 tools for Linux suck donkey balls in the performance stakes; hopefully this is one area where serious optimisation will be made.
Re:nice 64bit (Score:1)
Show us the figures, or shut up.
The 21064, shipping in 1992, was level (Int & FP) with the PA-7150 from 1994, and level in SPECint92 with the PPC 604 from 1995, but thrashed the 604 in FP.
i.e. the 21064 was _2 years_ ahead of the field.
The 21064A, shipping in 1994, was superior in Int and FP to every processor by _every_ other RISC manufacturer before 1996, apart from MIPS's R8000 from the same year, which was better at FP but lousy at Int.
i.e. the 21064A was _nearly 2 years_ ahead of nearly the whole field.
My figures from MPR, from vendor SPEC releases.
Sure, they weren't cheap, but neither were Sparcs, or PAs.
The ding-dong battle between HP and DEC started after that, and basically DEC would spend 75% of the time top of the SPEC FP tables, but HP would always manage to throw a system into the #1 slot for about 25% of the time. Intel/AMD coming anywhere near either of those, and Power now actually beating it, are relatively new concepts considering the 10-year length of Alpha history.
FatPhil
Re:nice 64bit (Score:5, Interesting)
x86 processors have had 64-bit floating point registers (actually 80-bit) for as long as they have done native floating point. x86 does not have 64-bit integer registers; this has nothing to do with floating point.
The reason x86 has traditionally sucked at floating point is because the x87 floating point ISA only allows for a stack of 8 fp registers, instead of a flat set of 32 registers like most RISC architectures. This has been worked around to some degree in current x86 processors through the use of a flat virtual register set and good compilers, although there is only so much a compiler can do when it is limited to 8 target registers. Nowadays the continued leadership in SPECfp by 64-bit RISC chips is mostly due to higher memory bandwidth and particularly large L2/L3 caches which help a great deal with certain SPECfp subtests.
While not quite as high as its world-beating SPECint scores, the P4's SPECfp scores are still damn good, and would be even better if Intel would officially support PC1066 RDRAM (the current scores on spec.org are PC800 only). Put another way, they will be even better when Intel releases their dual-channel DDR chipset in a few months.
That said, EV7 will clearly have the SPECfp score to beat for quite some time. (Probably SPECint as well.) And Itanium2's SPECfp scores are reported to vault it well ahead of the also impressive Power4. But, again, this is all to do with higher DRAM bandwidth and larger caches, not with any inherent limitations of x86 for performing double-precision fp.
Re:nice 64bit (Score:2)
* 8088?
Re:nice 64bit (Score:3, Insightful)
No doubt. If it wasn't clear from my post: the fact that AMD and Intel can get almost equivalent single-CPU SPEC performance (and SPEC is oriented toward workstation/server/HPC workloads!) to the top 64-bit CPUs, despite maintaining backwards compatibility with a much uglier ISA and costing ~50x less, is a huge credit to their engineering teams. It's also pretty strong proof that the fitness of your ISA is much less important than the manufacturing process you use and the engineering resources you have.
And second, while it of course no longer matters that the P4/Athlon are backwards compatible with the 8086, it mattered hugely that the 286 was, and that the 386 was compatible with the 286, and so on. Tremendously. The immense size of the x86 backwards-compatible market has meant that Intel and AMD sell their CPUs in volumes large enough to make owning their own fabs (and keeping them on the cutting edge of process tech) worthwhile... which in turn is what has kept x86 performance so competitive (along with other effects from selling into such a huge market).
Re:nice 64bit (Score:1)
It's not the hardware that is holding back proper IEEE 754, but rather the compilers.
According to W. Kahan (one of the fathers of IEEE754) (see this link to PDF article [berkeley.edu])
"The widest precision that's not too slow on today's most nearly ubiquitous "Wintel" computers is not double (8 bytes wide, 53 sig. bits) but IEEE 754 double extended or long double (≥10 bytes wide, 64 sig. bits). This is the format in which all local scalar variables should be declared, in which all anonymous variables should be evaluated by default. C99 would permit this (not require it, alas), but ... Microsoft's compilers for Windows NT, 2000, ... disable that format.
Java disallows it.
Most ANSI C, C++ and Fortran compilers spurn it.
(Apple's SANE got it right for 680x0-based Macs, but lost it upon switching to Power Macs.)"
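Kahan's point can be seen from C, where `long double` maps to the x87 double-extended format (64 significand bits) on most x86 Unix compilers. A sketch, with the caveat that on MSVC and on many non-x86 targets `long double` is just a plain 53-bit double, so the extra precision is not guaranteed; the helper names are mine:

```c
#include <float.h>

/* DBL_MANT_DIG is 53 on any IEEE 754 machine; LDBL_MANT_DIG is 64
 * where long double is the x87 double-extended format. */

/* Returns nonzero if adding eps to x is visible at double precision.
 * volatile forces the sum to be stored at declared width, defeating
 * any wider intermediate the compiler might otherwise keep. */
static int visible_in_double(double x, double eps) {
    volatile double sum = x + eps;
    return sum != x;
}

static int visible_in_long_double(long double x, long double eps) {
    volatile long double sum = x + eps;
    return sum != x;
}
```

For example, 1.0 + 1e-18 rounds back to 1.0 in double (the half-ulp of 1.0 is about 1.1e-16), while x87 double extended (half-ulp about 5.4e-20) can still distinguish it; compilers that "spurn" the wide format throw away exactly this margin.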
Re:nice 64bit (Score:1)
Erm:
PA has a segmented memory architecture, MIPS doesn't. (OK they call it an 'address space identifier', but really it's a segment, as the virtual addressing modes are all 32-bit, unlike MIPS' 64bit.)
PA has packed decimal types, MIPS doesn't.
PA has variable bit-field data types, MIPS doesn't.
PA has 58 SP registers which can be paired for DP, MIPS has a flat DP FP register set.
Both have branch delay slot but PA has optional nullification, MIPS doesn't.
PA doesn't have a branch-likely extension, MIPS does.
PA has conditional moves, MIPS doesn't.
PA doesn't have a single-instruction divide, MIPS does.
MIPS-III may have filled in some of the above gaps, but I stopped looking at MIPS at the MIPS-II stage.
PA was a 'brainiac' design (lots per tick), MIPS was a 'speed demon' traditional RISC design (lots of ticks).
FatPhil
Such a shame.. (Score:1)
Works as a nice door stop too!
Re:Such a shame.. (Score:2)
Anyway, I have always loved the Alpha and wanted one since I was a boy. But after having one, and finding how poorly they are supported these days, I can't wait to get my dual Opteron system.
TestDrive (Score:2, Informative)
cpu Alpha
cpu model EV7
system variation Marvel/EV7
cycle frequency 800000000
BogoMIPS 2140.20
platform string Compaq AlphaServer ES80 7/800
cpus detected 2
cpus active 2
This has been restructured a bit to pass through the junk filter as well as condense it to the most important info.
0.18? (Score:1, Insightful)
0.13 should be capable of such a chip.
IBM already uses 0.13 for their POWER4.
One of the foundries like UMC or TSMC would be proud to produce the Alpha.
RDRAM? Bye-bye! (Score:2, Funny)
Missing feature (Score:5, Funny)
They should go all the way and integrate either one of these into the packaging:
Suddenly, Athlons seem mighty cool (literally).
Re:Missing feature (Score:2)
Re:Missing feature (Score:1)
GOD BLESS THE ALPHA! (Score:1, Interesting)
Once again, capitalism destroys superior technology, as the DECHPaq behemoth kills off all its own engineering masterpieces to appease Intel.
Long live Alpha, long live Alpha/x86 binary translation technology, long live PA-RISC, long live HP instrumentation and calculators. Long live control by the competent, rather than the short term profit-minded!
May Fiorina and Cappellas be given the softest of pillows to relieve their nightmares of guilt.
4th July? Why are we celebrating independence from a nation now no less free than our own?
About time. (Score:1)
News for Nerds. Stuff that matters.
Thank you chrisd, now please cover the stories on the Warcraft 3 Linux porting effort.
Re:Will it run... (Score:1)
Rumour has it that M$ ported NT to alpha to head off DEC from complaining about it ripping off VMS for NT.
Only if they support AlphaBIOS Re:Will it run... (Score:1)
Peter
Re:Will it run... (Score:3, Informative)
In reality NT does have some VMS-like features in the kernel, but it is *not* VMS. If it were, it would be a little slower and a BSOD would be strictly mythological.
Re:Will it run... (Score:2)
Digital did start a project to get VMS onto other architectures, namely MIPS and Intel, but they gave up even before the feasibility study was fully completed.
Re:Can you imagine a beowulf cluster of those? (Score:3, Funny)
Re:Can you imagine a beowulf cluster of those? (Score:2)
Hey, no need for distributed file systems, expensive high-speed ethernet, etc. It's just too bad Alpha never caught on.
And for all those that think it's dead, there are still other companies with vested interest in the Alpha.
Re:Can you imagine a beowulf cluster of those? (Score:2)
Re:Can you imagine a beowulf cluster of those? (Score:1)
Can you imagine a beowulf cluster of those? No (Score:1)
Re:Can you imagine a beowulf cluster of those? (Score:2)
Re:Can you imagine a beowulf cluster of those? (Score:2)
Our Alphas don't calculate much, they just run the biggest electronic futures and options market in the world (at least the production cluster does). Most of the backend code is even written in COBOL.
Re:Can you imagine a beowulf cluster of those? (Score:1)
Wanna donate some time to a distributed computing project?
(I'm one of the few prime-number nuts (Ernst Mayer being the other) who codes stuff for Alpha.)
FatPhil
Re:Can you imagine a beowulf cluster of those? (Score:2)
They are also a *long* way in connectivity terms away from the Internet as all trading by the members goes via a private WAN (better control of transaction times).
Re:Thoughts on the world (Score:2)
Re:Too bad: littlw Linux support. (Score:2)
Re:That's what happened (Score:1)