Troubles with Merced
Brandon Bell writes "Everyone has their theory on why Intel's Merced is in trouble. Kraemer just wrote an opinion piece that discusses two problems he thinks it's facing: the compiler and the sales model."
hold off on merced comments just yet. (Score:1)
EPIC != SMP (Score:1)
To take advantage of EPIC, a compiler needs to look for machine instructions that have no dependency on each other. Those instructions can be executed simultaneously.
To benefit from multiprocessing, a program must use threads (break itself up into what are called lightweight processes). Threads are an operating system service, not really a processor-level thing. A compiler cannot make a thread - even on Merced. The programmer creates the threads by triggering operating system calls within the process.
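To make the distinction concrete, here is a minimal sketch of what "creating threads by triggering operating system calls" looks like with POSIX threads (my own toy example, not tied to any particular OS mentioned above):

    #include <pthread.h>
    #include <stdio.h>

    /* Parallelism across CPUs comes from explicit thread-creation calls
       into the OS, not from anything the compiler does behind your back. */
    static void *worker(void *arg)
    {
        printf("hello from thread %d\n", *(int *)arg);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        int id1 = 1, id2 = 2;
        pthread_create(&t1, NULL, worker, &id1);  /* explicit OS-level calls... */
        pthread_create(&t2, NULL, worker, &id2);  /* ...one per thread          */
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }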
Thus, even on Merced, ordinary Joe Blow can't make individual programs faster just by popping in new processors unless the software is written using threads. And even then, the benefit will only show up on operating systems that offer both user- and kernel-level threads, like BeOS and Solaris UNIX. NT doesn't really cut the mustard, and neither does Linux.
As far as writing the compiler goes, he's partially correct there, but only partially. All programs have a few instructions that can safely be executed simultaneously, but how much faster would that make the program? A compiler must be well written, and this will make quality comparisons between different vendors' compilers much more useful. How crappy will MS Visual C++ be then? Will NT even run on Merced?
Branch prediction (Score:1)
You got the Alpha and PA-RISC mixed up: the Alpha always assumes that a branch to a previous address will be taken (which makes loops fast and gives compilers a handle on how to optimize code with well-understood flow). The main advantage to using such a simple (and more or less effective) technique on the Alpha was that it consumed very few transistors, and the Alpha was facing very severe space constraints. The 21264 is about twice as powerful as the 21164 at the same clock speed, and most of the benefit came from improvements in the branch prediction (made possible by better fab technology relieving some of the space constraints; the Alpha is a *big* processor).
Branch prediction is very important for keeping deep pipelines from stalling. If your pipe is 33 instructions deep and your branch prediction is only 90% effective, then your branches cost you an average of three extra cycles each.
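Taking those figures at face value, the arithmetic is just misprediction rate times flush penalty - a back-of-the-envelope sketch, not a measurement of any real chip:

    #include <stdio.h>

    int main(void)
    {
        double depth    = 33.0;   /* pipeline depth in stages (figure from the post) */
        double accuracy = 0.90;   /* branch prediction hit rate */
        /* each miss costs roughly a pipeline's worth of flushed work */
        double avg_extra = (1.0 - accuracy) * depth;
        printf("average extra cycles per branch: %.1f\n", avg_extra);  /* ~3.3 */
        return 0;
    }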
Speculative execution is another powerful tool for keeping your (now tree-shaped) pipeline full, but it's not intended to be a complete replacement for branch prediction. On systems where the pipe isn't trivially shallow, speculative execution is used with branch prediction (i.e., executing speculatively on the earliest branches and/or poorly predictable branches and using branch prediction for the rest). Speculative execution is expensive in terms of duplicating issue logic and ALUs, but that's not much of an issue for today's microprocessors -- most of the space on-die is taken up by memory cache, which tends to become much less effective per-transistor once beyond a certain (already long surpassed) size. As long as the extra logic for speculative execution yields better gain per transistor spent than L1 cache, it's a win.
Yet another technology for ducking the high cost of conditional branches is predication. Predication is orthogonal to prediction and speculative execution. Its biggest strength is that it doesn't require much extra logic, doesn't require splaying your pipe into a tree (a la speculative execution), and greatly reduces the cost of small blocks of conditionally executed code - albeit not as much as good branch prediction would. Its use is therefore more or less limited to small blocks of hard-to-predict conditionally executed code, and having it in your processor by no means allows you to get away with not using good branch prediction logic.
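For illustration, here is a rough C-level sketch of what predication ("if-conversion") buys you on a small, hard-to-predict conditional; the function names and loop are mine, and the real transformation happens in the compiler and ISA rather than in the source:

    /* A branchy absolute-value loop: the branch depends on the data, so it
       is hard to predict when the signs are effectively random. */
    void abs_branchy(int *a, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            if (a[i] < 0)            /* data-dependent, hard-to-predict branch */
                a[i] = -a[i];
    }

    /* The if-converted version computes both outcomes and selects with the
       predicate, so there is no data-dependent branch left to mispredict.
       On a predicated or conditional-move machine this becomes straight-line
       code. */
    void abs_predicated(int *a, int n)
    {
        int i;
        for (i = 0; i < n; i++) {
            int neg = -a[i];
            a[i] = (a[i] < 0) ? neg : a[i];
        }
    }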
-- Guges --
The man is clueless; EPIC = VLIW (Score:2)
1. The compiler has to explicitly package instructions that have no dependences into parallel issue packets. This task is currently done by hardware in superscalars.

This means that you can't just slap another CPU onto the board to make things faster: the parallelism is at the instruction level and is compile-time determined; most bindings are done statically. For this to work right, almost all instruction, pipeline and functional-unit characteristics have to be exposed. For example, for software pipelining (a technique to overlap instructions from different iterations of a loop) to work, pipeline latencies and functional-unit resource constraints at each time step have to be carefully considered (a small software-pipelining sketch follows after point 4).
2. EPIC is basically VLIW, but don't say that, because all VLIWs (except DSPs like TI's C6) have been commercial failures. Besides, it's not as salesirific as VeeEllEyeDoubleU.
3. The compiler is crucial. So far, only the Univ of Illinois IMPACT group and the HP CAR group really, really know how to build one well. (My opinion, of course.)
4. For more technical info, read comp.arch and look at proceedings such as MICRO. See also www.trimaran.org. Just don't listen to the clueless.
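Since software pipelining comes up in point 1, here is a rough sketch of the transformation done by hand in C; the array names and the assumption that independent work can hide the load latency are purely illustrative:

    /* Original loop: each iteration loads a[i] and then uses it immediately. */
    void scale(int *a, int *b, int n)
    {
        int i;
        for (i = 0; i < n; i++)
            b[i] = a[i] * 2;
    }

    /* Software-pipelined by hand: the load for iteration i+1 is started
       while iteration i is still multiplying and storing, so the load
       latency is hidden.  (Assumes n >= 1; prologue/epilogue peel the ends.) */
    void scale_pipelined(int *a, int *b, int n)
    {
        int i, cur, next;
        cur = a[0];                      /* prologue: first load issued early */
        for (i = 0; i < n - 1; i++) {
            next = a[i + 1];             /* load for the NEXT iteration...    */
            b[i] = cur * 2;              /* ...overlaps work for this one     */
            cur = next;
        }
        b[n - 1] = cur * 2;              /* epilogue */
    }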
Allen (leunga@cs.nyu.edu)
Very little technical content. (Score:3)
I didn't know that Carmack was the demigod of compilation research, and I thought that some people had OSes running on Merced simulators.
The sales model: of course Intel made a gamble; but the gamble is that you can't get much more architectural optimization out of current RISC (maybe that's why people are throwing away die area on multiple MMX, 3DNow!, whatever "multimedia/SIMD" units), so a new paradigm could outperform the older designs. Basically, Intel is betting that superior performance is a valid reason for being (partly) incompatible with x86 (or to keep up with competitors).
A huge part of the Merced issue is essentially technical, and the article is just completely off in this respect. Please, someone fix this and post relevant URLs...
Buzzwords make it easy to spot the idiots (Score:4)
Even from the small amount of information published about IA64, it is clear that there is absolutely no support for automatic scaling simply by adding cpus. EPIC refers to the way each individual cpu decodes the instruction stream. EPIC is no more inherently multi-processor than the current IA32 instruction set.
To get automatic scaling, you need something like Tera's Multi-Threaded Architecture. Too bad they can't seem to ship the damn thing, and that it costs a couple of million.
See: http://www.tera.com/ [tera.com] for more info.
Time for change? (Score:2)
If the trouble is that the money model does not work with easier-to-upgrade hardware, then maybe the model needs to change. Currently Dell and Compaq make money selling whole computers. Perhaps they should sell or lease parts in addition to cases. That way you could change the CPU every few months and keep current, for a fee of course.
Time and progress won't hold still, so perhaps you shouldn't.
-Ben
It's not just _a_ compiler (Score:2)
In addition, it's widely accepted that there will be faster RISC CPUs available than Merced when Merced ships, and even faster x86 chips (at running x86 binaries). Before using this to claim that Merced is dead, remember that this was true of the initial RISC chips when they came out. What this means is that it'll simply take a while for EPIC to mature and for the advantages to come to the fore.
The problem is that Intel doesn't have much of a choice. Well, I suppose they could have gone for a standard RISC chip. But something post-x86 is necessary. If nothing else, the 32-bit limitations of the architecture are hurting Intel's sales in the lucrative server market (where tens of gigabytes of RAM are common, and hundreds not unknown). The desktop user probably won't care for another 4-5 years, but the server market started caring 3 years ago.
One interesting note in all of this is how this is affecting the Intel/Microsoft relationship. By its actions, Intel has shown it has no confidence in Microsoft being able to ship an Enterprise-ready, 64-bit-clean OS any time soon. Not that I blame them.
Intel is learning one nice feature of open-source operating systems - they don't have to depend upon someone else to support their chips. For a small engineering investment, they can do it themselves - and if you want something done right, doing it yourself is a really good idea. Making a small investment in a small company (like, say, Redhat) makes a lot of sense in this context.
That being said, I think EPIC is an interesting design with a lot of long-term potential. Standard RISC processors have a hard time averaging more than about 2 parallel instructions. Research done by HP indicates a lot more than that is possible - it's just computationally infeasible for the _processor_ to find it.
No mention of unix workstation/server vendors (Score:1)
It shows how little time he took to research the matter.
I see no mention of any of the unix vendors. Both HP and SGI are going to use the IA64. HP will be phasing out its PA-RISC CPU in favour of the IA64. (Don't know about SGI's use of MIPS CPUs.) Both vendors have extensive experience with superscalar RISC CPUs. Also, Intel has its own RISC CPUs, and there are several 3rd-party compiler developers probably just waiting for a break.
Also, he starts comparing Joe Average's IA64 system with real server machines (HP PA-RISC, Alpha AXP, MIPS R10K, Sun UltraSparc). An IA64-based system is going to cost more than Joe makes in a year! He can't even buy a machine based on one of the currently popular server architectures (except maybe Intel IA32/Xeon). Also, the comment about just adding an extra CPU is valid for current SMP Windows NT-based systems, since almost all software is multi-threaded. (I really like my dual P150 running Linux...)
Mathijs
Ignore that article (Score:1)
Most processors are already parallel in the way EPIC means parallel. They have multiple execution units which can process instructions concurrently. Multichip parallelism is a much tougher problem, with lots of different problems to beat. It has nothing to do with Merced or the sales model.
An IA-64 instruction comes in a bundle with 2 other instructions, so there are 3 instructions per bundle. Each instruction is something like 40 bits long, and each bundle has a dependency flag of several bits. The performance problems that hinder chips the most are pipeline stalls and branches. A chip has a ton of logic that tries to predict branches and choose the right one, and modern chips have a ton of logic to execute instructions out of order to reduce stalls. IA-64 pushes the job of stall detection onto the compiler, which makes the instruction bundles and chooses the dependency flag (the flag says which instructions in the bundle conflict); that way the chip doesn't need as much logic for out-of-order execution and can spend it on more important things. This is also a piece of cake for modern compilers; IBM, Sun, MIPS and DEC all have the technology to do this, and most have had it for years and years.
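For reference, the bundle format that has been described publicly is 128 bits: three 41-bit instruction slots plus a 5-bit template carrying the dependency/stop information. The helpers below are my own illustration of how those fields sit in a bundle, not anything out of Intel's tools:

    #include <stdint.h>

    /* A bundle stored as two 64-bit words (lo holds bits 0..63). */
    static unsigned bundle_template(uint64_t lo, uint64_t hi)
    {
        (void)hi;                          /* template is the low 5 bits */
        return (unsigned)(lo & 0x1f);
    }

    static uint64_t bundle_slot0(uint64_t lo, uint64_t hi)
    {
        (void)hi;                          /* slot 0 occupies bits 5..45 */
        return (lo >> 5) & ((1ULL << 41) - 1);
    }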
To solve the branch problem, IA-64 doesn't use branch prediction. This is the really important part of EPIC. It executes both branches, and once the correct path is known it discards the instructions it executed on the wrong branch. This is tough to do. The compiler is also supposed to help with this and add some bits to the flag, which is also a hard thing to do.
If it all works, IA64 chips will be fast, but nothing stellar, because RISC chip makers have done such a great job of dealing with these problems already. So Intel has chosen a very complicated design, with some hard but not impossible compiler changes, and they aren't going to deliver the ultimate performance they have been promising for years. There are definitely hard technical problems to solve, but they aren't that bad; I think the bigger problem is actually building a chip that can compete with modern PowerPC and Alpha RISC processors and look like it is innovative. Intel is breaking compatibility, and once that is done it's anybody's market, because they have nothing that makes them look better than the other guys (like 25 years of x86 software...).
The funny thing about all this EPIC talk is that Intel still has to have logic on the processor to tell if the compiler lied... They were trying to get rid of that logic to make a leaner and meaner processor, but they still have to have it.
SMP and Merced (Score:1)
I don't really know for sure, but it seems to me that one of Intel's major problems is that they want to charge an extreme premium for performance, and don't want to wake up in a world where you can scale processor power by adding CPUs.
I find it odd that just when the PPC people will be removing the "SMP Premium" charge from their chips, and making very SMP-capable G4 chips, Be will be abandoning the PPC arena for Intel chips, where the only processors capable of scaling beyond two-way SMP are "non-consumer-grade" very expensive, very high-margin server chips.
Once someone comes out with a decent low cost multi-smp-scalable (beyond 2!) chip and motherboard system, the world will beat a path to their door. I think if that ever happens, Be will have to decide whether or not to stick with Intel and watch some more processor-agnostic SMP-capable OS (like Linux) seize the ground of becoming a "media OS."
Phil Fraering "Humans. Go Fig." - Rita
SMP and Merced (Score:1)
For the intended market for Merced, i.e. servers, being able to handle multi-threaded applications would help a lot.

This also holds true for consumer OSes; otherwise M$ wouldn't be quite so concerned about Be.
Phil Fraering "Humans. Go Fig." - Rita
Re: The man is clueless; EPIC = VLIW (Score:1)
Author ignoring reports (Score:1)
Incidentally, what I've heard at The Register does correspond pretty well with what I've heard through other channels.
Basically, the 'other' problems with the Merced are the design itself - Intel's engineers aren't used to doing this sort of thing, and apparently they're a bit short on quality engineers and are using lots of people who've just left university. Not the sort of people to give a massively complicated chip design to.
Incidentally, the '2nd gen' EPIC chip, the McKinley, is mostly being done by HP, and is apparently going pretty well. I've been hearing from my own contacts for a long, long time that the Merced might just end up being a 'test' processor that never goes into production, and that the McKinley will be the first production EPIC.
Not surprisingly, both HP and SGI have recently been saying they're still committed to their own architectures (at least for a while), after previously planning to dump them. I think HP have been saying they'll go with their own stuff for another 5 years.
Intel isn't the only one being a bit late. Sun are about a year behind with their UltraSparc-III, though I haven't heard anything about why they're behind. (Their reasons for being late are probably quite different from Intel's.) Shame, as it seems like a pretty nice chip...
Intel's pricing model doesn't work that way... (Score:1)
Intel's pricing model doesn't work that way. True, with every new chip Intel announces "this is for servers only". And the first few thousand chips do go into servers. But the server market isn't anywhere near large enough to pay back the cost of developing that chip, so within a few months workstations are released, first by one of the larger clone makers, then by Gateway 2000, and finally by Compaq.
And Intel absolutely depends on these workstation sales to drive their learning curve.
So look for the first Merced (McKinley?) workstation about three months after the first server is released.
sPh
I hear what you're sayin, but... (Score:1)
I hear what you are saying about home power requirements, and I tend to agree (although SimCity 3000 might just need that Merced...).
Just my 0.02.
sPh
Compilers *are* hard - Ask MIPS or HP (Score:1)
I remember all the trouble MIPS had when they rolled out the R10000 chip. Initial performance was not up to spec: the early estimates of the SPECfp numbers were pretty much correct, but they underestimated *how long* it would take to get those numbers. It took a couple of years for the compiler people to wring out the best performance (OK, clock speed was off too, but that was not the sole reason, nor were the internal CPU wars within MIPS/SGI).
Now the Merced is more complex than the R10000 (and at least the R10K had some *vague* similarity to the R8K, and the PA architecture has been around for many years, so those companies had compiler writers experienced in some of the problems they were up against). Intel is starting from scratch here. I'd say that once they've done first tapeout and have silicon in their hot little hands, it'll be at least a year before the compilers get close to the performance they hope for.
Meanwhile, IA32 will be up to similar spec and Alpha, PA and MIPS (and SPARC perhaps) will be serious contenders.
Just last week or so, MIPS and HP announced they were reviving their CPU development for a further year or so (i.e. another generation), rather than trusting everything to Merced (I assume that means the last MIPS or PA in 2003-2005 now). What news did Intel give these guys for them to decide to make such an announcement??
cheers
Michael Snoswell
Only half right (Score:4)
But I don't know where he got this idea that Merced automatically makes all applications multi-processor ready; that's just plain wrong. High-end processors have had multiple execution units for many years, which allows them a small amount of very fine-grained parallelism: on average perhaps two instructions can be executed at once. Sometimes, when you're lucky, it can be more than two for a short burst. Merced will *not* be able to keep all 7 of its execution units busy 100% of the time, though it may get lucky and do so for an instant every once in a while, if the compilers are really good.
None of that has anything whatsoever to do with multiple CPUs. The situation there will be unchanged from the situation today with, e.g., multiple Pentiums: applications won't take advantage of more than one CPU unless they are explicitly coded to do so.
Therefore the conclusion of the article is dead wrong: the business model won't change, because he just misunderstood the issue with parallelism.
The man is clueless; EPIC = VLIW (Score:1)
My summary of EPIC vs. VLIW vs. superscalar (note I use the term "functional unit" to mean "thing that can execute some kind of instruction"; more functional units means a faster CPU - it's an oversimplification, but useful in this context):
Now my reply to Allen's post:
Yes; however, these packets need only be free of data-use dependences, not execution-unit dependencies (this is the big difference between EPIC and traditional VLIW).
To get maximum performance, this is correct. On a "normal" VLIW you need it just to get a working program. This difference is important. If you own a Multiflow (one of the defunct commercial VLIWs) and you upgrade its CPU, all of your old programs are incorrect (different load latencies), and even if you managed to compile your code to work with both load latencies, you still can't use more adders per cycle, because the exact instructions that are executed per cycle are set in the code.
If you upgrade from a Merced with three integer execution units and two load units to one with six integer units and one load unit, your old programs continue to work. They may run faster, or slower, but they still work.
I don't think you need to know all the details of the Merced microarchitecture to get decent performance. Just move the loads as far from the uses as you can, and get as many instructions marked independent of their neighbors as possible. You may end up moving loads farther away than needed, or marking more things as "can run in the same cycle" than your Merced can gobble up, but that's OK. It won't kill you. It might even make a future Merced faster.
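A rough illustration of the "move the loads away from their uses" idea, with made-up variable names; the scheduled version simply puts independent multiplies in the shadow of the load:

    /* Naive order: the add needs the loaded value immediately, so it
       waits out the whole load latency. */
    int naive(int *p, int a, int b, int c, int d)
    {
        int x = p[0];       /* load                   */
        int y = x + 1;      /* immediate use -> stall */
        int z = a * b;
        int w = c * d;
        return y + z + w;
    }

    /* Scheduled order: independent work fills the load's latency, so by
       the time the add runs the value has already arrived. */
    int scheduled(int *p, int a, int b, int c, int d)
    {
        int x = p[0];       /* load issued early           */
        int z = a * b;      /* independent multiplies      */
        int w = c * d;      /*   overlap the load latency  */
        int y = x + 1;      /* use is now far from load    */
        return y + z + w;
    }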
EPIC is basically a VLIW, except it is a little slower (for a given transistor budget), and it has an upgrade path. I think the upgrade path makes it commercially different from VLIW.
Multiflow made a good one (well, it got good results most of the time, but it was pig slow). DEC eventually bought it when Multiflow went under.
Also, the compiler isn't as hard as it is for VLIW. With VLIW, if you get the latency wrong you don't run; EPIC just stalls, kind of like a superscalar. Getting max speed requires tons of work, but the same work would speed up a superscalar (by exactly the same amount, if the superscalar has the same number of functional units). I think the big difference will merely be that EPIC CPUs will tend to have many more functional units, so the bad-compiler vs. good-compiler gap will be more like a factor of 8 than a factor of 4 (or a factor of 2 on a PII/PPro/PIII).
Indeed. And that's not a slam - your opinions were well thought out. I just happen to think that requiring explicit dataflow (EPIC) is very different from explicit dataflow AND instruction scheduling (VLIW).
Ignore that article (Score:1)
I assume you mean that it performs speculative execution (which is what you described) in addition to having predicated instructions, e.g. speculatively executing predicated instructions before it knows what's in the instruction's predicate register, and throwing away instructions' results as soon as it finds out that the predicate register was false.
(I.e., predicated instructions aren't the same thing as speculative execution; don't automatically conclude that Merced does speculative execution merely because IA-64, of which Merced is planned to be the first implementation, has predicated instructions.)
It's not just _a_ compiler (Score:1)
...although, of course, one can support more than 4GB of RAM with a 32-bit processor, in the sense of a processor that can't handle more than 32-bit linear virtual addresses, as long as the processor's physical addresses can be more than 32 bits (as is the case with most, if not all, P6-core processors - Pentium Pro, PII, PIII) and as long as the chipset can handle it.
It may be less convenient, as one might have to have a process manually map stuff into and out of its address space if you want a single process to use more than 4GB of RAM (as opposed to, say, having file systems use it as a buffer cache, although that may also involve switching mappings), but it's certainly still possible.
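A hedged sketch of that "map it in and out by hand" approach: a 32-bit process walking a data set larger than its address space one window at a time. It assumes POSIX mmap with 64-bit file offsets; the function name is mine and error handling is omitted:

    #define _FILE_OFFSET_BITS 64   /* 64-bit off_t on 32-bit systems */
    #include <sys/mman.h>
    #include <sys/types.h>
    #include <fcntl.h>
    #include <unistd.h>

    #define WINDOW (64u * 1024 * 1024)            /* 64MB view at a time */

    long long sum_big_file(const char *path, off_t total_len)
    {
        long long sum = 0;
        int fd = open(path, O_RDONLY);
        off_t base;
        for (base = 0; base < total_len; base += WINDOW) {
            size_t len = (size_t)(total_len - base < (off_t)WINDOW
                                  ? total_len - base : (off_t)WINDOW);
            unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, base);
            size_t i;
            for (i = 0; i < len; i++)
                sum += p[i];
            munmap(p, len);                       /* unmap so the next window fits */
        }
        close(fd);
        return sum;
    }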
(I say "linear virtual addresses" because, whilst the x86 segmented virtual addresses go up to 48 bits, they first get mapped by the segmentation hardware to a 32-bit linear address before being used as physical addresses, if you haven't enabled paging, or before being run through the page table, if you have enabled paging; not only are 48-bit addresses not necessary for accessing more than 4GB of physical memory, they don't even help you to access it.)
EPIC != SMP (Score:1)
I was under the impression that "auto-parallelizing" compilers can convert, say, some Fortran or C/C++ code into multi-threaded code.
See, for example, this Sun white paper [sun.com] on their compilers, which, it appears, can auto-parallelize loops to run on multiple processors.
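The kind of loop such compilers go after is one whose iterations are independent of each other. Sketched below with an OpenMP-style directive for concreteness; Sun's compiler does the equivalent automatically with the right options (which options, I won't guess):

    /* Each iteration touches only its own c[i], so the iterations can be
       handed out to different processors. */
    void vadd(const double *a, const double *b, double *c, int n)
    {
        int i;
    #pragma omp parallel for
        for (i = 0; i < n; i++)
            c[i] = a[i] + b[i];
    }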
EPIC != SMP (Score:1)
In what fashion? Its threads may not be "both user and kernel level" in the sense that there are user-level threads that can be executed by a pool of kernel-level LWPs, with the possibility that there are more user-level threads than kernel-level LWPs, as is the case in Solaris, but I don't see why that's necessary in order to get a speedup to a threaded program by adding processors - would not the model I think NT uses, wherein every thread known to userland is known to the kernel (I ignore "fibers" here), be sufficient?
Forget about it breaking their sales model (Score:1)
DEC comes to mind - handicapping their low-end systems so that they would not outperform the high-end ones.
Bruce
Very little technical content. (Score:1)
silicon process technology they've developed would wipe that argument out - the other detail - managing to get x86 to go as fast as they have, generation after generation - disproves the statement also.
Also, EPIC as detailed isn't really even an HP invention, but rather an outgrowth of things done by companies in the 80's such as Multiflow and Cydrome.
The author claims that the compiler is a bitch - well, it is, but they solved most of the problems with the compiler technology at those earlier companies - and some of the folks doing the IA64 are graduates of same.
I WOULD worry about the scalability of the architecture though - that WAS a problem with the Trace and Cydra architectures. You had to recompile for succeeding generations of hardware. I personally don't know if EPIC solves that problem with VLIW architectures. Anyone know if it does, and how?
Steve
RISC? Whadda ya mean, RISC? (Score:1)
A number of the folks that DO have such experience work for Intel and HP. I know - I worked with about half of 'em at Cydrome.
Re: The man is clueless; EPIC= VLIW (Score:1)
Does JIT reduce to compiling for the hardware the JIT compiler is running on - or for a virtual machine? If it's a virtual machine, then the machine simulator is all that needs to be run through the EPIC compiler - because that's all that would execute - not some intermediate target language. If that's the case, EPIC won't present any real problem. Out of ignorance... which is it??
Steve
MERCED EPIC v DEC ALPHA 21364 (Score:1)
There are only two ways to make a SINGLE thread go faster - you can up the clock rate, or figure out a way to run more than one instruction at a time.

Multi-issue pipelines, VLIW, and EPIC are attempts at solving the problem in the second manner. Once you have an adequate solution in the second space, it becomes possible to improve its performance via the first method.

Thus, from an architect's point of view, the second method is the first tried!

Now - which is better at a given clock rate, multi-issue pipelines or EPIC? That remains to be seen.
Steve
SMP and Merced (Score:1)
SMP doesn't of itself improve SINGLE-process performance. You CAN write special code on a Beowulf platform (or SMP) and get the answer faster for a single thread via parallelism... but that is a problem that isn't well supported by automatic tools at this time. We DO have the technology to throw more execution units at a single thread and get the answer faster, though - that is what Merced is all about.

You can just as easily SMP a Merced-class CPU and run multiple threads through it as you can with a PPC or a Xeon. That isn't the problem that EPIC, VLIW, or multiple-issue pipeline (superscalar) machines are trying to solve.

Think SINGLE threads when talking about these architectures.
Steve
Test Platform (Score:1)
Also, aren't there many people out there who know more about compilers than Carmack?
EPIC has nothing to do with threaded arch's. (Score:1)
EPIC is not a multithreaded architecture. EPIC (which, as everyone knows, stands for Explicitly Parallel Instruction Computing) focuses on Instruction Level Parallelism (known in hardware and compiler circles as ILP for short).
Explicit ILP architectures, such as Very Long Instruction Word (VLIW) architectures, Transport Triggered Architectures (TTA) and the like, all focus on finding parallelism within a given single-threaded program. The compiler for such an architecture may divide separate paths of execution into a sort of thread (for instance, it might execute down both the "then" and "else" clauses of an "if" before it knows which it needs, or perhaps down multiple "cases" of a "switch"), but this is not multi-threading in the common, macro sense of the term.
Multithreaded architectures, on the other hand, do focus on running multiple independent threads of execution, typically as if they were multiprocessors. For these CPUs, a given application needs to be constructed as a series of explicit threads (at the process level, not the instruction level), or a compiler needs to simulate this division. Alternately, a number of independent processes need to be available (although since all threads share a common pipeline, running independent processes together can have bad cache effects and cumulative stall effects that generally don't make anyone's day).
--Joe--
EPIC, VLIW, Links for more Info (Score:1)
First, before everyone jumps in and says "Intel will never get there because the compiler will never get there," please don't forget that some shipping devices are already there.
Quite simply, EPIC allows a compiler to tell the hardware ahead of time where it knows parallelism exists, so that the silicon (which is finite) doesn't have to hunt for it. Compared to the rate at which silicon must make scheduling decisions (at 800MHz, that's 1.25 nanoseconds), compiler time seems infinite.
Granted, compiler time is not infinite, but for performance-critical applications, it is quite large. The Texas Instruments [ti.com] TMS320C6000 family of DSPs, for instance, relies on compilers and assembly optimizers in order to eke out that last bit of performance, and as any DSP engineer will likely tell you, it's usually worth it. Cycles saved in one loop are cycles that can be spent elsewhere on value-added features, leading to a more valuable product.
This points to the real fundamental problem as I see it, which is that the current VLIW darling in the industry is in the embedded world. Why should that make a difference, you ask? Because the embedded developer is the one most likely to take advantage of the raw capability that an exposed parallelism architecture can provide.
The biggest problem lying ahead of Merced is the fact that workstation-class code does not naturally exhibit large amounts of parallelism. While I was attending MICRO-31, I heard someone remark that most code looks like a series of 5-10 instruction bursts followed by a jump. ICK!!
Embedded programmers generally seem willing to learn whatever it takes to get their product running in the fewest MIPS (so that they can either use cheaper parts or provide more features), and so are often willing to jump through a few hoops to help out the compiler in order to get the parallelism they desire.
Workstation programmers, on the other hand, are interested in the much bigger picture (since their applications are much larger and tend to have larger life expectancies), and so code tends to be human-friendly and not compiler friendly. (Certain heavily-traveled code paths in the Linux kernel being a noteworthy exception.)
The point is that the Merced compiler will ship with a lot of amazing compiler transformations, but very few of them will be effective at translating the hopping, skipping, and jumping nature of your typical general-purpose, database-ish-looking code into highly parallel, performance-oozing EPIC instructions, at least straight out of the gate.
Merced will inherently provide big performance wins to the compute-farm customers (your big engineering shops that currently use networks full of Sun or HP workstations to crunch VHDL, Spice, or whatever simulations around the clock), as these applications end up reducing to huge matrix manipulations and numeric crunching galore -- oozing with parallelism. But Merced will be hard pressed to feed up web pages or database queries much faster than any other architecture, unless it's able to massively crank its clock rate due to losing the shackles of the instruction scheduling hardware.
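For what it's worth, here is the sort of numeric kernel the parent means: after a little unrolling, the multiply-adds in the body have no dependences on one another, which is exactly what a wide EPIC machine wants to see (a sketch; assumes n is a multiple of 4):

    double dot(const double *a, const double *b, int n)
    {
        double s0 = 0, s1 = 0, s2 = 0, s3 = 0;
        int k;
        for (k = 0; k < n; k += 4) {
            s0 += a[k]     * b[k];      /* these four multiply-adds are     */
            s1 += a[k + 1] * b[k + 1];  /* independent of one another, so a */
            s2 += a[k + 2] * b[k + 2];  /* wide machine can issue them      */
            s3 += a[k + 3] * b[k + 3];  /* together                         */
        }
        return (s0 + s1) + (s2 + s3);
    }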
Anyway, those compiler nuts in the crowd might find the following links useful and informative.
--
There may be advantages to current model (Score:1)
This is all assuming that this article is accurate in its description of the OEM business, which I am not 100% convinced of. Among other things, in a business with that kind of growth rate, wouldn't new users be at least as important, if not more so, than returning users?
Branch prediction (Score:1)
At the time we were comparing Pentium, PA-RISC, Alpha, and MIPS.
If I remember, the Alpha had a huge number of transistors dedicated to branch prediction. PA-RISC always assumed that the program would loop. As a class, we questioned how much branch prediction actually helped. Does anyone have a good feel, or even some numbers, to describe how much branch prediction improves performance?
This Merced design of executing both branches seems like it would take an enormous amount of work. Is it really worth it? Isn't a simpler design able to operate at a higher clock rate?
And, has anyone read about async processors lately? Anything ever released commercially for that?
Thanks,
Joe
[forgive the English, I don't have an English compiler.]
Intel's pricing model doesn't work that way... (Score:1)
ccNUMA? (Score:1)
Actually, Cray invented all of this, but hey, why nitpick :-P.
SGI's ccNUMA white paper can be found here [sgi.com]
RISC? Whadda ya mean, "RISC"? (Score:2)
That would be all well and good, if we were actually discussing a RISC architecture. But we aren't - we're discussing VLIW.
With the i960 and crew, Intel has all the RISC expertise that they ever wanted (or needed). Finding someone who can write compilers and tools for VLIW is a horse of a different color, however. There isn't much experience in the industry when dealing with VLIW; not only that, coding for RISC isn't going to help you with this type of architecture. Hence, the hair-pulling and delays from the compiler/tools group. This isn't a problem you throw money at to make it go away faster - it's a first run, and everyone on the team is learning as they go.
If that doesn't convince you, keep in mind that Intel is partnered with a company that has deep experience with RISC architectures (HP). If HP and Intel together are having a rough time of it, I would submit that this can't be an easy design to work with - especially given that no one has done it before.
No mention of unix workstation/server vendors (Score:1)
musings on possible problems with Merced (Score:1)
Maybe my view of Merced is colored by the fact that I have used VLIW machines in the past. My experience has been: a lot of code will not run even close to the theoretical capabilities of the machine (because the compiler couldn't figure out how to squeeze the logic into the parallel instruction set), and there were few compilers and little software available for them.
So far, I see little reason why Merced should be any different. Despite many years of research, compilers that are actually in use still haven't gotten very smart in understanding aspects of programs that need to be understood for parallelization and optimization. And Intel may try to help with C, C++, and Fortran backends, but what about all the other languages that are coming into use? We need chips that encourage the use of post-1970's languages, not chips that write them into stone.
Merced will probably perform well on some very structured problems (geometric transformations, optimizations, other numerical problems, text search, etc.). But for those, adding vector processing units to a more traditional processor might be cheaper and result in better overall performance than Merced's architecture.
There also seem to be questions about the way the VLIW architecture is implemented by Merced; supposedly, code compiled for one generation of the chip will not take advantage of more parallelism available in a later generation.
I think there is a good chance that the Alpha will save Intel. People already know how to write compilers for the Alpha, and the chip is fast. According to an article in Byte (but, hey, where are they now :-), Alpha will have twice the performance of Merced at the time Merced finally gets released.
On the one hand, I'm glad that some company is finally breaking with the dull tradition of processor design over the last 20 years. On the other hand, I'm not sure that this is the right way to do it.
Actually, there is another rather radical change in processor design that has happened recently: the complete system on a chip (from IBM and maybe others). Those might allow very dense multiprocessor systems, leading possibly to very different designs.
The three-wheel bicycle argument (Score:1)
He's right about the compiler being hard, but I'm sure Intel realized this when they decided to go the VLIW route.
Author has it wrong (Score:1)
Intel's pricing model doesn't work that way... (Score:1)
Merced Is NOT For You (Score:3)
RISC? Whadda ya mean, "RISC"? (Score:1)
In '96 (don't laugh) I kept very good tabs on what was going on with Intel and its competitors regarding chip technology, with help from friends placed well at Intel (who would surely like to remain anonymous). At the time Merced was described as "essentially RISC" when compared with the CISC systems then being put out (and still being put out) by Intel. Over the past years I kept less abreast of the impending technologies (having moved my focus to more software development, and much of that *not* on Intel systems), but at least kept aware of scheduled *releases* and some of the current Intel technology. I clearly missed the IA64 move (talk about head in the sand), on which I have just started to catch up, and hence the "RISC" discussion above.
The basis of my argument still stands, but the compilers will be harder to write, and I see now why there are some delays. Micro$oft does claim to have a 64-bit Windows running on a Merced simulator (like that isn't a bald-faced lie, judging by other orthogonal press releases coming out of Redmond). I still firmly contend that the current marketing infrastructure for Intel's products will change if it cannot handle the responsibilities of making money in Intel's Brave New World. etc., etc., etc.
Thanks for the heads-up.
Roundeye
Don't really see the problem here... (Score:3)
Intel has been plagued for a decade by backwards compatibility with a poorly designed CISC chip with one of the poorest memory subsystem designs still in current use. The amount of juice which can be squeezed from the '86 lemon is limited, and it is a testament to Intel's determination (some would say stubbornness or stupidity) that they have been able to make this architecture a profitable industry standard (of course the more cynical (myself included on the occasional lonely night) might chalk this up as a testament to the power of a tightly run monopoly).

Merced is a necessity if Intel wants to stay profitable in the face of not only Moore's Law but AMD and other not-so-dark horses. This chip has been designed for the most part for years. The compilers have been under development for years as well -- anyone who thinks otherwise doesn't know how Intel does business.

A company which has the resources to write compilers for superscalar CISC with pipelining, data forwarding, bizarre MMX registers/instructions, and virtual '86s, while maintaining backwards compatibility with the original broken design, will find writing a new compiler for a freshly designed clean RISC system a wonderful relief. The amount of openly available published research in the RISC compiler community is significant, and Intel has the bucks to hire more gurus on the topic if they need them.
Marketing... It pains me to see so many people assume that "the way it is" is "the only way it can work". This is the same fallacious thinking that makes it painful to watch any Hollywood movie about time travel or the contact of our civilization with another (I think Independence Day may be the flagship example of this) -- the way we Americans do things in this day and age is superior to the way any other conceivable society could do them. Cultural ignorance and arrogance.
This sort of thinking comes up quite often in discussions of why "Windows will be here forever", and now appears here in a discussion of Intel's marketing plan for Merced. The truth of the matter is that (1) Intel wants the market to change -- they have been burdened with the '86 albatross for far too long, and (2) the market will change. Initially we hardware power users, systems hackers, and speed/systems freaks will jump on Merced because it is a better chip than a crappy CISC chip on steroids. The chipsets to run the chips will be there, and at least some variation in motherboard configurations. Dell/Compaq/Gateway will be able to sell a Merced system.

If, as Intel puts more of its weight behind Merced (and more applications are brought to Merced), the current distribution system cannot change its marketing model to take advantage of the new configurations which will be possible and then *desired*, then someone will step up to make the new money by providing them. Because it's done a certain way now doesn't mean that that is the only way (I reiterate at the risk of sounding pedantic). This industry moves too fast to coddle companies which have become too large to steer effectively.
The distribution channels for these systems, and multi-processor systems, will develop and may not include the current Big Players in the market. In addition, as Intel hopes, if AMD et al cannot create a chip to compete with Merced, and cannot anchor the market on the '86-type chips, they may also find themselves too big to steer out of the way of the Intel truck.

Be careful. Merced could be a swan song for Intel, but I think it is more likely their Excalibur.
What a dope! (Score:1)
2. As other people have pointed out, he doesn't know what he is talking about.
3. If I were looking for analysis on the Merced delays, I would dig around on www.mdronline.com for an excellent article by one of their staff on the subject rather than listen to this bozo. Synopsis of the MDR article: Intel is leading the design of the Merced, and they are using their standard massively parallel design approach (lots of engineers). Problem is, this approach works fine for successive iterations of an existing, well-understood ISA and implementation; it is not working well for a brand new, cutting-edge ISA and chip. His prediction is that the Merced will have a very short life before it is superseded by the second-generation chip in the family, one being designed methodically and inexorably by a small HP-led design team.
Extra CPU's (Score:3)
Most users don't need the horsepower of their current K6-350 or PII-300; I'm still using a P-133. How much of their sales are going to be to companies that can afford a hardware guy, or hard-core gamers who have the skill and motivation to do this, though? That might be a problem.
That would also be enough motivation to keep on churning out more advanced processors, speed- and instruction-wise, though. Intel has been pretty good at making a new CPU every 3 years or so, the average time between upgrades. They've been even faster with the improved chipsets. The rest of the computer has gotten better as well.
As long as it's more expensive to build/upgrade to a state-of-the-art system, I don't think the OEMs will have too many problems. Everyone, especially the hardcore gamers, knows that the CPU isn't everything. The compiler is another story, but that'll happen too.
SCO UnixWare 7 already running... (Score:1)
Very little technical content. (Score:1)
Very little technical content. (Score:1)
Also consider that the IA-64 EPIC architecture was originally an HP invention. As I understand it, HP designed it but realized that they didn't have the money or volume to produce it well, so they went to Intel. Intel realized that they were in a position of power (they could live without HP, but HP would be in trouble without Intel's fabs), so they grabbed the architecture and made it their own. For Merced, Intel is throwing a huge team of designers against it, but is still doing a poor job because the corporate architecture is too rigid. (Intel has been known to be a nasty employer, with a slew of age and sex discrimination suits behind it.) HP meanwhile is working on the next-generation IA-64 chip (McKinley, I believe), which is coming along quite nicely. The last estimate I heard was that McKinley will really show the power of EPIC chips, whereas Merced will be about comparable to whatever Pentium successor they have out at the time.

Anyhow, this is just my take on it all.
What a dope! (Score:1)
It will be interesting to see if Intel panics and tries to throw more people at the project if it falls behind. IMHO, that would cause more harm than good.

As I understand it, the HP McKinley team is much smaller, which is a more intelligent way to attack a problem like this (a drastically new architecture).
MERCED EPIC v DEC ALPHA 21364 (Score:1)
It would be nice to know how many SMP CPUs you can run with Merced at once. If you're building a serious server, multithreading and SMP will probably win out over multiple instruction pipelines any day.

It is interesting, though, that the EPIC spec can support a virtually limitless number of instruction pipelines simultaneously (at least according to the Ars Technica review), so if this compiler-based strategy works, and chip densities increase (as they will), we could see some very, very wide CPUs.
MERCED EPIC v DEC ALPHA 21364 (Score:1)
I think I'll buy an Alpha. It will probably cost less anyway. It's supported by Linux (or vice versa), and we know the damn thing works.
Only Linux will run well on Merced (Score:1)
Of course no one ever did this. In the early days it was probably mostly because everyone needed it to run on the old processors too. These days it is partly because probably nobody remembers this anymore, or not many people have bothered to do _that_ good a job of optimising for exactly how the Pentium works anyway.

This time around, the same thing will happen, EXCEPT that Intel will probably come out with a proper VLIW gcc-based compiler for Linux/UNIX. They have to do this for the chip to survive.

Now sure, MS and Borland or whoever will make their own compilers, but nobody will use the VLIW stuff for years because of backward compatibility. They will stick with whatever x86 compatibility box Merced has.

But any Linux vendor can just do a rebuild of everything (use the source, Luke). This might be a big win for Linux to take a speed leap over Windows.
No mention of unix workstation/server vendors (Score:1)
Very little technical content. (Score:1)
hold off on merced comments just yet. (Score:1)
On the other hand, 7 execution units isn't a good idea on a plain RISC, because you can't avoid instruction dependencies regardless of how good the compiler is. But they are useful on a vectorial chip. I guess they are making a vectorial CPU that can be scaled up by adding more chips to the system.
Chip Level Multiprocessing (Score:1)
While the notion that you can just throw more CPUs at a problem is silly, this does bring up the related idea of chip-level multiprocessing. That is, if you *do* have a program that can be run well on an SMP machine, then you can use a computer that has two or more conventional cores on a single chip, sharing an L1 cache. This may be a better way to use a transistor budget than fancy VLIW schemes. The shared cache would make interprocessor communication very fast. Linux programmers should try to make their programs SMP-friendly.
EPIC = Dynamic VLIW, not plain VLIW (Score:1)
Plain VLIW ties the instruction set to the processor generation. Intel would not be successful if Merced v2 could not run Merced v1 software.
EPIC fixes the problems with VLIW.
EPIC = Dynamic VLIW, not plain VLIW (Score:1)
The compiler links together variable-length groups of instructions that can execute together.

Example:

Your compiler finds a group of 17 instructions that can execute together. (This is reasonable, since there are 128 registers.) Your CPU can only execute 6 at once. You upgrade the CPU to one that can execute 10 at once. No problem!

You only have trouble if your new CPU has more functional units than your old compiler was able to feed. This is simply an issue of a poor compiler, not one producing code that is better for the old CPU.
MERCED/SMP, IPC and this article (Score:1)
Doug
MERCED EPIC v DEC ALPHA 21364 (Score:1)
dmp