Troubles with Merced

Brandon Bell writes "Everyone has their theory on why Intel's Merced is in trouble. Kraemer just wrote an opinion piece that discusses two problems he thinks it's facing: the compiler and the sales model."
This discussion has been archived. No new comments can be posted.

  • by Anonymous Coward
    It's happening now just like it has since the beginning of time. Every time someone creates something completely new, and it's a risk, people come out of the woodwork to dismiss it. Before everyone sits around and says the Merced chip is done, wait till it ships before passing judgment. If it ships on schedule and is buggy, is that better than shipping a year late and flawless? As far as I know, no other chip company is taking this much of a risk on a new chip. I say more power to Intel.
  • by Anonymous Coward
    Explicit parallelism (EPIC) has nothing to do with multiprocessing (SMP). Dell and Gateway have nothing to fear but idiots who write articles like this.

    To take advantage of EPIC, a compiler needs to look for machine instructions that have no dependency on each other. Those instructions can be executed simultaneously.
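
    A rough sketch of what "look for instructions that have no dependency on each other" could mean in practice. The tuple format and the `group_independent` name are made up for illustration; this is a conservative greedy pass, not how any real IA-64 compiler is known to work.

```python
def group_independent(instrs):
    """Greedily pack instructions into groups that could issue together.

    Each instruction is a tuple (dest, src1, src2, ...). An instruction
    starts a new group if it reads or writes a register written earlier in
    the current group, or writes a register read earlier in the group.
    """
    groups, current = [], []
    written, read = set(), set()
    for dest, *srcs in instrs:
        conflict = (
            dest in written                      # write-after-write
            or dest in read                      # write-after-read (conservative)
            or any(s in written for s in srcs)   # read-after-write
        )
        if conflict and current:
            groups.append(current)
            current, written, read = [], set(), set()
        current.append((dest, *srcs))
        written.add(dest)
        read.update(srcs)
    if current:
        groups.append(current)
    return groups


program = [
    ("r1", "r10", "r11"),  # r1 = r10 op r11
    ("r2", "r12", "r13"),  # independent of the first instruction
    ("r3", "r1", "r2"),    # needs r1 and r2, so it starts a new group
    ("r4", "r14", "r15"),  # independent of r3
]
groups = group_independent(program)
for g in groups:
    print(g)
```

    Here the four instructions fall into two groups of two; how much real code behaves that nicely is exactly the open question.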

    To benefit from multiprocessing a program must use threads (break itself up into what are called lightweight processes). Threads are an operating system service, not really a processor level thing. A compiler cannot make a thread - even on Merced. The programmer creates the threads by triggering operating system calls within the process.

    Thus, even on Merced, an ordinary Joe Blow can't make individual programs faster just by popping in new processors unless the software is written using threads. And even then, the benefit will only affect software on operating systems that offer both user and kernel level threads like BeOS and Solaris UNIX. NT doesn't really cut the mustard. And Linux doesn't either.

    As far as writing the compiler goes, he's partially correct there, but only partially. All programs have a few instructions that can safely be executed simultaneously, but how much faster would that make the program? A compiler must be well written, and this will make quality comparisons between different vendors' compilers much more useful. How crappy will MS Visual C++ be then? Will NT even run on Merced?

  • by Anonymous Coward

    You got the Alpha and PA-RISC mixed up: the Alpha always assumes that a branch to a previous address will be taken (which makes loops fast and gives compilers a handle on how to optimize code with well-understood flow). The main advantage to using such a simple (and more or less effective) technique on the Alpha was that it consumed very few transistors and the Alpha was facing very severe space constraints. The 21264 is about twice as powerful as the 21164 at the same clock speed, and most of the benefit came from improvements in the branch prediction (made possible by better fab technology relieving some of the space constraints; the Alpha is a *big* processor).

    Branch prediction is very important for keeping deep pipelines from stalling. If your pipe is 33 instructions deep and your branch prediction is only 90% effective, then your branches cost you an average of about three extra cycles each.
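
    The arithmetic behind that figure, as a back-of-the-envelope sketch. It assumes (my simplification) that every mispredicted branch throws away a full pipeline's worth of work and that a correct prediction costs nothing extra; `avg_branch_penalty` is just an illustrative name.

```python
def avg_branch_penalty(pipeline_depth, prediction_accuracy):
    """Expected extra cycles per branch, assuming a full flush on each miss."""
    return pipeline_depth * (1.0 - prediction_accuracy)


penalty_90 = avg_branch_penalty(33, 0.90)  # the 33-deep, 90% case from the text
penalty_99 = avg_branch_penalty(33, 0.99)  # same pipe with a much better predictor
print(f"90% accurate: {penalty_90:.2f} cycles per branch")
print(f"99% accurate: {penalty_99:.2f} cycles per branch")
```

    Under these assumptions, going from 90% to 99% accuracy cuts the average cost per branch by a factor of ten, which is why the predictor matters so much on a deep pipe.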

    Speculative execution is another powerful tool for keeping your (now tree-shaped) pipeline full, but it's not intended to be a complete replacement for branch prediction. On systems where the pipe isn't trivially shallow, speculative execution is used with branch prediction (i.e., executing speculatively on the earliest branches and/or poorly predictable branches and using branch prediction for the rest). Speculative execution is expensive in terms of duplicating issue logic and ALU, but that's not much of an issue for today's microprocessors -- most of the space on-die is taken up by memory cache, which tends to become much less effective per-transistor once beyond a certain (already long surpassed) size. As long as the extra logic for speculative execution yields better gain per transistor spent than L1 cache, it's a win.

    Yet another technology for ducking the high cost of conditional branches is predication. Predication is orthogonal to prediction and speculative execution. Its biggest strength is that it doesn't require much extra logic, doesn't require splaying your pipe into a tree (a la speculative execution), and greatly reduces the cost of small blocks of conditionally executed code, albeit not as much as good branch prediction would. Its use is therefore more or less limited to small blocks of hard-to-predict conditionally executed code, and having it in your processor by no means allows you to get away without good branch prediction logic.
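
    To make the idea concrete, here is a toy sketch of predication in Python. Real predication happens at the instruction level with predicate registers; this only mimics the shape of it (compute a condition once, run both small blocks unconditionally, keep the result whose predicate is true). Both function names are invented for the example.

```python
def abs_with_branch(x):
    """Conventional version: control flow depends on the comparison."""
    if x < 0:
        return -x
    return x


def abs_predicated(x):
    """Predication-style version: no branch, both candidates are computed."""
    p = x < 0                 # the "predicate register"
    negated = -x              # executed regardless of p
    unchanged = x             # executed regardless of p
    # Keep the result guarded by the true predicate (True == 1, False == 0).
    return p * negated + (not p) * unchanged


values = list(range(-3, 4))
results = [abs_predicated(v) for v in values]
print(results)
```

    The work of both arms is always paid for, which is why this only makes sense for small blocks.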

    -- Guges --

  • by Anonymous Coward
    EPIC means the following:

    1. The compiler has to explicitly package instructions that have no dependences into parallel issue packets. This task is currently done by hardware in superscalars. This means that you can't just slap another CPU onto the board to make things faster: the parallelism is in the instruction level and is compile-time determined; most bindings are done statically. For this to work right, almost all instruction, pipeline and functional units characteristics have to be exposed. For example, for software pipelining (a technique to overlap instructions from different iterations of a loop) to work, pipeline latencies, and functional unit resource constraints at each time step has to be carefully considered.

    2. EPIC is basically VLIW, but don't say that because all VLIWs (except DSPs like TI's C6) have been commercial failures. Besides it's not as salesirific as VeeEllEyeDoubleU.

    3. The compiler is crucial. So far, only Univ of Illinois IMPACT group and HP CAR group really really knows how to build one well. (My opinion of course)

    4. For more technical info, read comp.arch, look at proceedings such as MICRO. See also Just don't listen to the clueless.

    Allen (
  • by Anonymous Coward on Saturday April 03, 1999 @01:53PM (#1950180)
    The article has very little technical content. And it may not be accurate even.

    I didn't know that Carmack was the semi-god of compilation research, and I thought that some people had OSes running on Merced simulators.

    The sales model: of course Intel made a gamble; but the gamble is that you couldn't make much architectural optimization on current RISC (maybe that's why people are throwing away die area with multiple MMX, 3DNow!, whatever "multimedia/SIMD" units), so a new paradigm could outperform older units. Basically, Intel is betting that superior performance is a valid reason for being (partly) incompatible with x86 (or to keep up with competitors).

    A huge part of the Merced issue is essentially technical, and the article is just completely off in this respect. Please, someone fix this and post relevant URLs...

  • by Anonymous Coward on Saturday April 03, 1999 @01:52PM (#1950181)
    This guy clearly doesn't know anything more about what EPIC is than what the acronym expands to.

    Even from the small amount of information published about IA64, it is clear that there is absolutely no support for automatic scaling simply by adding cpus. EPIC refers to the way each individual cpu decodes the instruction stream. EPIC is no more inherently multi-processor than the current IA32 instruction set.

    To get automatic scaling, you need something like Tera's Multi-Threaded Architecture. Too bad they can't seem to ship the damn thing, and that it costs a couple of million.

    See: [] for more info.

  • If the trouble is in that the money model does not work with easier to upgrade hardware, then maybe the model needs to change. Currently Dell and Compaq make money selling whole computers. Perhaps they should sell or lease parts in addition to cases. That way you could change the CPU every few months and keep current, for a fee of course.

    Time and progress won't hold still, so perhaps you shouldn't.

  • It's not just _a_ compiler that has to be worked over, it's all of them.

    In addition, it's widely accepted that there will be faster RISC CPUs available than Merced when Merced ships, and even faster x86 chips (at running x86 binaries). Before using this to claim that Merced is dead, remember that this was true of the initial RISC chips when they came out. What this means is that it'll simply take a while for EPIC to mature and for the advantages to come to the fore.

    The problem is that Intel doesn't have much of a choice. Well, I suppose they could have gone for a standard RISC chip. But something post-x86 is necessary. If nothing else, the 32-bit limitations of the architecture are hurting Intel's sales in the lucrative server market (where 10's of gigabytes of RAM are common, and 100's not unknown). The desktop user probably won't care for another 4-5 years, but the server market started caring 3 years ago.

    One interesting note in all of this is how this is affecting the Intel/Microsoft relationship. Judging by its actions, Intel has no confidence in Microsoft being able to ship an Enterprise-ready 64-bit clean OS any time soon. Not that I blame them.

    Intel is learning one nice feature about open-source operating systems- they don't have to depend upon someone else to support their chips. For a small engineering investment, they can do it themselves- and if you want something done right, doing it yourself is a real good idea. Making a small investment in a small company (like, say, Redhat) makes a lot of sense in this context.

    That being said, I think EPIC is an interesting design with a lot of long-term potential. Standard RISC processors have a hard time averaging more than about 2 parallel instructions. Research done by HP indicates a lot more than that is possible- it's just computationally infeasible for the _processor_ to find it.
  • I wonder how much time the author of this article
    took to research the matter.

    I see no mention of any of the Unix vendors.
    Both HP and SGI are going to use the IA64. HP
    will be phasing out its PA-RISC CPU in favour of
    the IA64. (Don't know about SGI's use of MIPS
    CPUs.) Both vendors have extensive experience
    with multi-scalar RISC CPUs. Also, Intel has
    its own RISC CPUs and there are several 3rd party
    compiler developers probably just waiting for a

    Also, he starts comparing Joe Average's IA64
    system with real server machines (HP PA-RISC,
    Alpha AXP, MIPS R10K, Sun UltraSparc). An
    IA64-based system is going to cost more than Joe
    makes in a year! He can't even buy a machine
    based on one of the currently popular server
    architectures (except maybe Intel IA32/Xeon).
    Also, the comment about just adding an extra CPU
    is also valid for current SMP Windows NT based
    systems, since almost all software is
    multi-threaded. (I really like my dual P150
    running Linux...)

  • The author doesn't know what he is talking about.

    Most processors are already parallel in the way EPIC means parallel. They have multiple units of execution which can concurrently process instructions. Multichip parallelism is a much tougher problem with lots of different problems to beat. It has nothing to do with Merced or the sales model.

    An IA-64 instruction comes in a bundle with 2 other instructions, so there are 3 instructions in a bundle. Each instruction is something like 40 bits long, and each bundle has a dependency flag of several bits. The performance problems that hinder chips the most are pipeline stalls and branches. Modern chips have a ton of logic that tries to predict branches and choose the right one, and a ton of logic to execute instructions out of order to reduce stalls. IA-64 pushes the job of stall detection onto the compiler, which builds the instruction bundles and chooses the dependency flag (the flag says which instructions in the bundle conflict). That way the chip doesn't need as much logic for out-of-order execution, and the designers can focus on more important things. This is also a piece of cake for modern compilers; IBM, Sun, MIPS and DEC all have the technology to do this, and most have had it for years and years.
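
    For anyone who wants to see the bundle idea in bits, here is a small sketch. It assumes the published IA-64 layout of a 128-bit bundle holding a 5-bit template field plus three 41-bit instruction slots (which is where "something like 40 bits" comes from). The helper names are mine, and the slot values are arbitrary placeholders, not real instruction encodings.

```python
TEMPLATE_BITS = 5   # assumed width of the per-bundle template field
SLOT_BITS = 41      # assumed width of each of the three instruction slots


def pack_bundle(template, slots):
    """Pack a template and three slot values into one 128-bit integer."""
    if not 0 <= template < (1 << TEMPLATE_BITS):
        raise ValueError("template does not fit in 5 bits")
    if len(slots) != 3:
        raise ValueError("a bundle holds exactly three slots")
    bundle = template
    for i, slot in enumerate(slots):
        if not 0 <= slot < (1 << SLOT_BITS):
            raise ValueError("slot does not fit in 41 bits")
        bundle |= slot << (TEMPLATE_BITS + i * SLOT_BITS)
    return bundle


def unpack_bundle(bundle):
    """Split a 128-bit bundle back into (template, [slot0, slot1, slot2])."""
    template = bundle & ((1 << TEMPLATE_BITS) - 1)
    mask = (1 << SLOT_BITS) - 1
    slots = [(bundle >> (TEMPLATE_BITS + i * SLOT_BITS)) & mask for i in range(3)]
    return template, slots


bundle = pack_bundle(0b10000, [0x123, 0x1FFFFFFFFFF, 0x42])
template, slots = unpack_bundle(bundle)
print(bundle.bit_length(), template, [hex(s) for s in slots])
```

    The point is only that the grouping information travels with the instructions, so the hardware reads it instead of rediscovering it.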

    To solve the branch problem, IA-64 doesn't use branch prediction. This is the really important part of EPIC. It executes both branches and once the correct path is known it discards the instructions it executed on the wrong branch. This is tough to do. The compiler is also supposed to help with this by adding some bits to the flag, and that is a tough thing to do too.

    If it all works, IA64 chips will be fast, but nothing stellar, because RISC chip makers have done such a great job of dealing with these problems already. So Intel has chosen to make a very complicated design, with some hard but not impossible compiler changes, and they aren't going to deliver the ultimate performance they have been promising for years. There are definitely hard technical problems to be solved, but they aren't that bad. I think the bigger problem is actually building a chip that can compete with modern PowerPC and Alpha RISC processors and still look innovative. Intel is breaking compatibility, and once that is done it's anybody's market, because they have nothing that makes them look better than the other guys (like 25 years of x86 software...)

    The funny thing about all this EPIC talk is that Intel still has to have logic on the processor to tell if the compiler lied... They were trying to get rid of that logic to make a leaner and meaner processor but they still have to have it.

  • I don't really know for sure, but it seems to me that one of Intel's major problems is that they want to charge an extreme premium for performance, and don't want to wake up in a world where you can scale processor power by adding CPU's.

    I find it odd that just when the PPC people will be removing the "SMP Premium" charge from their chips, and making very SMP-capable G4 chips, Be will be abandoning the PPC arena for Intel chips, where the only processors capable of scaling beyond two-way SMP are "non-consumer-grade" very expensive, very high-margin server chips.

    Once someone comes out with a decent low cost multi-smp-scalable (beyond 2!) chip and motherboard system, the world will beat a path to their door. I think if that ever happens, Be will have to decide whether or not to stick with Intel and watch some more processor-agnostic SMP-capable OS (like Linux) seize the ground of becoming a "media OS."
    Phil Fraering "Humans. Go Fig." - Rita

  • You don't understand what I mean.

    For the intended market for Merced, i.e.
    servers, being able to handle multi-threaded
    applications would help a lot.

    This also holds true for consumer OS's, otherwise
    M$ wouldn't be quite so concerned about Be.
    Phil Fraering "Humans. Go Fig." - Rita
  • The egcs compiler issue is significant. If you examine the current state of the Alpha backend for egcs, you have a very good indication of the problems that a Merced port might face. The 21164 is missing one feature (out-of-order execution?) which makes instruction scheduling very important and seems to really limit overall egcs/Alpha performance. Alpha represents a rather small deviation from the current CPU "norm," and yet these problems have persisted for quite some time. EPIC and Merced present a vastly different architectural model, so I worry about the ability of egcs development to keep up. Of course, with Intel's recent interest in Linux, perhaps they will be gracious and help engineer the compiler.

  • The author mentioned articles in The Register, and other places, but seemed to just ignore the content of those articles. He is right about the compiler though - it is a big issue, and I've heard it isn't going too well.

    Incidentally, what I've heard at the Register does correspond pretty well with what I've heard through other channels.

    Basically, the 'other' problems with the Merced are the design itself - Intel's engineers aren't used to doing this sort of thing, and also, apparently they're a bit short on quality engineers, and are using lots of people who've just left university. Not the sort of people to give a massively complicated chip design to.

    Incidentally, the '2nd gen' EPIC chip, the McKinley, is mostly being done by HP, and is apparently going pretty well. I've been hearing from my own contacts for a long long time that the Merced might just end up being a 'test' processor that never goes into production, and that the McKinley will be the first production EPIC.

    Not surprisingly both HP and SGI have recently been saying they're still committed to their own architectures (at least for a while), after previously planning to dump them. I think HP have been saying they'll go with their own stuff for another 5 years.

    Intel isn't the only one being a bit late. Sun are about a year behind with their UltraSparc-III, though I haven't heard anything about why they're behind. (Their reasons for being late are probably quite different to Intel's.) Shame, as it seems like a pretty nice chip...

  • "Too many web sites (especially gamer sites, for some reason), don't seem to understand that Merced isn't for the average user. When it comes out, and at the very least for a few years following, it will be an ENTERPRISE level chip. This means 1) expensive as hell 2) used in"

    Intel's pricing model doesn't work that way. True, with every new chip Intel announces "this is for servers only". And the first few thousand chips do go into servers. But the server market isn't anywhere near large enough to pay back the cost of developing that chip, so within a few months workstations are released, first by one of the larger clone makers, then by Gateway 2000, and finally by Compaq.

    And Intel absolutely depends on these workstation sales to drive their learning-curve/price-reduction model. Otherwise they couldn't earn a return on the chip or keep AMD etc. at bay. Set up a spreadsheet and play around with some pricing models (first 10k chips at $2000, next 100k at $750, and so on). The arithmetic is quite simple and inexorable.

    So look for the first Merced (McKinley?) workstation about three months after the first server is released.

  • "Not in this case. These pricing models will be on a much larger scale. Try $10,000 for the first 10k chips, etc. This will one won't be quick to the home user (intel will still be realeasing some"

    I hear what you are saying about home power requirements, and I tend to agree (although SimCity 3000 might just need that Merced ;-)). But consider that 10k chips * $10k/chip = $100 m. My understanding is that a chip like Merced costs Intel $1-3 billion (US terminology) all found. So $100 million won't go very far to pay off that loan - and the bulk of the sales still have to come from the workstation side.

    Just my 0.02.
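
    The "spreadsheet" version of this exchange, as a quick sketch. The tiers are the ones quoted in the two comments above, and the $1-3 billion development cost is the figure given as "my understanding" above, not a confirmed number. `tier_revenue` is just an illustrative helper.

```python
def tier_revenue(tiers):
    """Total revenue for a list of (units, price_per_unit) tiers."""
    return sum(units * price for units, price in tiers)


# First 10k chips at $2000, next 100k at $750.
workstation_model = [(10_000, 2_000), (100_000, 750)]
# First 10k chips at $10,000 (the "much larger scale" suggestion).
server_only_first_tier = [(10_000, 10_000)]

rev_workstation = tier_revenue(workstation_model)
rev_server = tier_revenue(server_only_first_tier)

# Assumed development cost range from the comment: $1-3 billion.
dev_cost_low, dev_cost_high = 1_000_000_000, 3_000_000_000

for name, rev in [("workstation tiers", rev_workstation),
                  ("server first tier", rev_server)]:
    print(f"{name}: ${rev / 1e6:.0f}M, "
          f"{rev / dev_cost_high:.1%} to {rev / dev_cost_low:.1%} of development cost")
```

    Either way the opening tiers recover at most about a tenth of the assumed cost, so the volume has to come from somewhere else.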

  • MIPS and PA architecture chips are the architecturally closest *working* chips to Merced that we have today.

    I remember all the trouble MIPS had when they rolled out the R10000 chip. Initial performance was not up to spec because early estimates of performance were pretty much correct on the SPECfp numbers, but underestimated *how long* it took to get those numbers. It took a couple of years for the compiler people to wring out the best performance (ok clock speed was off too, but that was not the sole reason, nor was internal CPU wars within MIPS/SGI).

    Now the Merced is more complex than the R10000 (and at least the R10K has some *vague* similarity to the R8K, and PA architecture has been around for many years, so these companies had compiler writers experienced in some of the problems they were up against). Intel is starting from scratch here. I'd say when they've done first tapeout and have silicon in their hot little hands, it'll be at least a year before the compilers get close to the performance they hope for.

    Meanwhile, IA32 will be up to similar spec and Alpha, PA and MIPS (and SPARC perhaps) will be serious contenders.

    Just last week or so MIPS and HP announced they were reviving their CPU development for a further year or so (i.e. another generation), rather than trusting all to Merced (I assume that means last MIPS or PA in 2003-2005 now). What news did Intel give these guys for them to decide to make such an announcement??


    Michael Snoswell
  • by Doug Merritt ( 3550 ) <> on Saturday April 03, 1999 @01:57PM (#1950193) Homepage Journal
    Certainly the compiler is known to be a difficult, important issue with Merced, so that part is sort of right -- although I don't see that as a reason for them to slip ship dates.

    But I don't know where he got this idea that Merced automatically makes all applications multi-processor ready; that's just plain wrong. High end processors have had multiple execution units for many years, which allows them a small amount of very fine grain parallelism: on average perhaps two instructions can be executed at once. Sometimes when you're lucky it can be more than two for a short burst. Merced will *not* be able to keep all 7 of their execution units busy 100% of the time, but they may get lucky and do so for an instant every once in a while, if their compilers are really good.

    None of that has anything whatsoever to do with multiple cpu's. The situation with those will be unchanged from the situation today with multiple e.g. Pentiums: applications won't take advantage of more than one cpu unless they are explicitly coded to do so.
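
    What "explicitly coded to do so" looks like, as a minimal sketch using Python's standard `threading` module. The `parallel_sum` function and its chunking scheme are invented for the example. Note that in CPython the global interpreter lock means this particular code won't actually run faster on two CPUs; it is only meant to show that the programmer, not the chip, has to split the work and start the threads.

```python
import threading


def parallel_sum(data, num_threads=2):
    """Sum a list by handing each thread its own slice of the data."""
    chunk = (len(data) + num_threads - 1) // num_threads
    partials = [0] * num_threads

    def worker(i):
        # Each thread writes only to its own slot, so no lock is needed.
        partials[i] = sum(data[i * chunk:(i + 1) * chunk])

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()   # the program asks the OS/runtime for the threads
    for t in threads:
        t.join()    # wait for all slices to finish
    return sum(partials)


total = parallel_sum(list(range(1000)), num_threads=4)
print(total)
```

    None of this splitting comes for free from the instruction set, EPIC or otherwise.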

    Therefore the conclusion of the article is dead wrong: the business model won't change, because he just misunderstood the issue with parallelism.

  • My summary of EPIC vs. VLIW vs. superscalar (note I use the term "functional unit" to mean "thing that can execute some kind of instruction"; more functional units means a faster CPU. It's an oversimplification, but useful in this context):

    1. Superscalar CPUs execute a stream of instructions with no information about data dependencies or execution unit dependencies. They figure it out on the fly. You can change the latency of functional units, or the number of functional units, and still run the same code. Many transistors are used to figure out the dependencies on the fly.
    2. VLIW executes a stream of instructions marked to show data and functional unit dependencies. No changes to functional unit latencies or number of functional units can be made without risking (almost 100% risk in fact) breaking old code. No transistors are used to figure dependencies on the fly; all are used to actually do work. For a given number of transistors a VLIW should have more execution units than a superscalar RISC or CISC, at the cost of having no binary upgrade path.
    3. EPIC reads a stream with data dependencies marked, but no information about execution units. You can alter the number of functional units and their latency and still execute the same code. Many transistors are needed to figure out functional unit use on the fly, none to figure data dependencies on the fly. For a given number of transistors an EPIC should have more functional units than a superscalar RISC or CISC, but fewer than a VLIW. It pays a cost relative to the VLIW for having an upgrade path. The compiler pays a cost relative to superscalar designs for having to find data dependencies at compile time.

    Now my reply to allen's post:

    1. The compiler has to explicitly package instructions that have no dependences into parallel issue packets. This task is currently done by hardware in superscalars. [...]

    Yes, however these packages need only be free of data use dependences, not execution unit dependencies (this is the big difference between EPIC and traditional VLIW).

    For this to work right, almost all instruction, pipeline and functional units characteristics have to be exposed. For example, for software pipelining (a technique to overlap instructions from different iterations of a loop) to work, pipeline latencies, and functional unit resource constraints at each time step has to be carefully considered.

    To get maximum performance this is correct. For a "normal" VLIW you need it to get a working program. This difference is important. If you own a Multiflow (one of the defunct commercial VLIWs) and you upgrade its CPU, all of your old programs are incorrect (different load latencies), and if you managed to compile your code to work with both load latencies, you still can't use more adders per cycle because the exact instructions that are executed per cycle are set in the code.

    If you upgrade from a Merced with three integer execution units and two load units to one with six integer units and one load unit your old programs continue to work. They may run faster, or slower, but they still work.

    I don't think you need to know all the details of the Merced microarchitecture to get decent performance. Just move the loads as far from the uses as you can, and get as many instructions marked independent of their neighbors as possible. You may end up moving loads farther away than needed, or marking more things as "can run in the same cycle" than your Merced can gobble up, but that's ok. It won't kill you. It might even make a future Merced faster.

    2. EPIC is basically VLIW, but don't say that because all VLIWs (except DSPs like TI's C6) have been commercial failures. Besides it's not as salesirific as VeeEllEyeDoubleU.

    The EPIC is basically a VLIW, except it is a little slower (for a given transistor budget), and it has an upgrade path. I think the upgrade path makes it commercially different from VLIW.

    3. The compiler is crucial. So far, only Univ of Illinois IMPACT group and HP CAR group really really knows how to build one well. (My opinion of course)

    Multiflow made a good one (well, it got good results most of the time, it was pig slow). DEC eventually bought it when Multiflow went under.

    Also the compiler isn't as hard as it is for VLIW. With VLIW if you get the latency wrong you don't run. EPIC just stalls. Kind of like superscalar. Getting max speed requires tons of work, but the same work would speed up a superscalar (by exactly the same amount, if the superscalar has the same number of functional units). I think the big difference will merely be that EPIC CPUs will tend to have many more functional units, so the bad-compiler vs. good-compiler gap will be more like a factor of 8 than a factor of 4 (or a factor of 2 on a PII/PPro/PIII).

    4. For more technical info, read comp.arch, look at proceedings such as MICRO. See also Just don't listen to the clueless.

    Indeed. And that's not a slam, your opinions were well thought out, I just happen to think that requiring explicit dataflow (EPIC) is very different from explicit dataflow AND instruction scheduling (VLIW).

  • To solve the branch problem, IA-64 doesn't use branch prediction. This is the really important part of EPIC. It executes both branches and once the correct path is known it discards the instructions it executed on the wrong branch.

    I assume you mean that it performs speculative execution (which is what you described) in addition to having predicated instructions, e.g. speculatively executing predicated instructions before it knows what's in the instruction's predicate register, and throwing away instructions' results as soon as it finds out that the predicate register was false.

    (I.e., predicated instructions aren't the same thing as speculative execution; don't automatically conclude that Merced does speculative execution merely because IA-64, of which Merced is planned to be the first implementation, has predicated instructions.)

  • If nothing else, the 32-bit limitations of the architecture are hurting Intel's sales in the lucrative server market (where 10's of gigabytes of RAM are common, and 100's not unknown).

    ...although, of course, one can support more than 4GB of RAM with a 32-bit processor, in the sense of a processor that can't handle more than 32-bit linear virtual addresses, as long as the processor's physical addresses can be more than 32 bits (as is the case with most, if not all, P6-core processors - Pentium Pro, PII, PIII) and as long as the chipset can handle it.

    It may be less convenient, as one might have to have a process manually map stuff into and out of its address space if you want a single process to use more than 4GB of RAM (as opposed to, say, having file systems use it as a buffer cache, although that may also involve switching mappings), but it's certainly still possible.

    (I say "linear virtual addresses" because, whilst the x86 segmented virtual addresses go up to 48 bits, they first get mapped by the segmentation hardware to a 32-bit linear address before being used as physical addresses, if you haven't enabled paging, or before being run through the page table, if you have enabled paging; not only are 48-bit addresses not necessary for accessing more than 4GB of physical memory, they don't even help you to access it.)
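
    The numbers behind that argument, worked out. The 36-bit physical address width is my assumption for the P6-core parts mentioned above; the text itself only says "more than 32 bits".

```python
GIB = 2 ** 30

linear_bits = 32      # width of a linear virtual address on x86
physical_bits = 36    # assumption: physical address width on P6-core parts

per_process_gib = 2 ** linear_bits // GIB    # what one process can map at once
physical_gib = 2 ** physical_bits // GIB     # what the machine could hold

# How many full 4GB "windows" a single process would have to map in and
# out by hand to touch all of physical memory.
windows_needed = physical_gib // per_process_gib

print(per_process_gib, physical_gib, windows_needed)
```

    So the RAM is reachable, but any one process sees it through a 4GB window at a time, which is the inconvenience described above.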

  • A compiler cannot make a thread...

    I was under the impression that "auto-parallelizing" compilers can convert, say, some Fortran or C/C++ code into multi-threaded code.

    See, for example, this Sun white paper [] on their compilers, which, it appears, can auto-parallelize loops to run on multiple processors.

  • And even then, the benefit will only affect software on operating systems that offer both user and kernel level threads like BeOS and Solaris UNIX. NT doesn't really cut the mustard.

    In what fashion? Its threads may not be "both user and kernel level" in the sense that there are user-level threads that can be executed by a pool of kernel-level LWPs, with the possibility that there are more user-level threads than kernel-level LWPs, as is the case in Solaris, but I don't see why that's necessary in order to get a speedup to a threaded program by adding processors - would not the model I think NT uses, wherein every thread known to userland is known to the kernel (I ignore "fibers" here), be sufficient?

  • Lots of companies have tried to protect their sales model by not manufacturing products that would break that model. Inevitably, their competitors manufactured those products and the sales model became broken anyway.

    DEC comes to mind - handicapping their low-end systems so that they would not outperform the high-end ones.


  • Intel hasn't innovated - give me a break. Just the
    silicon process technology they've developed would
    wipe that argument out - the other detail - managing
    to get x86 to go as fast as they have generation after
    generation disproves the statement also.

    Also, EPIC as detailed, isn't really even an HP
    invention, but rather an outgrowth of things done
    by companies in the 80's such as Multiflow and

    The author claims that the compiler is a bitch -well it
    is, but they solved most of the problems with the
    compiler technology at those earlier companies - and
    some of the folks doing the IA64 are graduates of same.

    I WOULD worry about the scalability of the architecture
    though - that WAS a problem with the Trace and
    Cydra architectures. You had to recompile for
    succeeding generations of hardware. I personally
    don't know if EPIC solves that problem with VLIW
    architectures. Anyone know if it does, and how?

  • The interesting point about this is - a large
    number of the folks that DO have such
    experience work for Intel and HP.

    I know - I worked with about half of em
    at Cydrome.. ;-)

  • Excuse my ignorance (I know VLIW but not JIT ;-)

    Does JIT reduce to compiling for the hardware the
    JIT compiler is running on - or a virtual machine?

    If it's a virtual machine, then the Machine simulator
    is all that needs to be run thru the EPIC compiler -
    cause that's all that would execute - not some
    intermediate target language. If that's the case,
    EPIC won't present any real problem.

    Out of ignorance...which is it??

  • There are fundamentally two ways to make a
    SINGLE thread go faster - you can up the
    clock rate - or figure out a way to run more than
    one instruction at a time.

    Multi-issue pipelines, VLIW, and EPIC are attempts
    at solving the problem in the second manner. Once
    you have an adequate solution in the second space
    it becomes possible to improve its performance
    via the first method.

    Thus, from an architect's point of view - the second
    method is the first tried!

    Now - which is better - multi-issue pipelines or
    EPIC at a given clock rate. That remains to be seen.
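
    The two levers above fit the usual back-of-the-envelope performance equation: run time = instruction count / (instructions per cycle x clock rate). A minimal Python sketch, where the instruction count, IPC values, and clock rates are made-up illustrative numbers and not Merced figures:

```python
def run_time_seconds(instruction_count, ipc, clock_hz):
    """Run time = instructions / (instructions per cycle * cycles per second)."""
    return instruction_count / (ipc * clock_hz)

# Hypothetical workload: one billion instructions on a single thread.
INSTRUCTIONS = 1_000_000_000

base = run_time_seconds(INSTRUCTIONS, ipc=1.0, clock_hz=400e6)          # baseline
faster_clock = run_time_seconds(INSTRUCTIONS, ipc=1.0, clock_hz=800e6)  # lever 1: clock rate
wider_issue = run_time_seconds(INSTRUCTIONS, ipc=2.0, clock_hz=400e6)   # lever 2: more per cycle

print(base, faster_clock, wider_issue)
```

    In this idealized model, doubling the clock and doubling the instructions issued per cycle give the same speedup; the open question in the comment is which one real hardware and compilers can actually sustain.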

  • This is fallacious.

    SMP doesn't of itself improve the SINGLE process
    performance. You CAN write special code on a
    Beowulf platform (or SMP) and get the answer faster
    for the single thread via parallelism...but that is
    a problem that isn't well supported by automatic
    tools at this time. We DO have the technology to
    throw more execute units at a single thread and
    get the answer faster though - that is what Merced
    is all about.

    You can just as easily SMP a Merced class CPU and
    run multiple threads thru them as you can with a PPC
    or a Xeon. That isn't the problem that EPIC, VLIW, or
    Multiple-issue pipeline (Superscalar) machines are
    trying to solve.

    Think SINGLE threads when talking about these

  • I seem to remember reading a few months ago that Intel is considering Merced as a test platform for their future IA64 chips and is not really intending to market it very strongly.

    Also, aren't there many people out there who know more about compilers than Carmack?
  • EPIC is not a multithreaded architecture. EPIC (which, as everyone knows, stands for Explicitly Parallel Instruction Computing) focuses on Instruction Level Parallelism (known in hardware and compiler circles as ILP for short).

    Explicit ILP architectures, such as Very Long Instruction Word (VLIW) architectures, Transport Triggered Architectures (TTA) and the like all focus on finding parallelism within a given single-threaded program. The compiler for such an architecture may divide separate paths of execution into a sort-of thread (for instance, it might execute down the "then" and "else" clauses of an "if" before it knows which it needs, or perhaps down multiple "cases" of a "switch"), but this is not multi-threading in the common, macro sense of the term.

    Multithreaded architectures, on the other hand, do focus on running multiple independent threads of execution, typically as if they were multiprocessors. For these CPUs, a given application needs to be constructed as a series of explicit threads (at the process level, not the instruction level), or a compiler needs to simulate this division. Alternately, a number of independent processes need to be available (although since all threads share a common pipeline, running independent processes together can have bad cache effects and cumulative stall effects that generally don't make anyone's day).
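
    To make the instruction-level side of that distinction concrete, here is a toy Python sketch of how a compiler might pack a single-threaded instruction stream into groups of mutually independent instructions. The `(dest, sources)` instruction format and the greedy in-order grouping are assumptions made for illustration; this is not how any real EPIC compiler is known to work.

```python
def group_independent(instructions):
    """Greedily pack in-order instructions into groups with no register conflicts.

    Each instruction is a (dest, sources) pair. A new group starts as soon as
    an instruction conflicts with something already in the current group.
    """
    groups, current = [], []
    written, read = set(), set()
    for dest, sources in instructions:
        conflict = (
            any(src in written for src in sources)  # read-after-write
            or dest in written                      # write-after-write
            or dest in read                         # write-after-read (conservative)
        )
        if conflict and current:
            groups.append(current)
            current, written, read = [], set(), set()
        current.append((dest, sources))
        written.add(dest)
        read.update(sources)
    if current:
        groups.append(current)
    return groups

# Hypothetical four-instruction program.
program = [
    ("r1", ("a", "b")),    # r1 = a + b
    ("r2", ("c", "d")),    # r2 = c + d    (independent of r1)
    ("r3", ("r1", "r2")),  # r3 = r1 * r2  (needs both earlier results)
    ("r4", ("e", "f")),    # r4 = e - f    (independent of r3)
]
groups = group_independent(program)
print([len(g) for g in groups])
```

    Everything here happens inside one thread of control: the groups are parallel at the instruction level, and no operating-system threads are involved.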


  • First, before everyone jumps in and says "Intel will never get there because the compiler will never get there," please don't forget that some shipping devices are already there.

    Quite simply, EPIC allows a compiler to tell the hardware ahead of time where it knows parallelism exists, so that the silicon (which is finite) doesn't have to hunt for it. Compared to the rate at which silicon must make scheduling decisions (at 800MHz, that's 1.25 nanoseconds), compiler time seems infinite.
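
    To put rough numbers on that gap, a quick Python sketch. The one-second compile budget is purely an assumed figure for illustration; only the 800MHz clock comes from the paragraph above.

```python
def cycle_time_ns(clock_hz):
    """Length of one clock cycle in nanoseconds."""
    return 1e9 / clock_hz

def cycles_available(seconds, clock_hz):
    """How many hardware cycles elapse in the given wall-clock time."""
    return int(seconds * clock_hz)

period = cycle_time_ns(800e6)          # time the silicon has per scheduling decision
budget = cycles_available(1.0, 800e6)  # cycles in one (assumed) second of compile time

print(period, budget)
```

    Under that assumption the compiler has hundreds of millions of cycles' worth of time to find the parallelism that the hardware would otherwise have to discover within a single cycle.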

    Granted, compiler time is not infinite, but for performance-critical applications, it is quite large. The Texas Instruments TMS320C6000 family of DSPs, for instance, relies on compilers and assembly optimizers in order to eke out that last bit of performance, and as any DSP engineer will likely tell you, it's usually worth it. Cycles saved in one loop are cycles that can be spent elsewhere on value-added features, leading to a more valuable product.

    This points to the real fundamental problem as I see it, which is that the current VLIW darling in the industry is in the embedded world. Why should that make a difference, you ask? Because the embedded developer is the one most likely to take advantage of the raw capability that an exposed parallelism architecture can provide.

    Merced's biggest problem lying ahead is the fact that workstation-class code does not naturally exhibit large amounts of parallelism. While I was attending MICRO-31, I heard someone remark about how most code looks like a series of 5-10 instruction bursts followed by a jump. ICK!!

    Embedded programmers generally seem willing to learn whatever it takes to get their product running in the fewest MIPS (so that they can either use cheaper parts or provide more features), and so are often willing to jump through a few hoops to help out the compiler in order to get the parallelism they desire.

    Workstation programmers, on the other hand, are interested in the much bigger picture (since their applications are much larger and tend to have larger life expectancies), and so code tends to be human-friendly and not compiler friendly. (Certain heavily-traveled code paths in the Linux kernel being a noteworthy exception.)

    The point is that the Merced compiler will ship with a lot of amazing compiler transformations, but very few of them will be effective at translating the hopping, skipping, and jumping nature of your typical general-purpose database-ish looking code into highly parallel performance-oozing EPIC instructions, at least straight out of the gate.

    Merced will inherently provide big performance wins to the compute-farm customers (your big engineering shops that currently use networks full of Sun or HP workstations to crunch VHDL, Spice, or whatever simulations around the clock), as these applications end up reducing to huge matrix manipulations and numeric crunching galore -- oozing with parallelism. But Merced will be hard pressed to feed up web pages or database queries much faster than any other architecture, unless it's able to massively crank its clock rate due to losing the shackles of the instruction scheduling hardware.

    Anyway, those compiler nuts in the crowd might find the following links useful and informative.

    • The Rocket Project -- ILP research at Michigan Tech University
    • VLIW Architectures -- a description of VLIW that's part of a larger presentation about VLIW compiler techniques.
    • The Trimaran Research Compiler -- HP's research compiler that was supposedly used in development of the architecture that begat Merced.
    • EE Times -- article which describes the release of Trimaran and includes a diagram showing the relationship of architectures from Superscalar to VLIW/EPIC to TTA.

  • Much as I hate to side with "the moneymakers," there is one advantage to the OEM's current business model as described in the article: as long as the OEMs are basing their income on the periodic-upgrade model, they have a direct incentive to provide quality systems and support: the better your experience with owning a Micron, the more likely your next computer will also be a Micron. If this changes, OEMs will be less interested in customer experience of quality, and much more interested in the *perception* of quality, which in turn means (gasp) marketing.

    This is all assuming that this article is accurate in its description of the OEM business, which I am not 100% convinced of. Among other things, in a business with that kind of growth rate, wouldn't new users be at least as important, if not more so, than returning users?
  • OK, about 6 years ago, I took a EE course that taught us CSs about CPU design.

    At the time we were comparing Pentium, PA-RISC, Alpha, and MIPS.

    If I remember, Alpha had a huge number of transistors dedicated to branch prediction. PA-RISC always assumed that the program would loop. As a class, we questioned how much branch prediction actually helped. Does anyone have a good feel, or even some numbers to describe how much branch prediction improves performance?

    This Merced design of executing both branches seems like it would take an enormous amount of work. Is it really worth it? Isn't a simpler design able to operate at a higher clock rate?
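
    For a rough feel of the trade-off asked about above, here is a toy Python comparison of a static "always taken" guess (the "assume the program will loop" policy) against a 2-bit saturating counter, a common textbook dynamic predictor. This is a sketch, not a model of any of the chips named, and both branch traces are invented for illustration.

```python
def always_taken_accuracy(outcomes):
    """Static policy: predict every branch as taken."""
    return sum(outcomes) / len(outcomes)

def two_bit_accuracy(outcomes, state=2):
    """2-bit saturating counter: states 0-1 predict not-taken, 2-3 predict taken."""
    correct = 0
    for taken in outcomes:
        predicted_taken = state >= 2
        correct += (predicted_taken == taken)
        # Move toward 3 on a taken branch, toward 0 on a not-taken branch.
        state = min(state + 1, 3) if taken else max(state - 1, 0)
    return correct / len(outcomes)

# Hypothetical traces: True means the branch was taken.
loop_trace = ([True] * 9 + [False]) * 10  # loop back-edge: taken 9 times, then exits
skip_trace = ([False] * 9 + [True]) * 10  # error check: almost never taken

for name, trace in (("loop", loop_trace), ("skip", skip_trace)):
    print(name, always_taken_accuracy(trace), two_bit_accuracy(trace))
```

    On the loop-style trace both policies are right 90% of the time, so the extra hardware buys nothing; on the rarely-taken trace the static guess is right only 10% of the time while the counter reaches 89%. How much that matters in practice depends on the real mix of branches, which this sketch does not attempt to measure.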

    And, has anyone read about async processors lately? Anything ever released commercially for that?

    [forgive the English, I don't have an English compiler.]
  • I have no doubt that by the time Merced is ready to hit the market, Microsoft will have bloated up and fluffed out Windows to the point where anything less powerful than a Merced will be useless.
  • Why push all the work on the coder? That's why Sequent and SGI use ccNUMA - your apps take advantage of the inherent parallelism and scalability of the system, without having to be explicitly aware of it.

    Actually, Cray invented all of this, but hey, why nitpick :-P.

    SGI's ccNUMA white paper can be found here []

  • "The amount of openly available published research in the RISC compiler community is significant, and Intel has the bucks to hire more gurus on the topic if they need them."

    That would be all well and good, if we were actually discussing a RISC architecture. But we aren't - we're discussing VLIW.

    With the i960 and crew, Intel has all the RISC expertise that they ever wanted (or needed). Finding someone who can write compilers and tools for VLIW is a horse of a different color, however. There isn't much experience in the industry when dealing with VLIW; not only that, coding for RISC isn't going to help you with this type of architecture. Hence, the hair-pulling and delays from the compiler/tools group. This isn't a problem you throw money at to make it go away faster - it's a first run, and everyone on the team is learning as they go.

    If that doesn't convince you, keep in mind that Intel is partnered with a company that has deep experience with RISC architectures (HP). If HP and Intel together are having a rough time of it, I would submit that this can't be an easy design to work with - especially given that no one has done it before.

  • Umm, I distinctly remember reading an article (on /.?) about HP and SGI deciding to come out with one or two more chips in their respective series because of Merced delays. I think, by the time Merced does ship, the chips from HP and SGI/MIPS will be significantly faster and more scalable than Merced. The big Unix vendors just can't wait for Merced to ship faster boxes to their clients.
  • Before engaging in a lot of speculation, let me say that all that really matters is when Intel delivers, how well the chip is supported by then, and what performance it gives on the software people care about. At least in the UNIX universe, processors are so interchangeable that buying an Alpha now and switching to Merced if that looks like a better choice a year later shouldn't be a big deal.

    Maybe my view of Merced is colored by the fact that I have used VLIW machines in the past. My experience has been: a lot of code will not run at even close to the theoretical capabilities of the machine (because the compiler couldn't figure out how to squeeze the logic into the parallel instruction set) and there were few compilers and little software available for them.

    So far, I see little reason why Merced should be any different. Despite many years of research, compilers that are actually in use still haven't gotten very smart in understanding aspects of programs that need to be understood for parallelization and optimization. And Intel may try to help with C, C++, and Fortran backends, but what about all the other languages that are coming into use? We need chips that encourage the use of post-1970's languages, not chips that write them into stone.

    Merced will probably perform well on some very structured problems (geometric transformations, optimizations, other numerical problems, text search, etc.). But for those, adding vector processing units to a more traditional processor might be cheaper and result in better overall performance than Merced's architecture.

    There also seem to be questions about the way the VLIW architecture is implemented by Merced; supposedly, code compiled for one generation of the chip will not take advantage of more parallelism available in a later generation.

    I think there is a good chance that the Alpha will save Intel. People already know how to write compilers for the Alpha, and the chip is fast. According to an article in Byte (but, hey, where are they now :-), Alpha will have twice the performance of Merced at the time Merced finally gets released.

    On the one hand, I'm glad that some company is finally breaking with the dull tradition of processor design over the last 20 years. On the other hand, I'm not sure that this is the right way to do it.

    Actually, there is another rather radical change in processor design that has happened recently: the complete system on a chip (from IBM and maybe others). Those might allow very dense multiprocessor systems, leading possibly to very different designs.

  • His essay sort of comes down to this...

    "Well, I've got a bicycle, but if I just add one more wheel... I can go 50% faster! Boy, this is really gonna hurt car manufacturers."
    He's right about the compiler being hard, but I'm sure Intel realized this when they decided to go the VLIW route.
  • The idea with Merced and parallelism is this: the compiler does it all. No explicit coding should have to be done. If you write something in C++, the compiler will parallelize it to execute as much as possible in parallel within the CPU. Plus, the Merced architecture is scalable: more execution units can be added to future generations of chips for even more parallelism within the chip. Contrary to what the author of the article stated, this has nothing to do with multiprocessor systems, and the post above is correct: for that, apps need to be coded with threads. That's not to say that there won't be multiple-CPU systems: SGI and HP will definitely be making massively parallel supercomputers using Merced, with 256+ processors.
  • Not in this case. These pricing models will be on a much larger scale. Try $10,000 for the first 10k chips, etc. This one won't be quick to reach the home user (Intel will still be releasing some next-gen 32-bit chips (Foster, I think?)). And by the time it is ready for the home market, well, I have serious doubts that anyone will need it. Think about it: right now, with what I actually use my computer for, all I need is a P200 with enough RAM so I can run Netscape, a word processor, and other common apps (so why is it that I have a dual Celeron 450?). While there will undoubtedly be new apps that will start pushing CPU utilization, I think the trend will continue that the bottom line of applications that people actually use can be handled by a relatively slow processor, and only intense media functions will consume more. This indicates that people will not be buying full computers as we know them by the time Merced is out (maybe I'm pushing the speed at which this will happen a little bit), and will simply be buying "appliances" that will handle certain tasks (i.e. a WinCE-style machine that just does certain tasks and is networkable). I'm thinking the home user won't have the "need" for a Merced chip.
  • by zealot ( 14660 ) <xzealot54x@yahoo . c om> on Saturday April 03, 1999 @02:18PM (#1950219)
    Too many web sites (especially gamer sites, for some reason) don't seem to understand that Merced isn't for the average user. When it comes out, and at the very least for a few years following, it will be an ENTERPRISE level chip. This means 1) expensive as hell 2) used in supercomputers (a la SGI and HP) and 3) high-end workstations/servers. The author of this article is right... it doesn't fit into Intel's business strategy for the consumer, but it isn't supposed to. Besides, I'm starting to get the feeling that by the time Merced is consumer viable, people will be using pure computers less, and computing appliances more.

  • I stand corrected... forgive my mistake. Back
    in '96 (don't laugh) I kept very good tabs on
    what was going on with Intel and its competitors
    regarding chip technology, with help from friends
    placed well at Intel (who would surely like to
    remain anonymous). At the time Merced was
    described as "essentially RISC" when compared
    with the CISC systems then being put out (and
    still being put out) by Intel. Over the past
    years I kept less abreast of the impending
    technologies (having moved my focus to more
    software development, and much of that *not* on
    Intel systems), but at least kept aware of
    scheduled *releases* and some of the current Intel
    technology. I clearly missed the IA64 move
    (talk about head in the sand) on which I have
    just started to catch up, and hence the "RISC"
    discussion above.

    The basis of my argument still stands, but the
    compilers will be harder to write, and I see now
    why there are some delays. Micro$oft does claim
    to have a 64-bit windows running on a Merced
    simulator (like that isn't a bald-faced lie,
    judging by other orthogonal press releases coming
    out of Redmond). I still firmly contend that
    the current marketing infrastructure for Intel's
    products will change if it cannot handle the
    responsibilities of making money in Intel's Brave
    New World. etc., etc., etc.

    Thanks for the heads-up.
  • by Roundeye ( 16278 ) on Saturday April 03, 1999 @03:45PM (#1950221) Homepage
    I'm not sure why this review was written.

    Intel has been plagued for a decade by backwards
    compatibility with a poorly designed CISC chip
    with one of the poorest memory subsystem designs
    still in current use. The amount of juice which
    can be squeezed from the '86 lemon is limited and
    it is a testament to Intel's determination (some
    would say stubbornness or stupidity) that they
    have been able to make this architecture a
    profitable industry standard (of course the more
    cynical (myself included on the occasional lonely
    night) might chalk this up as a testament to the
    power of a tightly run monopoly).

    Merced is a necessity if Intel wants to stay
    profitable in the face of not only Moore's Law
    but AMD and other not-so-dark horses. This chip
    has been designed for the most part for years.
    The compilers have been under development for
    years as well -- anyone who thinks otherwise
    doesn't know how Intel does business.

    A company which has the resources to write
    compilers for superscalar CISC with pipelining,
    data forwarding, bizarre MMX
    registers/instructions, virtual '86s while
    maintaining backwards compatibility with the
    original broken design will find writing a new
    compiler for a freshly designed clean RISC
    system a wonderful relief. The amount of
    openly available published research in the RISC
    compiler community is significant, and Intel has
    the bucks to hire more gurus on the topic if they
    need them.

    Marketing... It pains me to see so many people
    assume that "the way it is" is "the only way
    it can work". This is the same fallacious
    thinking that makes it painful to watch any
    Hollywood movie about time travel or the contact
    of our civilization with another (I think
    Independence Day may be the flagship example of
    this) -- the way we Americans do things in this
    day and age is superior to the way any other
    conceivable society could do them. Cultural
    ignorance and arrogance.

    This sort of thinking comes up quite often in
    discussions of why "Windows will be here forever"
    and now appears here in a discussion of Intel's
    marketing plan for Merced. The truth of the
    matter is that (1) Intel wants the market to
    change -- they have been burdened with the '86
    albatross for far too long, and (2) the market
    will change. Initially we hardware power users,
    systems hackers, and speed/systems freaks will
    jump on Merced because it is a better chip than
    a crappy CISC chip on steroids. The chipsets
    to run the chips will be there, and at least
    some variation in motherboard configurations.
    Dell/Compaq/Gateway will be able to sell a
    Merced system.

    If, as Intel puts more of its weight behind Merced
    (and more applications are brought to Merced) the
    current distribution system cannot change their
    marketing model to take advantage of the new
    configurations which will be possible and then
    *desired*, then someone will step up to make the
    new money by providing them. Because it's done
    a certain way now doesn't mean that that is the
    only way (I reiterate at the risk of sounding
    pedantic). This industry moves too fast to coddle
    companies which have become too large to steer

    The distribution channels for these systems, and
    multi-processor systems, will develop and may
    not include the current Big Players in the market.
    In addition, as Intel hopes, if AMD et al cannot
    create a chip to compete with Merced, and cannot
    anchor the market on the '86-type chips, they
    may also find themselves too big to steer out
    of the way of the Intel truck.

    Be careful. Merced could be a swan song for
    Intel, but I think it is more likely their

  • 1. What does Carmack have to do with writing a Merced compiler? He is an excellent 3-D game programmer, but most of the hand-coded parallel-execution tweaks in Quake were written by x86 assembly guru Michael Abrash, not Carmack.

    2. As other people have pointed out, he doesn't know what he is talking about.

    3. If I were looking for analysis on the Merced delays, I would dig around on for an excellent article by one of their staff on the subject rather than listen to this bozo. Synopsis of the MDR online article: Intel is leading the design of the Merced and they are using their standard massively parallel design approach (lots of engineers). The problem is that while this approach works fine for successive iterations of an existing, well-understood ISA and implementation, it is not working well for a brand new, cutting-edge ISA and chip. His prediction is that the Merced will have a very short life before it is superseded by the second-generation chip in the family, one being designed methodically and inexorably by a small HP-led design team.
  • by Tardigrade ( 17769 ) on Saturday April 03, 1999 @01:50PM (#1950223)
    As long as Intel and the OEM's keep selling single-cpu boards, selling extra cpu's instead of entire systems shouldn't be too much of a problem. Most end-users don't like swapping mobo's and cpu's.

    Most users don't need the horsepower of their current K6-350, or PII-300, I'm still using a P-133. How much of their sales are going to be towards companies that can afford a hardware guy, or hard core gamers who have the skill and motivation to do this though? That might be a problem.

    That would also be enough motivation to keep on churning out a more advanced, speed and instruction-wise, processor though. Intel has been pretty good at making a new CPU every 3 years or so, the average time between upgrades. They've been even faster with the improved chipsets. The rest of the computer has gotten better as well.

    As long as it's more expensive to build/upgrade to a state of the art system, I don't think the OEM's will have too many problems. Everyone, especially the hardcore gamers, know that the cpu isn't everything. The compiler is another story, but that'll happen too.

  • SCO's UnixWare 7 has been running on Merced emulators for quite a few months now (and I hear that recently a version of NT4 is too). These emulators run on IA32 NT and UnixWare boxes - so Sun must be using NT or SCO's stuff to run under... ironic? :)
  • I believe Sun got solaris running
  • IMHO, the trouble is with Intel. They make some decent chips, but it has been a long, long time since they have done any huge innovation. Consider: with the virtual monopoly they held on PC chips for ages and ages they should have been able to pour money into R&D and create some pretty new and exciting things. True, they were held back by backward compatibility, but this should have been more than balanced out by $$$. Even the latest PIIIs on the enterprise level are not better than HP chips, Sun, SGI MIPS, PPCs, or what have you.

    Also consider that the IA-64 EPIC architecture was originally an HP invention. As I understand it HP designed it but realized that they didn't have the money or volume to produce it well, so they went to Intel. Intel realized that they were in a position of power (they could live without HP, but HP would be in trouble without Intel's fabs) so they grabbed the architecture and made it their own. For Merced, Intel is throwing a huge team of designers against it, but is still doing a poor job because the corporate architecture is too rigid. (Intel has been known to be a nasty employer, with a slew of age and sex discrimination suits behind it.) HP meanwhile is working on the next-generation IA-64 chip (McKinley, I believe) which is coming along quite nicely. The last estimate I heard was that McKinley will really show the power of EPIC chips, whereas Merced will be about comparable to whatever Pentium successor they have out at the time.

    anyhow, this is just my take on it all.
  • Intel is leading the design of the Merced

    and they are using their standard massively parallel design approach (lots of engineers)

    It will be interesting to see if Intel panics and tries to throw more people at the project if it falls behind. IMHO, that would cause more harm than good.

    As I understand it, the HP McKinley team is much smaller, which is a more intelligent way to attack a problem like this (a drastically new architecture).
  • I read the projected performance of the DEC Alpha 21364 from the Digital people (so it's probably marginally biased) and the Alpha blew the Merced clear out of the water. Digital have been working on the Alpha and its revisions for how many years now, 5, 6, 7? So one would expect it to be pretty refined. As well, I think it will support up to 64-way SMP for serious machines.
    It would be nice to know how many SMP CPUs you can run with Merced at once. If you're building a serious server, multithreading and SMP will probably save the day over multiple instruction pipelines any day.
    It is interesting, though, that the EPIC spec can support a virtually limitless number of instruction pipelines simultaneously (at least according to the Ars Technica review), so if this compiler-based strategy works, and chip densities increase (as they will), we could see some very, very wide CPUs.
  • And the worst part is, we need gcc to be able to support it thoroughly to be able to run Linux and take full advantage of its capabilities.
    I think I'll buy an Alpha. It will probably cost less anyway. It's supported by Linux (or vice versa) and we know the damn thing works.
  • Remember way back when the first pentium came out? It was pretty slow, but the word was it would speed up heaps when applications were re-built for the pentium and optimised with re-order of instructions and stuff.

    Of course no-one ever did this. In the early days it was probably mostly because everyone needed it to run on the old processors too. These days it is partly because nobody probably remembers this anymore, or not many people have bothered to do _that_ good a job of optimising for exactly how the pentium works anyway.

    This time around, the same thing will happen, EXCEPT that probably Intel will come out with a proper VLIW gcc based compiler for Linux/UNIX. They have to do this for the chip to survive.

    Now sure, MS and Borland or whoever will make their own compilers but nobody will use the VLIW stuff for years because of backward compatibility. They will stick with whatever x86 compatibility box Merced has.

    But any Linux vendor can just do a re-build of everything (use the source Luke). This might be a big win for Linux to take a speed leap over Windows.
  • I believe HP is "putting a bob both ways". They are still actively developing the PA series in case Merced doesn't pan out.
  • Yes, even Carmack doesn't think Carmack is a compiler god. You don't have to go much further than this idiotic suggestion to discount this article. Whatever the state of the Intel Merced compiler you can count on the fact that there are some incredible minds working away on it, people at the top of their fields like John C. is in his.
  • Why is everyone talking about Merced? That chip doesn't exist, we know nothing about it. Why should we care about Merced when we have nice Alpha's, MIPS', Sparc's, PowerPC's... that already do the job? We don't need another RISC chip just because Intel made it.
    On the other hand 7 execution units isn't a good idea on a plain RISC because you can't avoid instruction dependencies regardless of how good the compiler is. But they are useful on vectorial chip. I guess they are making a vectorial CPU that can be scaled up adding more chips to the system.
  • While the notion that EPIC allows one to
    throw more cpus at a problem is silly, this
    does bring up the related idea of chip level
    multiprocessing. That is, if you *do* have
    a program that can be run well on an SMP
    machine, then you can use a computer that
    has two or more conventional cores on a single
    chip, sharing an L1 cache. This may be a
    better way to use a transistor budget than
    fancy VLIW schemes. The shared cache would
    make interprocessor communication very fast.

    Linux programmers should try to make their
    programs SMP-friendly.
  • VLIW is crap, because it ties the instruction
    set to the processor generation. Intel would
    not be successful if Merced v2 could not run
    Merced v1 software.

    EPIC fixes the problems with VLIW.
  • With EPIC, it is OK to add more functional units.
    The compiler links together variable-length
    groups of instructions that can execute together.


    Your compiler finds a group of 17 instructions
    that can execute together. (this is reasonable,
    since there are 128 registers) Your CPU can only
    execute 6 at once. You upgrade the CPU to one
    that can execute 10 at once. No problem!

    You only have trouble if your new CPU has more
    functional units than your old compiler was
    able to feed. This is simply an issue of a poor
    compiler, not one producing code that is better
    for the old CPU.
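
    The arithmetic behind that example can be sketched in a few lines of Python. It assumes an idealized machine in which every instruction in a group takes one cycle and any functional unit can run any instruction; the function name and the issue widths other than 6 and 10 are made up for illustration.

```python
import math

def cycles_for_group(group_size, issue_width):
    """Cycles needed to issue one group of independent instructions."""
    return math.ceil(group_size / issue_width)

# The case from the comment: a group of 17 independent instructions.
old_cpu = cycles_for_group(17, issue_width=6)    # today's CPU
new_cpu = cycles_for_group(17, issue_width=10)   # the upgrade, same binary

# The problem case: the compiler only ever found small groups.
small_old = cycles_for_group(4, issue_width=6)
small_new = cycles_for_group(4, issue_width=10)

print(old_cpu, new_cpu, small_old, small_new)
```

    The same 17-instruction group drops from 3 cycles to 2 on the wider CPU without a recompile, while the 4-instruction group takes 1 cycle on both, which is the "poor compiler" case: the extra functional units simply sit idle.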

  • Ok. I think that the guy who wrote that is missing a few key ideas. Unfortunately, EPIC does NOT magically fix Inter Process Communication, nor is it a magic bullet. It DOES have a very heavy reliance on the compiler, but that is because the compiler will try to provide parallelism for the code. For a good article, take a look at this article from Byte Dec. 97:

  • Yes, but HOW much parallelism can be extracted from most programs? As I understand it, the RISC architecture provides more optimization at runtime. I am interested to see Merced perform, but remember that the DEC guys have really put a lot of work into the compiler for the AXP and it is quite mature. I think it will take Merced a couple of years for the compiler to mature.

