AMD

AMD vs Intel: CPU Design Philosophy 147

Johan writes "We have published an in depth comparison between the CPU design decisions that AMD's engineers (Athlon and upcoming Mustang) made and those of Intel's engineers (Pentium 4). Some of the questions answered: Are double pumped, hyperpipelined, low latency designs the only future for x86? Will future designs from AMD and other competitors be similar to Intel's innovative seventh generation core? "
This discussion has been archived. No new comments can be posted.


Comments Filter:
  • Right. Now if only they lower the price 80% and magically make all the x86 applications disappear, then I might begin to believe you.
  • by Xevion ( 157875 ) on Monday October 23, 2000 @10:32AM (#682477)
    Intel is going the clock speed route for a reason, and it is pretty clear to me. While a 1.4GHz PIII or Thunderbird may be a good deal faster than a 1.4GHz P4, the P4 will be available at significantly higher clock speeds (2GHz planned in Q4 2001, I believe) because of its hyper-pipelined design. Also, Intel will be able to (hopefully, from Intel's business perspective) charge obscene amounts of money for these CPUs because their clock speeds are so high.

    Intel, however, is truly making an innovative processor design with the P4. The speed of electrical signals across the die is possibly becoming a bottleneck (hence the two "drive" stages in the pipeline, which Ace's pointed out may exist simply to give the signal time to reach the other side of the chip). The only problem with this is that AMD has caught up extremely quickly in the past year, and while imperfect, the Athlon design scales extremely well in clock speed. With the 1.2GHz Thunderbird here, on a .18 micron process no less, AMD will stay in the high-end market as long as it can keep up with process technology.
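    The clock-versus-IPC trade-off described above can be sketched with a toy throughput model. All the figures below are illustrative assumptions, not measured numbers:

```python
# Toy model: delivered performance = clock rate x instructions per clock.
# The IPC and clock figures here are hypothetical, chosen only to
# illustrate how a deeper pipeline can trade IPC for clock headroom.

def relative_performance(clock_ghz: float, ipc: float) -> float:
    """Billions of instructions retired per second."""
    return clock_ghz * ipc

# A shorter-pipeline core with higher IPC vs. a hyper-pipelined core
# that gives up some IPC in exchange for frequency.
athlon_like = relative_performance(clock_ghz=1.2, ipc=1.0)
p4_like = relative_performance(clock_ghz=1.5, ipc=0.8)

print(f"short pipeline @ 1.2 GHz: {athlon_like:.2f}")
print(f"deep pipeline  @ 1.5 GHz: {p4_like:.2f}")
```

    Under these assumed figures the two designs come out roughly even, which is the "almost no perceptible performance difference" scenario described below.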

    The P4 uses about twice as many transistors as a Thunderbird or Coppermine in order to achieve its massive hyper-pipelined design. AMD, on the other hand, will integrate two CPU cores onto one die with a shared L2 cache in Sledgehammer. I would imagine that the design for Sledgehammer is similar to the Athlon, but with 64-bit extensions. Why not use the technology they have and refine it instead of reinventing the wheel?

    IBM, with Blue Gene, is taking this parallelism to an extreme (the quote at the bottom of my post is directly off of IBM's site), and AMD is taking a similar route on a much smaller scale. Now think of the potential performance difference between a 1.5GHz P4 and a 1.2GHz Thunderbird, given that the P4 is slower at the same clock speed: almost no perceptible difference, in all likelihood. Now imagine a dual 1.2GHz Thunderbird and how that would perform in comparison. Yes, all of these systems are extremely fast, but the "dual" system would stand out as the fastest. Take into consideration that the die size for Sledgehammer won't be much more than what it is for the P4. So you will be able to get a dual-core CPU for around the same price as a single-core CPU that gets lower IPC and runs at a higher clock speed.

    As you can see, there will be no comparison for Sledgehammer on the desktop as long as there is enough memory bandwidth to satisfy its needs.
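    The dual-core argument can be put in the same hedged terms. The scaling factor below is an assumption standing in for shared-L2 and memory-bus contention; real multi-core speedup depends heavily on the workload:

```python
# Sketch of the single-fast-core vs. two-slower-cores comparison above.
# All inputs are hypothetical; real scaling is workload-dependent.

def throughput(clock_ghz: float, ipc: float, cores: int = 1,
               scaling: float = 1.0) -> float:
    # scaling < 1.0 models imperfect speedup from the second core
    # (shared L2 cache, memory-bus contention, serial code sections)
    return clock_ghz * ipc * (1 + (cores - 1) * scaling)

single_deep = throughput(1.5, 0.8)                       # one deep-pipeline core
dual_tbird = throughput(1.2, 1.0, cores=2, scaling=0.8)  # two cores, 80% scaling

print(f"single deep-pipeline core: {single_deep:.2f}")
print(f"dual lower-clock cores:    {dual_tbird:.2f}")
```

    Even with imperfect scaling assumed, the two-core configuration pulls well ahead on parallel work, which is the point being made about Sledgehammer.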

    • Blue Gene will consist of more than one million processors, each capable of one billion operations per second (1 gigaflop). Thirty-two of these ultra-fast processors will be placed on a single chip (32 gigaflops). A compact two-foot by two-foot board containing 64 of these chips will be capable of 2 teraflops, making it as powerful as the 8000-square foot ASCI computers.

      Eight of these boards will be placed in 6-foot-high racks (16 teraflops), and the final machine (less than 2000 sq. ft.) will consist of 64 racks linked together to achieve the one petaflop performance..."
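    As a quick check, the Blue Gene figures quoted above multiply out consistently:

```python
# Arithmetic check of the quoted Blue Gene numbers: 1 GFLOP per CPU,
# 32 CPUs per chip, 64 chips per board, 8 boards per rack, 64 racks.

gflops_per_cpu = 1
cpus_per_chip = 32
chips_per_board = 64
boards_per_rack = 8
racks = 64

chip_gflops = gflops_per_cpu * cpus_per_chip      # 32 GF per chip
board_gflops = chip_gflops * chips_per_board      # 2048 GF, i.e. ~2 TF per board
rack_gflops = board_gflops * boards_per_rack      # 16384 GF, i.e. ~16 TF per rack
total_gflops = rack_gflops * racks                # 1048576 GF, just over 1 PF

total_cpus = cpus_per_chip * chips_per_board * boards_per_rack * racks
print(total_gflops, total_cpus)  # 1048576 1048576
```

    Both totals come to 1,048,576, matching the quote's "more than one million processors" and "one petaflop" claims.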

  • Ah yes, my poor virtual hosting company... and their celeron 300 just isn't cutting it right now...

    Actually, I've done some changes to the code - so it is a little quicker now.
  • Ugh. Ignorant crap getting a +4 insightful. Well, let's get this over with...

    Rather, it is a piece of self-promotion by Ace's Hardware, who sent this story in themselves.

    Many websites send notices of their original content to each other, especially when they know that it is excellent content, like this article. ArsTechnica sends notices both to Ace's and to /. Here [slashdot.org] is an example of exactly the same brand of "self-promotion" from Hannibal, and as regarding a (IMO) far less worthy though still interesting article.

    The article itself doesn't say anything the knowledgeable don't already know.

    This is false. I am a hell of a lot more knowledgeable in matters of MPU architecture than you, and I learned quite a bit. But I suppose you were already an expert on the intricacies of load-store reordering on the P6 vs. the K7, on the precise weaknesses of the K7's branch prediction algorithm (i.e. that it throws an exception and flushes its BTB when presented with more than two branches in a 16-byte aligned code window), on the dependency scheduling problems of very large instruction reorder buffers and what they imply about the P4's clock-speed ramp. I suppose you'd already seen benchmarks which measured the effects of L2 latency and branch prediction on IPC. (You wouldn't mind posting a link, would you, troll?)

    In fact, it reads like a high-school report, and not even a very well-written one. E.g., "First we will try to analyze the most important shortcomings, next we will search for possible solutions." Sounds just like the simplistic expositions of a high school term paper.

    Way to go, asshole. The author's name is Johan De Gelas. He lives in the Netherlands. ENGLISH IS NOT HIS NATIVE LANGUAGE. I'd like to see you post a single sentence in Dutch, much less an incredibly insightful article on competing philosophies in next-generation 1.5 GHz+ MPU design.

    Look, I know that there is a lot of mumbo-jumbo laden "technical" architecture discussion going around the web, often quite nonsensical and written by good-old fashioned Americans who just haven't had the benefit of 8th grade grammar (or a solid education in MPU design). The point is, you were horribly wrong to lump this article in with that schlock, and you apparently did so only because it contained terms and explanations which you didn't understand. Furthermore, you made your point, with quite authoritative tone, in a public forum. Of course you have every right to be loud and wrong in /. Indeed, I've been known to be loud and wrong in /. several times before. Still, if you don't know what you're talking about, please please please don't talk.

    I repeat: the article is not a technical piece at all. Hannibal at ArsTechnica writes technical pieces about CPU design. This article at Ace's Hardware says nothing insightful.

    Completely backwards. Now, let me first say that I not only respect Hannibal tremendously, but that his articles (particularly the excellent RISC vs. CISC in the Post-RISC era [arstechnica.com]) were what inspired me, a bit over a year ago, to begin to learn much more about MPU architecture and design. They are written very vividly, with strong prose and excellent, clear analogies. They do a fabulous job of explaining complicated concepts and new trends in MPU design to a lay reader.

    ArsTechnica, like /., is a general-purpose tech site. Ace's Hardware is all about hardware, mainly MPU design and architecture. Indeed, it is perhaps the most respected daily-updated MPU architecture site on the web. Several experts--many very well informed amateurs, many who work in the industry--post in their technical forum [aceshardware.com]. We're talking people like Aaron Spink, MPU designer for Compaq, who works on what is generally acknowledged to be the best MPU design team on the planet (the Alpha). We're also talking people like Paul DeMone, designer for MOSAID, who in his free time writes IMNSHO the best technical series of design articles available for free [realworldtech.com], including this excellent article [realworldtech.com] which destroyed one of Hannibal's fundamental premises in that Post-RISC article I loved so much. And indeed, Hannibal immediately posted a link to the article and said as much. That's because, as great a service as he provides--and I really, really love Hannibal's articles and they're the first thing I recommend to anyone interested in learning about MPU design--they are *not* technical, they often miss important points which an experienced professional would not (as in this case), and Hannibal is just a student with the benefit of a few architecture classes and a well-worn copy of Hennessy and Patterson.

    So by all means, people--if you're reading this and want to learn about the fascinating world of MPU design, start with Hannibal [arstechnica.com]. But just know that his articles, while very good, are *not* technical; when you want technical, a great place to start is Ace's [aceshardware.com].

    Now that we're through with that bit of unpleasantness, let's clean up your misstatements, shall we?

    In fact, it misses the point. It dares to call the P4 "innovative" and wonder whether future designs in the x86 world will copy it. Well, of course not! How many times must it be said that the P4 barely keeps up with the Athlon and performs less well than a P!!!? That is a fact. Numerous production samples have leaked, with the test results uniformly and without exception pointing to the fact that even if the platform's performance is improved by release time--which it should be, since these are samples, not a retail product--it won't outperform a P!!! at equal clockspeed. That's why the P4 is being released at 1.4 and 1.5GHz initially: if it were released at 1.2GHz it would be outperformed by the 1GHz P!!!, and that wouldn't be good.

    Oh really. Just like preproduction benchmarks of the K7 [gamers.com] proved it to be "closer to that of a Celeron 366 than any Pentium III." Just like preproduction benchmarks of the PII [tomshardware.com] led to the following insightful comments from Tom's Hardware (a leader in the "P4 is overhyped, clock-speed isn't everything, blah blah blah" ignorance these days...):

    Well, the beef with the Pentium II is that it seems to suffer from BSE (bovine spongiform encephalopathy, a.k.a. Mad Cow Disease), although I doubt that any British cattle were involved. Although BSE infected products shouldn't be imported, I'm pretty sure we'll also see the Pentium II here in Europe soon after the 3rd of May when it is finally released. However, since I wouldn't eat BSE infected beef, I wouldn't be interested in risking an infection of my computer with this CPU either.


    ...For former Pentium users there's hardly any attractiveness in the Pentium II either. The Windows 95 performance is hardly any better and in some cases even worse than the cheaper Pentium Pro or Pentium MMX. Windows NT users would be the last ones to be interested in the Pentium II, there is just no reason at all to swap the Pentium Pro for a Pentium II.


    Guess what: preproduction benchmarks are always wrong. Again, preproduction benchmarks are always wrong. And in particular, the benchmarks we've seen on those preproduction P4s are--just like the benchmarks included in the articles above (i.e. the K7 scoring only 60% of a clock-normalized PIII on FPUMark; the PII doing worse on 32-bit code than a P5-MMX)--utter nonsense given what [realworldtech.com] we know [realworldtech.com] about the P4's [realworldtech.com] design [aceshardware.com]. Thus the logical conclusion is that, just like the preproduction MPUs "benchmarked" above (and let me remind you that those were at least close enough to final silicon to be clocked at release-ready clock speeds), the P4s we have seen "benchmarked" on the web so far have been sandbagged.

    Now, the common reaction to these charges goes something like this: "Sandbagged? Impossible! After all, these P4's are at most one stepping from final silicon, maybe even final silicon! Thus they can't be sandbagged!" Which is utterly false. Obviously the sandbagging isn't done in the chip design--that would be idiotic. Rather, it is done in microcode. Every feature of the chip can be turned on and off, tuned and detuned, in microcode. Thus it is trivial to ship a preproduction MPU off for validation with, for example, part of the L2 cache disabled, or the BTB or instruction reorder buffers set to flush when they don't need to, or the way prediction on the two-cycle L1 cache turned off, or tuned wrong, or with certain x86 instructions mapped to unnecessarily slow circuit paths, or any of dozens and dozens of different things set wrong. Indeed, this is the common state of internal preproduction MPUs, because the only way to test corner cases and pathological cases is by disabling one part of the chip and thus placing unrealistic stress on another. In other words, preproduction chips are sort of like beta software--full of DEBUG code which slows everything down, but isn't worth taking out until you're sure everything works.

    "But," you may say, "why would Intel sandbag their preproduction P4's when they know benchmarks will leak out?? Why not build up the hype and all that??" The answer, again, is simple. If you take a look at Intel's history of dealing with prerelease cores, you find that they only hype the projects which are likely to underperform horribly--the i860, the iAPX432, Itanium--and they significantly underplay the ones which are going to kick major booty--eg. the P6 core and now the P4. "But why???" Easy. If Intel has a project which sucks, the best they can hope for is to scare off their potential competitors from the market space until they can get another crack at it. (Remember, there's a 3-or-more year lag-time between the decision to start--or not start--a project and the finished product.) That's exactly what they've done with Itanium, scaring MIPS out of the high-end RISC business, and putting Compaq and HP years behind on their high-end RISC designs, with nothing but a bunch of IA-64 FUD. Meanwhile, if their upcoming core is going to perform incredibly, why waste time hyping and giving your competitors the tip-off?? All that would do is cannibalize the sales of your current MPUs as people wait to get the amazing new chip due out in 6 months. Worse, if Intel hyped the great performance of the upcoming P4, they would need to admit that the average PC user can actually use 1 GHz+ performance...which, of course, would play right into the hands of AMD which is the only player with decent 1GHz+ volume until well into next year. This way, you get to surprise the industry, get great press, and sell off way more of your old, now obsolete chips. Simple, really.

    Now, the P4 barely keeps up with the current-generation Athlon Thunderbirds. This is important to note because people always *blamed* AMD for a processor which still, with the advantages of the P!!! SIMD instruction optimizations used in much software, didn't quite keep pace with Intel's offering in the most common benchmarks. Now, the technically knowledgeable know that the Athlon whomps the P!!! in anything that isn't SIMDified, and that its floating point unit is head-and-shoulders above. But people still moaned about the performance gap in certain common SIMDified benchmarks.

    Wrong, wrong, wrong. The only cases in which the Athlon clearly bests a Coppermine P3 is in scientific (i.e. double-precision) FPU-heavy simulations, ray tracing, etc. On almost every other benchmark, they are within +/-5% at identical clock speeds, with a few standouts at around +/-8% for each architecture. In particular, 3D games tend to show an affinity for the Coppermine. Blaming this on some "SIMD bogeyman" is ridiculous--every 3D game, and especially a standout game like Quake 3, is optimized for 3DNow just as it is for SSE. Now, you can either deny the facts, or you can try to understand them.

    The main culprit, of course, is the difference in L2 latencies. The TBird has a 64-bit bus to L2 at a latency of 11 clock cycles, with 384KB total cache; the Coppermine has a 256-bit bus to L2 at a latency of 7 clock cycles, with 256KB total cache. The TBird has the bigger cache because its cache design is exclusive; however, it also has much longer latencies, for this and other reasons. In the end, there is no comparison as to which is the better design--the Coppermine's cache hierarchy is simply better than the TBird's, no argument about it. And Johan's benchmarks illustrate this rather nicely.
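    One way to see why L2 latency matters is a back-of-the-envelope average memory access time (AMAT) calculation. Only the 11- and 7-cycle L2 latencies come from the post; the hit rates, L1 latency, and memory penalty below are purely assumed for illustration:

```python
# AMAT sketch for the TBird-vs-Coppermine L2 comparison above.
# Only the L2 latencies (11 vs. 7 cycles) come from the post; every
# other number is an assumption for illustration.

def amat(l1_hit: float, l1_lat: int, l2_hit: float, l2_lat: int,
         mem_lat: int) -> float:
    """Expected cycles per memory access for a two-level cache."""
    return (l1_hit * l1_lat
            + (1 - l1_hit) * (l2_hit * l2_lat + (1 - l2_hit) * mem_lat))

# Assumed: 95% L1 hit rate, 90% L2 hit rate on L1 misses, 3-cycle L1,
# and roughly a 100-cycle trip to main memory.
tbird = amat(0.95, 3, 0.90, 11, 100)
coppermine = amat(0.95, 3, 0.90, 7, 100)

print(f"TBird-like AMAT:      {tbird:.2f} cycles")
print(f"Coppermine-like AMAT: {coppermine:.2f} cycles")
```

    Under these assumptions the lower-latency L2 shaves a fraction of a cycle off every average access, which compounds over billions of accesses; the TBird's larger exclusive cache would show up in this model as a higher L2 hit rate, which is where the real-world trade-off lives.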

    Well, here's what they didn't realize: the Athlon is a truly seventh-generation core--which beat Intel to the punch by, what, almost a year and a half? As such, it has made trade-offs to be able to scale to higher clockspeeds better--one reason why Intel had to recall, and still hasn't re-issued, the 1.13GHz P!!! yet AMD are easily churning out 1.2GHz Athlon Thunderbirds.

    "The Athlon is truly a seventh-generation core." What does that mean??? If you think it means the K7 core has one single architectural innovation which does not exist on an MPU available before it, then I challenge you to list it now. (Indeed, I can't think of a single innovation in the K7 which isn't in the P6 core--except for the exclusive cache architecture, which is an overall weakness compared to the Coppermine cache--but there may be some.) If you think it means the K7 is a better core than the P6, well, you're right. The K7 is indeed a better core, in that its pipeline stages are more evenly balanced, and thus it can scale to higher clockspeeds on similar process. On the other hand, the K7 is less well balanced from an execution resources standpoint, including such oafish features as a fully 3-wide FPU (as opposed to the P6's 1.5-wide FPU), which offers at best 40% better performance, but generally no better performance than the P6 on FP intensive apps. Yes, the reason for the discrepancy is partly due to code which is compiled with the P6's execution resources in mind--but of course, that will continue to be most things so long as Intel has the majority of market share (AMD currently sells out all the MPUs it can make and thus has no theoretical way of getting majority market share for at least the next 4 years or so), and most apps are precompiled binary. But it's partly due to the fact that there's just not enough need for 3 full FPUs to justify the die space they take. This is just one example, but the end result is that the K7 is a well-balanced core pipeline-wise which is larger and consumes more power than it can justify based on its ability to get instructions from cache and memory. It is still the fastest thing out there, but it uses brute force to make it there. Time-to-market issues are behind some of these design issues, and some of those will be solved with the upcoming Mustang/Palomino/Morgan core tweak. 
But that still won't make the K7 anything more than a rebalanced tweaked-out brute-force of a P6. And hey--that ain't bad. But it ain't innovation.

    The P4, on the other hand, includes many features never before seen on a commercial MPU. They include: double-pumped ALU, integer decoder and scheduler, and integer retiring (running at up to 4 GHz on a .18 process!!!); trace cache; two-cycle L1 potentially using way-prediction to reach 2.0 GHz on a .18 process; hardware prefetch; and, well, a pipeline deep enough to allow 2.0 GHz on a .18 process. It also includes some impressive resources never before seen on the x86 side of things. They include: 126 op buffer; 3.2-4.27 GB/s FSB; "most accurate branch prediction algorithm ever" (claimed by Intel at MPF a couple weeks ago); 48 GB/s L2->core bandwidth; and SSE2, which will finally let the x86 push double-precision FP code with the big boys, and doesn't resort to a kludgy, die-space-wasting, gas-guzzling halfway-solution like the K7's triple FPU. On the downside there is the branch misprediction penalty of 19 clocks, potentially 27 if the code is not in the trace cache (unlikely). However, even this is mitigated by the fact that while the official branch mispredict penalty of the P6, for example, is a mere 12 clocks IIRC, the actual time to execute new code on a mispredict is more in the neighborhood of 30-50 clocks, because the instructions need to be rescheduled. Meanwhile, the P4 has wider scheduling resources, and thus may not even have a higher branch mispredict penalty in practice at all. It will certainly have many fewer mispredicts, so the overall analysis here is probably a wash.
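    The "probably a wash" claim can be illustrated with a simple expected-cost model. Only the 12- and 19-cycle penalties echo the post; the branch frequency and mispredict rates below are assumptions:

```python
# Expected branch-misprediction overhead, in extra cycles per
# instruction. The 12- and 19-cycle penalties come from the post; the
# branch frequency and mispredict rates are assumed for illustration.

def mispredict_overhead(branch_freq: float, mispredict_rate: float,
                        penalty: int) -> float:
    # average extra cycles per instruction lost to mispredictions
    return branch_freq * mispredict_rate * penalty

# Assumed: 20% of instructions are branches; the deeper-pipeline core
# is credited with a better predictor, per Intel's MPF claim above.
p6_like = mispredict_overhead(0.20, 0.08, 12)
p4_like = mispredict_overhead(0.20, 0.05, 19)

print(f"shallower pipe, weaker predictor: {p6_like:.3f} cycles/insn")
print(f"deeper pipe, better predictor:    {p4_like:.3f} cycles/insn")
```

    With these assumed rates, the deeper pipeline's larger per-mispredict penalty is almost exactly offset by fewer mispredicts, which is the wash being argued for.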

    It is, all-in-all, a very impressive looking chip, more than worthy of the title "seventh generation", whether it turns out to perform well or poorly. However, meaningless sandbagged benchmarks aside, all indications are that it will perform magnificently. Taken as a whole, the P4 contains not only the sorts of design changes necessary to *double* clock speed on a given process over the P6 (note: WOW), but also to *increase* IPC. But we'll see how this beautiful looking design translates to reality when the first actual P4s are released and benchmarked.

    Blah blah blah, biased statements towards Ace's.

    Ace's is in general a slightly AMD-biased site. "Unfortunately", Johan, Brian, and the rest of the crew there "have to" read the thoughts of actual MPU experts day in and day out in their technical forum, and thus know that the case for the K7--and against the P4--is not what the average hardware site has made it out to be. This is not to take anything away from AMD, which has at the moment by far and away the fastest performing MPUs on the planet, the best binsplits on the planet, and about 1.4x the performance/price of Intel all the way up and down their price lists. However, all appearances are that, once the P4 moves into heavy volume production (note: not until Q3 next year at the earliest, after a process shrink to .13 Cu), Intel will have a very strong and competitive lineup. And that until then, while AMD ought to be the choice of every sane computer buyer around, Intel will have bragging rights for the highest-performing (not just highest-clocking) chip in the x86 space, if not in the world. Furthermore, with the K8 almost certain to be just a derivative of the K7 (probably with 64-bit extensions and 2-way CMP), it looks as if Intel will take back the clock-speed crown and hold it for good. Whether that means it will win the performance crown for good remains to be seen, but I certainly wouldn't discount the P4 core if I were you.
  • This is kind of offtopic, but I'm just mentioning this because there is always bickering over which is best, Intel or AMD. Short answer: it doesn't $#$#! matter, buy what you want. What some of you people seem to be ignoring is that Intel and AMD are pretty much on par right now. Sure, AMD has a 1.2GHz Thunderbird out, but notice that these chips are only made in Dresden, at the fab that works with copper interconnects. Is Intel using copper interconnects in their Pentium IIIs? Nooo. Could that be a reason for their core not scaling any higher? Yeees. Next point. While the Pentium 4 may well be no improvement, there is very little basis to judge that right now. Oh, a few benchmarks. Who cares. Remember when the P3 450 came out? It was not faster than the P2 450. Does that mean that I want to use a 1000MHz P2 in my system? Maybe, but there will never be such a thing, for several reasons. The Pentium 4 has definite potential. Who cares that it doesn't give me 125 fps in Quake 3 as opposed to 115 fps with a Pentium 3 or Athlon. Game performance is reaching a ridiculous level anyway; who needs that many fps. There are other things that can be done with computers; maybe the Pentium 4 is better at those. Who knows, we're only benchmarking Quake 3 and such. So just be happy that there is competition and that it keeps prices down. Choose what you want.
  • What?? You don't see a trend there?

    Hell, neither do I! And why is that? Well, it's because the Win95 you had on your P166 was quite shitty, and the Win98SE you (and your father) are running on the K6-2 is quite a bit more reliable. Plus, DirectX and 3D and other software compatibility sucked back in the days of the P166, and are much better now, which explains why your games work well now.

    verdict: your problems were caused by software, i.e. your post is off-topic. mega

  • Back when the K6 was still competitive, AMD people thought Tom had a strong anti-AMD bias. Later, AMD just about went bankrupt as the Celeron started dumping all over the K6.

    Back when 3dfx was still competitive, people thought Tom had an anti-3dfx bias. Today, 3dfx is in a lot of trouble.

    Now Tom has an anti-Intel bias? Based on experience, I'd be worried for Intel, not for Tom....

    Bryan
  • This is not a technical discussion... if you ever thought that you would have an in-depth technical discussion on /. - you were dreaming.

    I stated an opinion - a valid one at that (but I am biased on this one) - that I prefer comparisons of raw CPU power (in whatever areas you like to look at) vs. cost...

    Now if you aren't a supporter of free expression... that's a different matter.
  • Rendering will always be a time consuming process. It won't get faster. Rendering a fixed format, fixed resolution, fixed framerate video will get faster of course... but as CPUs get faster, more CPU intensive algorithms will appear, to get better resolution, higher quality, more effects, etc. So upping CPU speed will gain you nothing, in the end.

    Wrong. Upping the CPU power will gain you better resolution, higher quality, more effects etc. You just said it.
    For example, compare the two latest Titanic movies (Raise the Titanic, produced at about '92, and the Leonardo DiCaprio one produced a couple of years ago). Look at the special effects. Compare the ships. If increased CPU speed gains nothing, why doesn't the older movie have special effects like the newer one?

  • > Many websites send notices of their original content to each other

    That's fine, but /. is not Anandtech. This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors. This article didn't tell us anything we couldn't have gotten from the standard press releases. It was, IMNSHO, fluff masquerading as a technical piece. It was full of unfounded opinion, not detailed and insightful analysis.

    > This is false. I am a hell of a lot more knowledgeable in matters
    > of MPU architecture than you, and I learned quite a bit

    Then you obviously aren't as knowledgeable as you claim, if you learned anything from that piece of fluff masquerading as a technical piece. And how would you know how much I do or don't understand about processor cores? As I said, there was nothing in that article that anyone who follows processor technology wouldn't already know from previous, and more insightful, articles. You can call me a "troll" if you want; it just proves your immaturity and the FACT that you have no solid way to refute what I said, you just want to resort to ad hominem uselessness. And, just FYI, I have read every single article on microprocessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand. But that wouldn't prove a bloody thing, and I won't waste my time. If anyone here is the troll, it's a bear-baiter like you.

    > The author's name is Johan De Gelas. He lives in the Netherlands. ENGLISH
    > IS NOT HIS NATIVE LANGUAGE. I'd like to see you post a single sentence in Dutch

    I don't care where he lives or what his native language is. Unless he is fourteen years old as well, there's no excuse for writing like that. I wasn't nitpicking grammar or syntax; I was talking about the very narrow, simplistic style--style is the same, whether in English or Latin or anything else. And this is written like a high school term paper, not a serious treatise on microprocessor design. My point was that it's a lot closer to a high school report than a serious article. To go even further, it has NO insights of its own. It quotes others, and says nothing unique. Therefore, it isn't a real analysis of anything. The author didn't analyze; he wrote a book report. Furthermore, you sound like an apologist who objects to what I said based on your personal biases, not based on the merits of my analysis of the article and the situation in general regarding the Athlon vs. Pentium 4 debate.

    Skipping the boring quoting, I'll state this: preproduction benchmarks are NOT "always wrong." You're an idiot if you think that anything is "always wrong." Even a blunt intellect such as yourself, who prefers lashing out based on general statements and a few chosen "wrong" benchmarks of pre-production systems (when there are plenty of on-par looks at performance, too), gets a few things right. But this isn't one of them--you think Tom was wrong when he said that when the PII came out it offered little over the PPro. Well, its advantage was more horsepower, but its disadvantage was no on-die high-speed L2 cache. Therefore, if you multitasked instead of running one app at a time, the PPro was usually considerably faster for you than the slower-cache PII--when it was first released, that is. As its clockspeed eclipsed that of the PPro significantly, of course its performance eclipsed the PPro's. But I'd take a 200MHz PPro with one of the larger L2 sizes available on it over a PII 300 any day, if I were using it for serious business-type uses rather than gaming. Especially if it were a server, the PPro would kick the PII's ass because of the cache. We can conclude this by saying that you believe the P4 benchmarks so far have been "sandbagged" and are inaccurate. I believe they're accurate. There's no way to settle this until the retail product ships, period. But when it does, I expect a full apology. ;-) BTW, the previews have mostly been done by Intel-friendly websites, not by AMD campers like Tom. Me? I dislike Intel for some of their MS-like tactics, but I buy the best performance I can get for the money. If I bought a system today, it would have to be Intel because AMD has no multiprocessor solution on the market yet--but I fully expect the 760MP to change my mind. Intel's GTL-based SMP isn't even in the same league with what the EV6 can do.

    Again skipping the quoting, your analysis of Athlon vs. P!!! performance is pretty useless, since I was bemoaning the fact that people have been comparing a 7th-gen core designed to scale to high clock speeds with an old core which is optimized for lower clockspeeds. But I'll debunk the FUD anyway. The Athlon core performs better all-around, not just on FPU-intensive apps. The reasons the Intel chips score better on many apps, like games (which, yes, are optimized for 3DNow! too), are two-fold. First of all, the Intel chips have MORE SIMD instructions, so that optimized apps have more to work from. Secondly, and most importantly, Intel gives out much better compilers, which optimize the code more tightly than it can be optimized for AMD processors. AMD doesn't do all this compiler work; they concentrate on making better processors, not on tying specialized compilers to their processors. "You can deny the facts, or you can try to understand them." And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's is positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than a small amount of cache latency. In the REAL WORLD the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.

    > "The Athlon is truly a seventh-generation core.' What does that mean???

    You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willamette. That's why the Willamette performs slower clock-for-clock than a theoretical P!!! at the same clockspeed: it has to make trade-offs to achieve that speed, namely deeper pipelines, more of a hit for failed branch predictions, etc. And yet, people were wrongly downing the Athlon for not reaching the P!!!'s clock-for-clock performance, and that was a mistake since the Athlon, like the Willamette, must make some sacrifices to reach high clockspeeds. Again, this is why you can buy a 1.2GHz Athlon today, but the 1GHz P!!! is the best Intel can do. Their 1.13GHz part, which had to be recalled, was malfunctioning so badly that people couldn't even get the Linux kernel to compile on it. Blech.

    > "If you think it means the K7 core has one single architectural innovation
    > which does not exist on an MPU available before it, then I challenge you to
    > list it now"

    It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation; it is superior in every way to the old Intel GTL+. Not only does it run at an effective 200MHz--and did so back when P!!!s were still at 100MHz FSB, and the Celerons are still crippled at 66MHz while the Durons are at a full 200MHz like the Athlons--but it is superior technology. It's not a shared bus architecture like the P!!!'s: each processor gets its own interface and its own dedicated bandwidth--a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand. It's revolutionary for an x86 processor. Plus, you wouldn't consider the Athlon's monstrous brute of an FPU an innovation? Poor FPU performance was always the bane of the x86 architecture--a primary reason x86 wasn't used in most supercomputers and scientific applications. Now the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!. That's an innovation, too. You claim that the K7 is merely a revamp of the old P6 core, but that only proves your total ignorance about its design. It's a whole new core, with very little similarity to a P6 core beyond the x86 instruction set. If you don't know that, you're out of your depth here. Go back and play on the porch at HardwareCentral, where they're as biased against AMD as Tom's is against Intel.

    > all appearances are that, once the P4 moves into heavy volume production
    > (note: not until Q3 next year at the earliest)

    God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001? And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...

  • As in, what? Windows apps you mean?


    Exactly.

    Unless you can convince the other 80% of the population to switch over...
  • Great comment! All we need now is an upper case digit '1', and we're all set. ;^)
  • Wow! A lamer! I must have hit the big time!

    WOHOO!

    =)

    Regardless, would you care to argue my points?

    I must ask myself: Are you here to waste my time? Are you pissed because you are unable to argue my logic (which would be sad indeed), and instead choose to post pointless remarks?

    Perhaps it's just the grape...
  • I think that a better comparison would be ppclinux and x86 linux running apache. Don't create an artificial deficiency for ppc when there clearly isn't a need for one. Then again, it would probably come down to the i/o capabilities. Maybe even better would be something like rc64 or seti@home, etc...
  • Not true. I was actually running the same Windows 95 on my 166 that I ran on my k6-2 for a while. It's wholly on topic because I was talking about AMD vs Intel.

    And who made you the on topic police, eh? ;)

  • I homebuilt a dual processor 400mhz PII a year ago in August for a personal programming/development machine. I can say I should have saved the money on the extra processor and gotten a nice video card instead.

    You see, if you run Windows (except NT and 2000) you can only use one processor, and since most games run on Win95/98 you are stuck using one processor. Linux game support is dismal, plus there are no games in the $10 bin at the BestBuy checkout yet.

    On the other hand, Make bzImage is lightning fast!
  • by Anonymous Coward
    Originally, I was going to post a Karma Whoring reply to your comment. But then I realised: I've seen this type of comment 1000000 times before. It comes up every single time a new processor is even mentioned on /.

    Don't you want a Star Trek style computer? I do. Every single time there is a processor speed increase, someone thinks of a way to use it up. I want natural speech recognition, a 3D desktop, whatever. Processing speeds probably aren't really necessary anymore; in that respect I think you are right. But they sure are fun.

  • by Anonymous Coward
    Manufacturing is also a very important piece of the puzzle. The best CPU design ever is not worth much unless it can be manufactured in vast error free quantities in a cost effective manner. How different would the market be if Intel were able to deliver mass quantities of 1Gig PIIIs?
  • It's not just games. With the faster processors, programmers get lazier. It's the "If I don't need to save the cycles, I won't" style. What I feel we need to do is have all the programmers start out on slower processors, where a few clock cycles could mean a lot. In my experience, if you write a game that's bloated and have to go back and save cycles to make it playable, you won't bloat the code so much the next time.
    What do you people say?
  • Unfortunately, they probably can't -- thanks to patents.
  • Being that SOMEBODY always seems to be complaining about x86, and how (insert other processor here) is *so* much better, would it be a good idea for Intel and AMD to make chips with two different instruction sets and then gradually phase out x86 once the newer instruction set becomes mainstream? It seems like a good idea in that people wouldn't have to immediately abandon x86, but would have quite a while to replace their software with the newer versions, and after a while the new instruction set would become standard, and not too many people would notice. (Windows users, do you still use 3.11 and its software? Linux users: you know that you could port Linux to any platform...)

  • >We have published an in depth comparison...

    I wouldn't trust a supposed in-depth comparison by someone who is obviously ill-qualified, given this statement:

    >Some of the questions answered: Are double
    >pumped, hyperpipelined, low latency designs
    >the only future for x86?

    Low latency & long pipelines in processor design are opposed. You can't have both at the same time.

    Example:
    G4s have 4 pipeline stages = low latency, low clock speed
    PIIIs have 14 pipeline stages = high latency, high clock speed

    Hyperpipelined designs are generally taken to be those with ~20 or more stages. There are some big disadvantages to having pipelines that deep. High clock speed means nothing if the processor is wasting cycles emptying a super long pipeline after a branch mispredict.
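    The mispredict cost described above can be sketched with a toy throughput model. All numbers below are invented for illustration; none are measured figures for any real chip:

```python
# Toy model: effective IPC given a branch mispredict penalty.
# effective CPI = base CPI + branch_freq * mispredict_rate * penalty
# All inputs are hypothetical, chosen only to show the trade-off.

def effective_ipc(base_cpi, branch_freq, mispredict_rate, penalty_cycles):
    return 1.0 / (base_cpi + branch_freq * mispredict_rate * penalty_cycles)

short_pipe = effective_ipc(1.0, 0.20, 0.05, 4)   # shallow pipe, small penalty
deep_pipe  = effective_ipc(1.0, 0.20, 0.05, 19)  # "hyperpipelined", big penalty

# The deep pipeline must clock this much higher just to break even:
print(short_pipe / deep_pipe)
```

    With these made-up rates the deep pipe gives up roughly 13% of its IPC to mispredicts, so it needs about 14% more clock just to break even--which is the whole bet behind a hyperpipelined design.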
  • Well if the P4 is only for marketing speed, and not real performance then all AMD has to do is go back to P-rating their processors. So a 1.2 GHz Athlon is now labeled a 1.5 PGHz Athlon. That would let them keep up.
  • I think what I was intending was that algorithms will eat all available CPU processes/resources, so that as CPUs double in speed, computation algorithms will also double in complexity, meaning a 1 minute video will take the same amount of time/memory/hardware price, regardless of how fast the CPU gets.

    The nick is a joke! Really!
  • And that until then, while AMD ought to be the choice of every sane computer buyer around, Intel will have bragging rights for the highest-performing (not just highest-clocking) chip in the x86 space, if not in the world.

    I suppose you mean both 1.133GHz processors that shipped before they were recalled?

    Flamebait, I know, but really.. 1.2 Athlon tops 1.0 P3 in pretty much everything...

    Kjella
  • I noticed!
  • Motherboards with the 760 chipset are slated for January 2001... dual Mustangs, whooopeee :) ... I can't wait!!!
  • It's really simple: you wait till January and buy a dual Socket A motherboard, and buy any Socket A processors you want. I'm going to get dual Mustangs, as they will also be out by January, but you could get some Thunderbirds cheap by then; I'm guessing you could pay about $150 for a 1GHz T-Bird in January. Mustangs are available starting Nov. 1st, but even in January they will be a lot more. In January a 1.5GHz Mustang may go for about $450-$650.
  • On the other hand, Make bzImage is lightning fast

    If you think that's fast, try `make -j3 bzImage` instead.

    --
  • How can you get _maximum_ power for _minimum_ price? Does not make much sense, does it?
  • Is the SPARC chip, baby. UltraSPARC III now, IV and V on the way.
  • I think you misinterpreted what I meant by "rendering". I don't mean generating an image out of nothingness (e.g. Toy Story). In this context, most current video processing programs (Apple's Final Cut Pro, Adobe Premiere, etc) need to "render" in order to create video effects, such as transitions (combining two source images in some way, such as a fade). The programs don't seem able to do these in real-time (though there is some third-party hardware to allow for some real-time rendering of some effects). The number I tossed out (5GHz) is what I assume will be required for simple real-time effects for DV resolution (720x480, 30 FPS) output without extra hardware. I'm making the assumption, of course, that the software is near-optimal (I'd bet it is probably within a factor of two of that, but I really have no idea).

    As for rendering it low-quality, that's just a work-around for the current computer limitations. When editing, one really wants to view the output as it's going to be in the final. Degrading the quality may result in different decisions on the edits (what "looks good" may change). The other problem is that these programs cache the rendered images, which can effectively double the storage required (which is not insignificant!)

    The bottom line is that real-time, high quality editing would make a difference, and for this to happen, CPUs need to get significantly faster. Note, however, that video is (or should be) highly parallelizable. Right now neither Final Cut Pro nor Adobe Premiere uses my second processor, but they really would benefit tremendously.
  • by Anonymous Coward on Monday October 23, 2000 @10:00AM (#682508)
    I am more interested in the differences between the PowerPC and x86 manufacturers than the intra-x86 manufacturer fighting. I think Motorola could learn a lot from the recent design trends of AMD and Intel, about ways of pushing the megahertz envelope, while Intel and AMD could learn a lot about energy efficiency and overheating from the PowerPC camp. Transmeta notwithstanding.
  • John C. Dvorak is a frequent columnist for various Ziff-Davis publications, recently purchased by c|net, which is wholly owned by Intel. Why the hell can't there be any news that is unbiased? What happened to honorable journalism? /. crew, who owns you? If /. is not just another corporate media puppet then for God's sake act like it!
  • by ToLu the Happy Furby ( 63586 ) on Monday October 23, 2000 @01:17PM (#682510)
    When the P6 was released, it was the fastest processor available in industry standard benchmarks (SPEC, including Alpha). Its design was highly original, and manages to keep the CISC nastiness contained to the first few stages of the pipe. Claiming that the P6 was not a world-class design when released is only a testament to your own ignorance.

    Exactly correct. If I had moderator points, they'd be yours.

    And indeed, the 1 GHz P3--on that same, 5 year old P6 core--is still tied with the moderately-vaunted brand-new mucho-expensive (not available until Q1) 900 MHz UltraSparcIII in SPECint2000. The 1.2 GHz Athlon would presumably perform even better (once they release SPEC scores from the new Compaq Fortran compilers), making it second only to the fastest (and also none-too-available) Alphas in terms of pure performance. The x86 ISA may be suboptimal, but Intel and now AMD have been able to keep up with the best--and most expensive--of the RISC world due to superior engineering (except when compared to the excellent Alpha team) and superior process technology. Sure they may not have the i/o bandwidth, RAS, or operating systems to compete in the big leagues, but anyone dissing today's x86 chips on account of their designs or engineering qualities is, as the poster said, demonstrating their ignorance.

    And if Compaq doesn't hurry the EV68 (die-shrunk Alpha) to market, the P4 and perhaps Mustang as well will blow by even the mighty Alpha, in SPECint and possibly even SPECfp. (The last real knock against the x86 ISA is that it is saddled with the horrendous x87 fp architecture, which is why x86 SPECfp scores trail everyone else by so much. With the P4's upcoming SSE2 instructions, however, that problem may be in the past.) Aesthetics aside, there is no doubt that x86 processors, taken as a whole, are easily the best designed, highest performing MPU's around.
  • er... sorry to burst your bubble, but AMD doesn't name their chips after slow cars, they name them after WWII planes. Didn't the fact that their next chip is named Spitfire kind of give that away?
  • The recent Linux Journal articles on the topic of building your own PC, plus my "need" for a new desktop have made me kind of interested in building my next PC from scratch (possibly as a dual processor). However, looking at all the information and sites devoted to dissecting CPUs and motherboards have left me feeling...inadequate. Is there a site that lays out, for a Certified Hardware Moron(tm) like me, what terms like "Socket 7" and "Slot A" mean, what the relative merits are, how to pick (or even identify) processors that can run in a dual configuration, what all the hubbub about RAM is, etc?
    --
    An abstained vote is a vote for Bush and Gore.
  • I plan to publish an in-depth comparison between Intel, AMD and Transmeta's policies on giving free samples to journalists...

    Michael

    ...another comment from Michael Tandy.

  • When the P6 was released, it was the fastest processor available in industry standard benchmarks (SPEC, including Alpha)

    Woo boy... hold on there. First off, the "P6" is not a processor, but a processor core. It was also the code name for the CPU later to be marketed as the "Pentium Pro". The Pentium Pro was an impressive chip, and for some types of operations, it was incredibly fast. However, it did not beat everything else out there on SPEC benchmarks.

    For starters, SPEC is not a single benchmark, but rather a consortium that comes up with benchmarks, the most well recognized being their CPU benchmarks (colloquially referred to as SPEC benchmarks). These benchmarks, however, do not exclusively test a CPU, but rather a system as a whole, although they are designed to make the CPU the limiting factor (nonetheless, using bucketloads of RAM, fast disk controllers, and a huge external memory cache can have wonderful impacts on SPEC benchmarks). Typically these benchmarks have been divided into those that stress the integer unit (SpecInt) and those that stress the floating point unit (SpecFP). The Pentium Pro was the first x86 CPU to post respectable SpecFP benchmarks, but it still got its butt kicked all over the place compared to its RISC competition.

    Even on the SpecInt benchmark, the earliest PPro benchmarks [spec.org] I could find on Spec's website show that while the PPro put in some respectable numbers, it was far from being the king of the SpecInt benchmark.

    The PPro was a breakthrough in terms of its price/performance. This was largely due to economies of scale rather than design genius.

    Despite all that, I think the PPro design was very impressive. It was probably the strongest evidence at the time that CISC could hold its own against RISC competition, something the pundits had been suggesting wasn't going to happen.

  • Was the final question of the post rhetorical? He asks, "Will future designs from AMD and other competitors be similar to Intel's innovative seventh generation core?" I would think this would be a no-brainer... ABSOLUTELY, POSITIVELY, YES.

    1. My Vote's On This Doofus [mikegallay.com]
  • Under Linux, C compile times and shell performance are completely adequate on a Pentium 133. Everything else on top of that has been pure gravy. It is sooo ironic that the insane demand for powerful hardware that Windows has is making my Linux experience extremely nice. Windows uses up all those cycles trying to make things friendly with a GUI, and the effect of that is to make fast hardware very cheap. If it runs Windows, it runs Linux faster.
  • Amen to that!

    Anyone else here consider a 600MHz Celeron a screamer?

  • Games! (Oops, sorry). The other thing people need CPU power for is video. Video requires far more CPU power than is now available cheaply. And consumers want it, too (ask anyone with a kid...) Rendering on current processors takes just too damn long (I don't care if you have your 500MHz G4 or 1.3GHz Pentium) at DV resolution. There's still a lot of waiting... We really need something more like 5GHz to have anything reasonably responsive for "normal" editing (i.e. without crazy transitions, etc). And of course, then people will want to start adding fancy video effects...

    I agree that for word processing/office type tasks, no one really needs more than about 200MHz. More than that is a waste.
  • As in, what? Windows apps you mean?

    I have plenty of UNIX/POSIX apps that work fine from one platform to another...

  • wwwwww,what !! still need windows and office!! ok Crack-Baby!!!! I've been MS free for 3+ years! new computers with an OS are the biggest ripoff going !!!
  • As long as our favourite non-free x86-only OS rules the planet, we'll have x86.

    There's more to it than that. If we're all running free OSs, but need some proprietary binary-only drivers or applications on top of that OS, we're still locked into one CPU architecture.

    I am not a give-me-Free-Software-or-Death kind of guy, and regularly run proprietary software on top of a free OS, but I think we have to be aware that this continues to lock us into one architecture. Attempts to viably market proprietary software on multiple machine architectures (e.g., apps for NT on Intel+Alpha+MIPS, or WordPerfect on various RISC Unices) have generally failed due to support costs and market size.

  • T-bird, Mustang, Corvette, and Spitfire are all WWII fighter planes.
  • Those aren't "PPC" they are 64-bit POWER processors that are mostly binary compatable with the PPC's that motorola ships as embedded processors that Apple sticks in desktop boxes. Yah, those RS64 III's are nasty processors but as you pointed out the cache still isn't on die. They also consume 5 to 6 times the amount of power that the G4 does. By the time you have added 8 megs of fast SRAM power consumption to the total you are well into the range of your average high end processor.
  • Intel is so intent on winning the "Megahertz race" because they are the ones who created it. They have spent millions of dollars beating into the head of every mainstream consumer the pseudo-fact that "MHz measures how fast your computer runs".
    Most people reading this probably get general computer questions from friends/family/non-IT co-workers on a daily basis. How often have you asked someone what kind of computer they have and gotten the response "400 Megahertz Intel"? You ask what brand it is and even list off the big ones and they respond with a blank look. That blank look you got warms the heart of Intel's ad guys. This person can't remember the name of the company that has their name plastered across the box the computer came in, but they know the manufacturer of the CPU inside the computer, that at best has a little silver decal on the outside.
    On a sidenote: I think that the Intel manufactured "Megahertz race" hasn't hurt AMD nearly as bad as it has hurt Apple, who in many ways have superior CPUs, but at considerably lower clock speeds.

    I won't even get into "How much memory do you have?"...."20 gigs".

    -B

  • by barleyguy ( 64202 ) on Monday October 23, 2000 @01:31PM (#682525)
    Actually, when the 760MP chipset comes out from AMD, you'll be able to use 2 different speed processors on the same board.

    It's point-to-point multiprocessing, instead of symmetrical. You can, for example, buy a 760MP with a 1GHz CPU now, and put in a 1.2GHz as the second processor later. And each chip has its own Northbridge and path to RAM, as opposed to the shared GTL bus on an Intel.

    They FINALLY demonstrated the prototypes, so the real boards should be out Real Soon Now.
  • The Microprocessor Report summarised the CPI vs MHz approaches to improved CPU performance not so long ago.

    Their studied and authoritative conclusion was that process improvements over the last decade have allowed clockspeeds to drive performance up faster than any CPI improvements like superscalar designs or speculative execution.

    There is no denying marketecture is the major driver for consumer computers, peecees and macintrashes, but there is a real technical argument for it too.

    An example: IBM POWER designs have recently made a strong shift from a strong CPI focus to a very strong clockspeed focus (POWER3 around 300MHz, POWER4 around 2GHz), and they generally sell to a technically savvy market that's not so subject to the lure of marketecture.
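    The CPI-vs-MHz trade-off above boils down to performance ≈ clock × IPC. A two-line sketch with invented numbers shows why a clockspeed-driven design can come out ahead even with worse per-cycle throughput:

```python
# perf ~ clock (MHz) * IPC. The numbers are invented for illustration only,
# not measurements of any real design.
def perf(mhz, ipc):
    return mhz * ipc

cpi_focused   = perf(450, 2.0)   # wide, low-clock design: more work per cycle
clock_focused = perf(1500, 0.9)  # deep-pipe design: less per cycle, far higher clock

print(cpi_focused, clock_focused)  # -> 900.0 1350.0
```

    Process improvements that triple the clock easily swamp a 2x IPC advantage, which is the Microprocessor Report conclusion summarised above.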

  • Tell me, what is so innovative about _kludging_ old crap to high speeds and adding more superfluous, useless instructions? I say nothing. Innovation is by definition "something new or different".

    That's like looking at a new race car and saying it's no better than a Model T because it's just "kludging old crap to high speeds". Now, if it had 5 wheels, then you'd be getting somewhere! That's innovation!!

    Or like saying any new computer is still the same old crap kludged to high speed because it's all binary. If only we switched to ternary logic, that'd be innovation!! Or because all architectures these days use 8-bit bytes. What's wrong with 10, or 37? The only reason we use 8-bit bytes is for backwards-compatibility with extended ASCII, which is obsolete anyways! Why not move to 16-bit bytes with Unicode?! That'd be innovation! That's new!

    Geez. It's not like the architects at Intel (or AMD, or Compaq, or anywhere for that matter) couldn't think up a new ISA in their sleep. It's just a dumb idea.

    The Merced might've been an impressive chip, but why the f*sck do they still have to keep dragging that 8086 shit behind?

    Because the point of an MPU is to run programs, not to look beautiful on paper. The vast majority of the world's programs run only on x86; thus the vast majority of processor marketshare is going to be x86-compatible. Besides, if you actually knew and understood the amazing ways Intel (especially) engineers have managed to squeeze out performance despite working around the design constraints of x86-compatibility, you wouldn't run around calling an amazing core like the P6 a "kludge". Indeed, you obviously don't realize this, but the most important ingredient in getting a chip to high clock speeds is an elegant, balanced design. (Manufacturing process is a close 2nd, and is why the Alpha, a more elegant design, is outclocked by the PIII and Athlon.)

    I think we should go more into parallelism in programs and take the advantage of multiple, perhaps a bit slower, processors, not one huge frying pan.

    Well, that's wonderful that you think that. Unfortunately, instruction-level parallelism is very, very difficult to extract from most computer code. Furthermore, the increased complexity of SMP buses and SMP motherboards makes the SMP option too expensive to hit the mainstream market for the foreseeable future.

    Instead what we'll see is CMP--chip level multiprocessing, essentially having multiple chips on one die. Examples include the IBM POWER4 and prolly the AMD K8. This increases chip-to-chip bandwidth and gets rid of the motherboard costs, but doesn't solve the problem of low ILP in most code. A more interesting solution is SMT, simultaneous multithreading, which allows a single superscalar core to work on instructions from several different threads in parallel. Early indications are that SMT provides phenomenal performance boosts; Sun's embedded MAJC chips use SMT, but the first general purpose SMT MPU might be the Compaq Alpha EV8.
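    The low-ILP problem and the SMT fix described above can be illustrated with a tiny slot-filling simulation. The issue width and the ILP distribution here are invented numbers, not figures for any real core:

```python
import random

random.seed(42)    # deterministic run
WIDTH = 4          # hypothetical 4-wide superscalar issue
CYCLES = 10_000

def ready():
    # Typical integer code exposes little ILP: a single thread rarely
    # has four independent instructions ready in the same cycle.
    return random.choice([0, 1, 1, 2, 2, 3])

# One thread: issue slots it can't fill each cycle are simply wasted.
single = sum(min(WIDTH, ready()) for _ in range(CYCLES))

# SMT: a second thread's ready instructions fill the leftover slots.
smt = sum(min(WIDTH, ready() + ready()) for _ in range(CYCLES))

print(single / CYCLES, smt / CYCLES)  # instructions issued per cycle
```

    Even this crude model shows SMT pushing utilization of the same execution hardware well above what one thread manages, which is why the early SMT results looked so promising.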
  • by Sir_Winston ( 107378 ) on Monday October 23, 2000 @01:36PM (#682528)
    Rather, it is a piece of self-promotion by Ace's Hardware, who sent this story in themselves. The article itself doesn't say anything the knowledgeable don't already know. In fact, it reads like a high-school report, and not even a very well-written one. E.g., "First we will try to analyze the most important shortcomings, next we will search for possible solutions." Sounds just like the simplistic expositions of a high school term paper.

    I repeat: the article is not a technical piece at all. Hannibal at ArsTechnica writes technical pieces about CPU design. This article at Ace's Hardware says nothing insightful.

    In fact, it misses the point. It dares to call the P4 "innovative" and wonder whether future designs in the x86 world will copy it. Well, of course not! How many times must it be said that the P4 barely keeps up with the Athlon and performs less well than a P!!!? Because, that is a fact. Numerous production samples have leaked, with the test results uniformly and without exception pointing to the fact that even if the platform's performance is improved by release time--which it should, since these are samples not a retail product--it won't outperform a P!!! with equal clockspeed. That's why the P4 is being released at 1.4 and 1.5GHz initially, because if they were released at 1.2GHz they'd be outperformed by the 1GHz P!!! and that wouldn't be good.

    Now, the P4 barely keeps up with the current-generation Athlon Thunderbirds. This is important to note because people always *blamed* AMD for a processor which still, with the advantages of the P!!! SIMD instruction optimizations used in much software, didn't quite keep pace with Intel's offering in the most common benchmarks. Now, the technically knowledgeable know that the Athlon whomps the P!!! in anything that isn't SIMDified, and that its floating point unit is head-and-shoulders above. But people still moaned about the performance gap in certain common SIMDified benchmarks.

    Well, here's what they didn't realize: the Athlon is truly a seventh-generation core--which beat Intel to the punch by, what, almost a year and a half? As such, it has made trade-offs to be able to scale to higher clockspeeds better--one reason why Intel had to recall, and still hasn't re-issued, the 1.13GHz P!!! yet AMD are easily churning out 1.2GHz Athlon Thunderbirds. The P!!! only scales well up to 1GHz--even then, it needed a microcode update to be stable--while the Athlon Mustang has hit 1.2GHz with no problems. Heck, Duron 600s usually overclock to at least 900MHz.

    In other words, you can't reasonably compare a core optimized to scale to low clockspeeds and take advantage of them, to a core designed to scale up to extreme speeds. You have to compare the Athlon Thunderbird core to Intel's own belated seventh-generation x86, the P4. And the Athlon Thunderbird compares very favorably. It hasn't been released at 1.4GHz, and probably won't be since AMD will undoubtedly release the newer core before then, but an extrapolated 1.4GHz Athlon Thunderbird, in line with how performance scales for that core, beats the 1.4GHz P4 samples that have been tested. THE ATHLON BEATS IT. So, how can you call such a low-performing core innovative? It isn't. I'd wager that the next core AMD have up their sleeves will be the real innovator here. Plus, to get the performance it does, Intel's P4 even has to use a 400MHz-effective FSB and a double-pumped ALU. This makes the P4 core itself look rather weak in comparison with the Athlon, which gets by with similar performance with merely a 200MHz (soon, 233) FSB and a non-double-pumped ALU. So, the core of the Athlon is clearly, in itself, much stronger than that of the P4. AMD will doubtless be using similar tricks in its future revisions, but it cannot be doubted that the P4 is not the "innovation" that this BS article claims it is. The article even belittles the Athlon's branch prediction--which is weak, because the core was rushed--without noting the fact that even with such a poor branch prediction mechanism the Athlon core outperforms the P4 on a theoretical clock-for-clock basis.

    I note the "theoretical" because I'd like to again point out that the Athlon core is soon to be released in a new revision which will scale to higher clockspeeds, have larger cache, and have improvements to the core itself which AMD has not yet specified. I think that this article at Ace's Hardware is so utterly biased against AMD and for Intel that it makes me sick. He talks of everything negative about the Athlon as being a "compromise" or a decision made in a rush, yet he plays down the negative aspects of the P4 core--for example, he plays down the 19-cycle branch misprediction penalty in the P4 by hyping the P4's escellent branch prediction algorithms, but doesn't give the Athlon slack about its lackluster branch prediction mechanism based on the fact that it has a reduced misprediction penalty. Ace's Hardware has always been biased for Intel and against AMD, and it shows here. The P5 core is hyped as a big "innovation," but not once is that word used in reference to the Athlon, which performs at least as well (probably better clock-for-clock, as I pointed out) and got there to the seventh generation almost A YEAR AND A HALF before Intel. The one place where he FINALLY gives AMD credit is in the conclusion, and even then it's marred by renewed complaints. This is funny, since this article was allegedly a follow-up to Ace's earlier look at the P4 core by looking at the Athlon core in that light. For all the nice things finally said about the Athlon in the last paragraph, he never once used "innovative" regarding it, despite giving the moniker to the P4 at least twice.

    And, as a final note, what I've just said doesn't really matter all that much, because the above poster was RIGHT: all that matters is who can deliver the most PERFORMANCE at the least PRICE. And that is, clearly, AMD. That comment *is* insightful, as far as it goes, because that's all that really matters. Why don't we all use Alphas or PowerPCs, which are much more beautiful architecturally? Because they can't give us the price/performance of an Athlon or dual P!!! system. In the final analysis, that's all that's important.
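    A side note on the "extrapolated 1.4GHz Athlon" mentioned above: that kind of extrapolation is just linear clock scaling of a measured score, and it's optimistic, since memory latency and bandwidth don't speed up along with the core. The score of 100 at 1200MHz below is a placeholder, not a real benchmark result:

```python
# Naive linear extrapolation: assume score scales with clock. Optimistic,
# because the memory subsystem does not scale with the core clock.
def extrapolate(score, mhz_measured, mhz_target):
    return score * mhz_target / mhz_measured

print(extrapolate(100.0, 1200, 1400))  # -> 116.66...
```

    Real scaling curves flatten as clock rises, so treat numbers produced this way as an upper bound.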

  • The new IBM POWER4 chip (a great article on the chip is here [realworldtech.com]) looks really great but given the trouble that other manufacturers have had with complex chip designs I have to wonder.

    - How well will it scale to high clock speeds?
    - How are chip yields going to be affected by the super-complex chip design? (And of course, low chip yields = high price.)

    Still looks good though, too bad it'll be about 50 years before I can afford to have one on my desktop.

  • As someone who programs, I disagree completely. One thing that you have to keep in mind is that modern software is enormously complex. To say that a modern programmer is lazy is to misunderstand what it is that they do. Think about this: the software that I program on for a living takes up close to a million lines of code, has its own windowing API, as well as cross-platform stdio functions that replace C's functions, and many, many tools. So, how do you go about programming something like that in assembly? Remember, you need portability, so assembly would be a stupid choice for something so large and complex. It's nice to think that we could all go back and write our stuff in assembly, but when you start thinking like that, you're overlooking the fact that even simple programs today are so large and complex, and do so much, that they would easily overflow the memory banks of 'advanced' machines of two decades ago. That's why programmers used to be able to get away with assembly: the stuff they were doing wasn't nearly as complex, and they didn't have to worry about portability as much.
  • Or, tell you what, let's have a POV-ray race. You can have a gazillion P200s, and I'll have a gazillion Alphas at 750 MHz. Now, who do you think is going to win? That's what I thought.

    Do-nothing techno weenies always use the same goofy examples. You forgot "What if you need to solve systems of equations with 10,000 unknowns?" MPEG2 decoding is something that's offloaded to good video cards, and has been for years. Ray tracing is something done by, oh, 0.001% of all computer owners.

    In general, people don't understand how fast, say, 400 MHz really is. You see otherwise intelligent people thinking that a 200 MHz machine would drag if it had to search a database of 10,000 items. Heck, that would be like lightning on a 4MHz Z80!
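
    To put rough numbers on that point, here's a quick sketch (the record names and the Python itself are just illustrative; any language would do): even a naive, unindexed linear search over 10,000 items finishes far below the threshold of human perception on modern hardware.

    ```python
    import time

    # Hypothetical illustration: scan an unindexed list of 10,000 records.
    # Even the worst case (the match is the very last element) is effectively
    # instantaneous; at 200 MHz it would still feel like lightning.
    records = [f"item-{i}" for i in range(10_000)]

    start = time.perf_counter()
    match = [r for r in records if r == "item-9999"]  # worst case: last element
    elapsed = time.perf_counter() - start

    print(len(match))      # 1
    print(elapsed < 0.1)   # True
    ```

    The point isn't the exact timing, just the order of magnitude: 10,000 string comparisons is a trivial amount of work for any processor of the last couple of decades.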
  • t-bird, mustang, corvette, spitfire, are all WWII fighter planes.

    Right.

    But Allied fighter planes were used to fight Germany in World War 2. And AMD wants to sell processors to Germans. Plus AMD's best fab is in Dresden, Germany, which was destroyed during the war by a horrible firebombing. AMD would like to be especially nice to Germans, because they're important as both potential consumers and potential employees. So AMD decided to use the same codenames, but pretend they referred to cars instead of planes. (Also, AMD's chipset partner Via renamed the KZ133 chipset to the KT133, because KZ has nasty concentration-camp connotations.)

    This worked pretty well, because car companies love to steal airplane names. But AMD took it too far, and named their next-generation Athlon the Corvette, and the budget version of the Corvette the Camaro. The Corvette is a boat, but the combination of Corvette and Camaro is too obvious to be a coincidence. GM took offense, so AMD had to rename things again to avoid trademarks. (Never mind that trademarks are only valid within one industry; Duron is both an outdoor paint and a processor. It's not wise to mess with companies that can afford legions of lawyers over trivial stuff like codenames.)

    So AMD decided to use horse names instead of car names. The long-entrenched Mustang ("It's a fighter plane! No, it's a pony car! No, it's a pony!") name stuck. Corvette became Palomino and Camaro became Morgan (which is a horse and also a car, but not a plane).

    Personally, I like Apple's way with codenames better. (Not quite enough to actually buy a product made by the innovators of the look-and-feel lawsuit, though.) Carl Sagan didn't like the Sagan codename and sued, so they called it the Butt-Head Astronomer. The Corvette should be renamed the Rattletrap and the Camaro should be renamed the POS.

    Clearly everyone should use those naming firms to pick unique, lawsuit-proof codenames that don't offend anyone anywhere. I think I'll start one of those companies right now. Here's a free sample: "fjso457lfdsjfl297."
  • thunderbird? mustang? let's just hope AMD doesn't make a Yugo, or a Gremlin processor.



  • I say "what is the point?" Why learn how to squeeze more out of processors that aren't fully used? I'm assuming we're talking PCs here. Would you rather a product be performance-tuned and optimized for years just to squeeze a few extra MB out of your computer? (All the time spent costs money, which one would assume is passed on to the consumer.) I'm not trying to say current software engineering practices are perfect (they're not). I just find it kind of funny when I hear people waxing nostalgic for their 64K of RAM and 10 kHz processors. "Kids today just don't know how to program!"
  • by Anonymous Coward
    Attitude such as yours is the reason why such crap as the x86 still lives. I'd prefer a clean low-energy cpu, not a freaking frying pan with 30-year-old leftovers.
  • Actually, as soon as you get out of the dark ages of CLIs and try to run a real application like a desktop publishing system, graphics manipulation program, DVD playback, or web browser, Linux begins to look pretty shabby because of the extra overhead incurred by X11. I will take a faster machine any day of the week.
  • Intel is going the clock speed route for a reason, and it is pretty clear to me.

    Yep, it's pretty clear to me, too: marketing. Intel has clearly decided that MHz sells, not real-world performance. They clearly believe that the average buyer doesn't know enough to look at overall performance, particularly when there's a single, easy-to-follow number that supposedly measures speed. The sad part is that they're almost certainly correct. There are a lot of people who believe that MHz is the ultimate measure of a processor's goodness, so the hypothetical 2 GHz P4 will be obviously better than a 1.4 GHz AMD, even if the actual performance of the AMD chip is higher.

  • US

    We get better processors that are faster and faster. I don't care whether AMD or Intel has the faster processor, just that they keep getting faster in general. The changes in the cores for the next generation from both of these guys are going to be a good thing. I just hope we don't get to the point where they splinter so badly that we need different software compiles for each one. As long as they stay compatible, we all win. If they don't, then one or the other company is going to eventually die out, and I would have to put my money on Intel walking over AMD just because of name recognition and production capability.
  • First off, apologies for slipping in ad hominem attacks in my post. However, this was just in response to your similarly inappropriate attacks on Johan and Ace's. The difference, of course, is that my comments were in support of the correct analysis, not disparaging it.

    This is not a site on which every little review and rant is meant to be posted. Hannibal's article belonged here because, well, Hannibal is an expert on the technology behind microprocessors.

    1) It was Hemos' decision to post this; anyone can submit anything they deem worthwhile.

    2) This was neither a review nor a rant, but rather a lengthy and insightful look at some subtle but very important issues that will influence P4 vs. Mustang performance. Just because you've never seen anything on the web supportive of the P4 doesn't make a balanced piece a rant; it just means that you've been reading a lot of ignorant writing.

    3) Humorously enough, the "self-promoting" Hannibal link [slashdot.org] I offered was exactly "every little review", this time of some gimmicky portable (but monitor-less) PC. I found it entertaining, and was happy to see it on /., but it was the very definition of a fluff piece--like much of /., now that you mention it.

    4) Hannibal is NOT an MPU expert. He himself will acknowledge this, and has in his articles (don't have time to find where). Email him yourself and ask him who is more of an expert, himself or Johan De Gelas, and I am relatively certain he'll say Johan. If not, he will readily admit that Johan is at least his equal and that Ace's is a much more technical site than Ars. And he will most certainly admit that Paul DeMone is 10 times the expert he is. Again, I really really like Hannibal's work, it honestly inspired me, and I submit every new Hannibal-on-architecture article to /. But he is just a student, not an expert.

    And, just FYI, I have read every single article on microproccessor design that has passed by the /. pages for two years, plus linkage from several other sites, and a few print articles (though I no longer like to touch paper. How primitive...). I could easily look up links, hell, just by using the search features on /. and Anand.

    ROFLMAO!

    You read the scant handful of poorly chosen architecture articles linked from slashdot and you consider yourself an expert??? HAHAHAHAHAHA. Oh--and sometimes you check your facts with little old 16-year old Anand.

    Look dude, it isn't my place to criticize you for not knowing as much about MPU design as I do. It is my place to criticize you for not realizing that there is much more to be known, for not realizing that many people do know more about it than you. I am certainly no expert--I'm just a college student--but it is blindingly clear that I know more about it than you, just as it is clear to most /.ers that they know more about computers than, say, the guy who says he needs to go out and buy more "RAMs" because the new game he just bought says it requires 250MB of free space to install. (Don't worry--you're not that bad, it was just an analogy. ;)

    Second, it's quite clear that you essentially skipped all the parts of the article you didn't understand and concluded that if you--with your expert education on MPU design from /. and Anandtech--didn't know what was going on, it must be "IYNSHO, fluff masquerading as technical writing". Unfortunately, your opinion, humble or no, does not apply here: it is indeed a fact that this piece contained several new insights, and synthesized information which was not easily available in other forms. This may not meet your standards of being "more than a book report," but it certainly meets those of technical writing. Obviously Johan could not hope to benchmark the new P4 or Mustang cores, as they are not released yet; still he managed to include some insightful benchmarks which demonstrate the points he was abstractly discussing with ample clarity. (Of course, if you're used to looking at MPUs as mysterious black boxes, then you might wonder what rehashed K6 benchmarks are doing in a Mustang/P4 article.) If you truly believe that this article included "nothing unique", why don't you post just one article detailing the issues I raised in my previous post? Since you've obviously read such an article yourself, MPU expert that you are, it shouldn't be too difficult to dig up a link, even without resorting to "the search features on Anand." (LOL!)

    No, Johan didn't take what might be called the "Hannibal route"--i.e. launch into an exploration of the overall design philosophies behind the two cores--because he is writing for a specific audience, a knowledgeable technical audience who can be expected to have read several pieces explaining the important design features of the P4 (not much concrete is known about the Mustang other than that it will be a K7 with a tweaked layout to improve critical path and power consumption, and that it may receive several other enhancements as speculated in the article), specifically those here [realworldtech.com], here [realworldtech.com], here [realworldtech.com], here [aceshardware.com], and here [chip-architect.com]. Not only have most regular readers of Ace's read all these articles, but they have followed some very interesting debates on them between industry experts on the Ace's tech forum for months now. It might be fair to criticize Johan for submitting an article which clearly assumed such a technical background to /. (although in fairness he includes a link to his earlier, more general P4 article in the very first sentence); of course, it's /. who decides what to post on their own site, not Johan.

    Re: preproduction benchmarks, the Tom's piece on the PII and the Firingsquad piece on the K7 were generally the only benchmarks available of the respective chips before their launch. If you followed MPU design news as closely as I do you would know this. There is a thing called an NDA, after all; as these two pieces demonstrate, both Intel and AMD like to make sure that those who choose to break theirs post erroneous information.

    You're of course right that a PPro was indeed superior to a PII at a given clock speed; if you look through the article itself instead of just relying on the concluding quotes I posted, you would find benchmarks which clearly understate the known performance of the PII by as much as 30 or 40%, though. There is no doubt Tom's preproduction benchmarks, like Firingsquad's, were horribly off. And as long as you're disputing my "always" contention--I've ponied up the links (and no, it didn't take me very long at all, because I, having followed MPU developments for a couple years, knew for example that it was FS with the bad preproduction K7 benchmarks, and Tom with the PII controversy); why don't you post a single pre-NDA "review" or even just a series of leaked benchmarks on a new x86 core which proved entirely accurate?

    Re: definition of a 7th-gen core: You really should pay more attention in class, boy, because I'm schooling you right now. I explained what I meant right after that comment in the original post: The Athlon is a new core designed to scale well to very high clockspeeds. Just like the Willamette. That's why the Willamette performs slower clock-for-clock than a theoretical P!!! at the same clockspeed.

    First off, there is no evidence that the P4 has lower IPC than the P3, except for preproduction benchmarks and some ambiguous comments from Intel VPs. If you had read my previous post at all, you would realize that these (given how consistently preproduction benchmarks understate final performance) would tend to indicate that the P4 actually has higher IPC, not lower. On the other hand, the main evidence that it has higher IPC is that an analysis of all the new, innovative, brainiac features of the core strongly indicates that it must.

    And second off, you couldn't be more wrong. By calling the K7 a "7th-gen core" you are obviously comparing it to the 6 previous generations of Intel cores. Each of them was able to improve both clock speed on identical process and IPC significantly over the previous generation. The Athlon beats the P6 in clock speed on identical process...but only narrowly: the Athlon sweet spot right now is around 1 GHz on Dresden's .18 Cu process; the P6 around 750 MHz on a .18 Al process. Intel's process is probably slightly better except for the large Cu vs. Al gap, so we can be charitable and say that, on identical processes, the K7 clocks 25% faster in an untweaked core than the P6 does in a much-tweaked core. Indications are that the Mustang/Palomino/Morgan K7 tweak will reach 1.5 GHz on .18 Cu, so perhaps 35-40% better on equivalent processes. As for IPC, the P6 and K7 are essentially equal. Indeed, this is being generous to the K7, as the P6 knocks it all over the place in the fairest cross-platform bench there is, SPEC. Yes, this is because Intel's in-house compiler group is better than AMD's...but the compiler is arguably just as much a part of a core as the silicon itself.

    Meanwhile, P4 roadmaps indicate that it will scale 100% better than the P6 on identical processes, and the analysis of Paul DeMone, a far greater MPU expert than you or I could hope to be, is that it will have 15-20% better IPC for integer work, and considerably greater gains for FP. (It's too soon to tell without knowing more about how well compilers will optimize for SSE2.) That would be a 7th-gen core worthy of the leap from 5th to 6th which the P6 provided.

    Again, don't get me wrong: the Athlon is clearly superior to the Coppermine P3. But only by about the same degree as the Coppermine P3 was superior to the Katmai P3. That is, *not* by a full "generation"--whatever the hell that is.

    Re: important innovations in the K7: It's a huge innovation in the x86 world, something Intel hasn't done since the PPro days. First of all, the EV6 bus is new to x86 and a huge innovation, it is superior in every way to the old Intel GTL+.

    BWAA HAA HAA HAA HAA HAA!! Man, I'm rolling on the floor and crying that's so pitiful.

    Oh, but I'm being rude. Ahem. Pardon me. You, uh...you do know why it's called the EV6 bus, don't you? ...Even though that name happens to be shared by the current generation of Compaq Alpha MPUs... Or wait; actually the official name of the current Alpha chips is 21264; it's just that they like to code name their core variants things like "EV6" and "EV67" and the upcoming "EV68". Based on what, again? What's that? Based on the code name of the current Alpha platform???

    HAA HAA HAA HAA HAA! I asked you to pick one innovative feature of the K7, and you picked the one feature that AMD DIDN'T INVENT!!!!!

    Ok, I'm over it now. Phew.

    Right. AMD didn't invent the EV6 bus. They didn't help develop it. They in fact had nothing to do with it. They licensed it wholesale--lock, stock, and barrel--from Compaq, where it has been in use for quite some time now. On the one hand, it was a good business decision, because Intel had just clamped down and decided not to relicense the P6 bus (not really called GTL+, BTW, but don't worry, it's a very common mistake) to AMD, and rather than take the time to reinvent the wheel (and thus delay the launch of the K7), AMD decided to go shopping at Compaq. Fine. Smart decision.

    Don't give me any of this revisionist history that they did it because it's 200 MHz, though. The K7's extra FSB bandwidth (courtesy of the EV6 bus (and the engineers at Compaq, not AMD)) has up till now been entirely wasted as it is paired with SDR SDRAM (1.6 GB/s FSB, only 1.06 GB/s from DRAM)--generally paired asynchronously with PC133. If it were any help at all, don't you think the Athlon would be winning and not losing in FSB-intensive benchmarks like Q3? Meanwhile, it's a huge waste of pins and power--as well it should be, since it was originally designed for $10,000-50,000 workstations and servers, which, frankly, can afford the extra mobo costs, power supplies and electric bills.
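
    The back-of-envelope arithmetic behind those bandwidth figures is easy to check (a sketch; assumed numbers: 64-bit data paths on both sides, the EV6 double-pumped at 100 MHz for 200 million transfers/s, PC133 SDRAM at 133 million transfers/s):

    ```python
    # Rough check of the FSB-vs-DRAM bandwidth mismatch described above.
    # Assumptions: 8-byte (64-bit) transfers on both the EV6 FSB and SDRAM bus;
    # EV6 runs double-pumped at 100 MHz (200 MT/s), PC133 at 133 MT/s.
    BYTES_PER_TRANSFER = 8

    ev6_fsb_gbs = 200e6 * BYTES_PER_TRANSFER / 1e9  # front-side bus bandwidth
    pc133_gbs = 133e6 * BYTES_PER_TRANSFER / 1e9    # what the DRAM can feed it

    print(f"EV6 FSB: {ev6_fsb_gbs:.2f} GB/s")   # 1.60 GB/s
    print(f"PC133:   {pc133_gbs:.2f} GB/s")     # 1.06 GB/s
    ```

    In other words, with PC133 behind it, roughly a third of the EV6's bus bandwidth has nowhere to come from; only DDR (or faster memory) closes the gap.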

    Now, of course this extra FSB bandwidth will finally be put to good use with the advent of DDR mobos for the K7, *finally* starting early next month (fingers crossed!); latest news is the 1st. DDR mobos for the P3 will show less improvement because the P3 is stuck at 133 MHz FSB. Fine.

    But this isn't why AMD chose the EV6. Indeed, when they made that decision, the DDR standard had either barely-just-been or had-not-yet-been determined by JEDEC. Intel was set to steamroll RDRAM into every PC, and there was little to no indication that DDR would ever be a volume part in the PC industry. (It'd be used in servers and such.) AMD chose the EV6 because they *had* to, not because they wanted to. It's a great bus when doing what it's designed for--connecting specially made (quite expensive) double-wide SDRAM to Alphas, at FSB frequencies up to 466 MHz. But it offers little to no performance benefit in the here and now for the K7. And as for DDR and high-speed buses, Intel will be releasing their Tualatin revision P3's in Q2 with a 200 MHz FSB, in time for Almador, their (maybe--legal issues with Rambus...) DDR P3 chipset. So yes, the K7 will be first with decent DDR support in the x86 space. The P4's dual-RDRAM chipset and 3.2 GB/s FSB will be faster, though more expensive, as far as memory performance goes.

    But calling a bus that AMD had exactly zero nada zilch nothing to do with evidence of their design innovation gets an extra HAHAHAHAHAHA from me.

    More on AMD's innovative EV6 bus: it's a huge boon for multiprocessing, with the 760MP coming to retail very soon according to Anand.

    Unfortunately, according to AMD's Q3 earnings report Investor Conference Call 2 weeks ago (I was listening; somehow I doubt you were...), the 760 MP has been delayed to at least Q1, possibly Q2. They played it off as strategic reasons (business demand down; no major deals with the big 4 server OEMs (Dell, IBM, HP, Compaq) for AMD in the enterprise lines), but considering they only had one 2-way system--behind closed doors and not running anything--at MPF it looks as if their engineering is behind too. On the one hand, too bad, because point-to-point beats shared bus any day. On the other, there's a reason why Intel went with shared bus, and it's not because they'd never heard of PTP. It's, well, easier to implement. When doing the right thing takes over a year longer, it sometimes becomes doing the wrong thing. (Not that I believe that's true here, but it's worth taking into consideration.)

    Now, the countryside is littered with Athlon clusters crunching numbers for the scientific community in places where they'd never have considered using a P!!!.

    First off, scientific computing is such a niche market as to have absolutely negligible impact on the bottom line of either company. The idea that AMD designed the K7's huge-ass FPU--thus taking up vital die space--for the lucrative physicist market is laughable. It's an unbalanced design, plain and simple. Second, last time I checked, most scientific computing was being done either on Alphas or on Beowulfs of Celerons. Now, I don't doubt that K7's are moving heavily into the mix; if I were doing scientific computing, I would go with a cluster of Durons in a heartbeat.

    But do you really, honestly, think that when AMD decided to go with the 3-wide FPU there were dreams of meteorology and electron potential modeling spinning in their heads? Me either.

    And your "analysis" of the supposed advantages of the Coppermine's cache over the Thunderbird's are positively laughable. You see, in the REAL WORLD people don't run benchmarks on their boxes all day. They run apps and processes, usually several at a time. That's why the Athlon's cache is superior--you can keep more in it instead of swapping to system RAM, which is a MUCH BIGGER HIT than having a small amount of cache latency. In the REAL WORLD, the Athlon's cache architecture makes sense, not in your fantasy where we all run CPUmark all day.

    Uhhuh. That's why the Katmai P3--with its half-speed 512 KB L2--was so much faster than the Coppermine? That's why the Athlon "Classic"--with down to 1/3-speed 512 KB L2--is so much better than TBird??

    You think it's faster to perform a context switch with a 64-bit bus to L2 than a 256-bit one? Golly, imagine how slow the P4 with its 48 GB/s bus to the 5-cycle latency L2 will be!!

    Furthermore, in case you'd forgotten, all these chips operate at over one billion cycles per second. Multitasking occurs at much higher granularity than this, and even if your analysis were right (it's not), the effects of multitasking are invisible to a chip to a second or third order of approximation. The effects of a 7 (or 5!) cycle L2 vs. an 11 cycle one most certainly are not.
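
    The orders of magnitude here are easy to sketch (assumed figures: a 1 GHz core and a ~10 ms scheduler quantum, typical for desktop OSes of the day):

    ```python
    # Rough arithmetic for the granularity argument above.
    # Assumptions: 1 GHz core clock, ~10 ms OS scheduler quantum.
    clock_hz = 1_000_000_000
    quantum_s = 0.010

    cycles_between_switches = int(clock_hz * quantum_s)
    l2_latency_gap = 11 - 7  # extra cycles per L2 hit, TBird vs. Coppermine

    print(cycles_between_switches)  # 10000000 -- ten million cycles per quantum
    print(l2_latency_gap)           # 4 -- paid on every L2 access, not per switch
    ```

    Context switches happen once per ten million cycles, so cache design can't be justified by multitasking behavior; the 4-cycle latency gap, by contrast, is paid on every single L2 hit.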

    And then you go on to say that the K8 will just be a derivative of the K7. You really aren't paying attention at all. The K8 Hammer architecture is completely new, not only extending x86 to true 64-bit while retaining backwards-compatibility with 32-bit and 16-bit code, but adding huge and significant architectural innovations. Go read about it, dimwit, before you guess at what it is. Lots of documentation has been released--even just a quick scan of some Slashdot search results will make you a lot more knowledgeable about it than you are now. Geez...

    No, this is false. It is by now quite well known that what will define the Hammer family will be just a simple extension of the x86 ISA to 64 bits--an extension which will have essentially no use for the average PC user, but rather only for those who need 64-bit integer precision (CAD, etc.) or >32-bit memory address space (database, etc.). In addition, "the K8"--that is, the Sledgehammer, aimed at the enterprise market--will feature 2-way CMP and AMD's new Southbridge standard, LDT. Ho-hum. Nice features (LDT has no place in the PC either, though), but nothing extraordinary, especially considering it's not due until early 2002. In addition, there has been mention (Sanders mentioned it in an interview) of another K8 variant called Clawhammer; speculation is that this is a PC version of the K8, although it's not known what, other than x86-64, will differentiate it from, say, Palomino.

    If you actually believe the K8 represents an entirely new design, then it may be that your news was correct but just a little (about 12 months) late. The K8 was indeed scheduled to be a ground-up, clean-sheet, kick-butt design, but was radically scaled back by Sanders less than a year ago. Head designer and impressive guy Atiq Raza quit around a year ago, following this decision, and the hopes of a truly innovative K8 went with him. Of course, evolution is often better than revolution in the MPU industry--e.g., RDRAM. If it can manage to position itself against Itanium, the K8 might look very strong. (Of course, McKinley will be on the way by then, and it's considerably less of a joke.) But claiming that the K8 is a revolutionary new design is plain false.

    As for the recent /. articles on the K8, they have all been, IIRC, about the recently released x86-64 simulator to help Linux, etc. port to the new ISA. This has nothing whatsoever to do with the design of the K8 itself--an x86 simulator could help "port" Linux to either a P5 or a P6, although they could not be more different architecturally--just the ISA.

    Re: P4 moving into heavy volume in Q3 2001: God, you are a shameless, and dim-witted, Intel apologist, just as I suspected. Q3 2001?

    How precisely does this make me an Intel apologist? Is it too early?? It is a known fact that Intel's roadmap moves the P4 solidly into the mainstream category in Q3 '01 with the introduction of the Northwood P4 on a .13 Cu process. Northwood will allow Intel to get good yields at >2.0 GHz, and, more importantly, takes up a much more reasonable die space for mass production. Just as important, its release will coincide with the release of the (hopefully DDR) SDRAM Brookdale chipset, which ought to move the P4 out of the quite-high end where it will be stuck with the dual-RDRAM Tehama chipset. (3rd party DDR chipsets may be out for the P4 before then, but probably not in much volume before Q2 at the earliest.)

    Is it too late?? The latest Intel roadmap shows the P4 moving to the upper end of the mainstream category in Q2, but I believe that to be a lie by Intel marketing, eager to cover up the fact that they essentially have no upper-mainstream product from now until Q3 2001, a hole in their product line a mile wide. (Am I still an Intel apologist?) Indeed, this is the reason I just bought AMD stock very recently, and have been encouraging my INTC-owning relatives to sell ever since--as it turned out, just before the peak late this summer. And yes, like you, I am generally appalled by Intel's heavy-handed anti-consumer tactics--suing VIA and refusing to release a PC133 chipset in a lame attempt to force RDRAM down the industry's throat; paper launching the 1 GHz P3 6 months before even limited volume was available, the 700-850 MHz P3's before it around 3 months early, and the 1.13...oh the 1.13...all in a lame attempt to pretend the P6 could keep up with the Athlon; bribing Michael Dell with special pricing and all of the several dozen GHz P3s available this summer to spread libelous statements to the media in a lame attempt to disparage AMD's products; spreading IA-64 FUD in a lame though successful attempt to scare designers of competing RISC chips to delay (Compaq, HP) or eliminate (MIPS) their next-gen chips; keeping the Celeron FSB clocked at 66 MHz and "single-processor only" in a lame attempt to...be lame.

    Don't worry, I dislike Intel plenty. I cheer for AMD, and make no bones about it.

    What bothers me, though, is that, having been on the Athlon bandwagon since summer 1999, when I first read analyses of how the K6's poor scaling was due to architecture not process quality, and how the better balanced K7 had the chance to scale even higher than the P3, I've seen how this position has gone from being contrarian, well-informed and far-sighted to the position of a growing mainstream of ill-informed buzzword-spouting reality-ignoring AMD fanboys. No, not you; the people I'm talking about are much much worse (and hence not nearly as able to fool /. with uninformed arguments). What's even worse, though, is that several influential tech sites employ writers not much more knowledgeable than you, and they spout the same pro-AMD propaganda day after day after day. It's not that I dislike seeing anything pro-AMD or anti-Intel; indeed, exactly the contrary. It's just that I like it to be true.

    Plus, AMD's execution with the K7, while quite good, has been well short of the claims that I and many others were making for it over the past year. The benchmarks have been disappointing. There's only so much excitement you can get out of awesome benches in 3DSMax and ViewPerf before you notice those Q3 and Content Creation scores just aren't going to change. (Yes, I know CC is Intel-biased. Whatever.) Thunderbird in particular was a huge disappointment, offering gains on the order of 3-5% over Athlon Classic while the Coppermine P3 beat Katmai by 10-15% (it's that 64-bit vs. 256 bit L2 bus). MP has been MIA for months now. The K7 laptops are late as well; high power-consumption is the price you pay for unneeded FPUs.

    Having read the Willamette articles I've now referred you to twice (the DeMone ones on RWT), having seen Paul defend his unorthodox position on the Ace's tech boards for months now, basically skewering even very well-informed arguments on the AMD side, I've gradually become convinced that the "web hardware community" is greatly underestimating the P4's performance. So have many people much more knowledgeable than me--including the formerly (and still, though less so, IMO) AMD-biased Johan.

    I usually go around looking to argue with P4-bashers who seem intelligent and well-versed in the technology, because they give the most interesting arguments and are the most willing to learn. Unfortunately, I too often have to correct well-meaning but misleading posters like yourself, who ignorantly pass on the same old wishful thinking and oversimplified analysis as fact.

    I like AMD. I really do. I want them to "win", inasmuch as I want anyone to. I really do want them to stay very very competitive, like they are now. (And to make me lots of money!) But I just don't think it's helping them, or helping the truth, to pretend that the K7, a largely derivative design, will be able to keep up with the radically innovative P4 for very long. And I don't think it's furthering the principles of beauty and elegance in design--which is what really interests me in this stuff anyways--to call an insightful and fair (I thought it strongly gave AMD the benefit of the doubt, BTW) analysis of the strengths and weaknesses of the P4 and Mustang designs "an ignorant fluffy rant", or whatever you said.

    I won't expect the apology from you, but you have my email address if you should want to send it. Meanwhile, if you're really interested in MPU design, please read Paul's articles at RWT [realworldtech.com]; they're fabulous and take everything to a whole new level. And if they must be anti-Intel, you can't do better than his Merced/Itanium articles, here [realworldtech.com], here [realworldtech.com], here [realworldtech.com] and here [realworldtech.com].

    Also you should check out the tech forum [aceshardware.com] at Ace's, and the very AMD-biased but usually literate and often a great site for news and links...JC's [jc-news.com]. Plus the usual suspects: Tom's, Ars, The Register for juicy-and-occasionally-even-true rumors. You could learn a lot, and trust me, it's fascinating stuff.
  • I meant, from the launch of the P4 until...until whenever. For the next month-and-a-bit, AMD most definitely has the highest-performing x86 chip around, bar none--as it arguably has for the last 11 months or so. (Intel's highest speed grades have been available in such laughable quantities that only the benchmarkers got hold of them; thus it's arguable whether they ought to count any more than the 1.13 does now--i.e. not at all.)
  • Yeah, sorry about the Spitfire. I always get Spitfire and Sledgehammer confused. I'm almost positive there wasn't a K7 that was codenamed Camaro, however. There was, though, a P-51 Mustang fighter plane in WWII.
  • I was only giving those two examples because those are two things I've been trying to do with my system recently. A Matrox G400 can do YUV conversions in hardware, but not MPEG2 decoding..... as far as I know...
    -----
  • Bullshit. Goedel says NOTHING about it. Turing, on the other hand says that it is quite possible.

    If you want to consider what Goedel's Theorem says about emulating Crusoe on Crusoe, it says it *CAN* be done... that you can use it to make meta-statements about the Crusoe. It does say that you can construct a meta-statement that has no proof, but that is *ALMOST* the Halting Problem (remember, pedantic types, I said almost).

    Remember, you can use a TM to make statements or prove/disprove theorems about TMs.
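
    To make this concrete, here's a toy TM simulator: a program (itself runnable on any TM-equivalent computer) that executes an arbitrary machine handed to it as data. The encoding of machines below is invented for illustration, not any standard one.

    ```python
    def run_tm(rules, tape, state="start", pos=0, max_steps=1000):
        """Run a TM given rules[(state, symbol)] = (write, move, next_state)."""
        tape = dict(enumerate(tape))  # sparse tape; unwritten cells are blank "_"
        for _ in range(max_steps):
            if state == "halt":
                break
            symbol = tape.get(pos, "_")
            write, move, state = rules[(state, symbol)]
            tape[pos] = write
            pos += {"R": 1, "L": -1}[move]
        return "".join(tape[i] for i in sorted(tape)).strip("_")

    # Example machine: flip every bit, halt on the first blank cell.
    flip = {
        ("start", "0"): ("1", "R", "start"),
        ("start", "1"): ("0", "R", "start"),
        ("start", "_"): ("_", "R", "halt"),
    }
    print(run_tm(flip, "1011"))  # 0100
    ```

    Nothing deep, but it's the whole point of universality: the simulator doesn't care which machine description you feed it.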
  • Actually, there is not all that much difference between RC5 running on a Celeron and a PPC of the same clock speed under Linux; the PPC is about 200 Kkeys/s faster, although on the OGR stuff the PPC arch seems to hit the spot. (Written from my Celeron, talking about it and my iBook.)


    How every version of MICROS~1 Windows(TM) comes to exist.
  • Me Too!!!!


    How every version of MICROS~1 Windows(TM) comes to exist.
  • damn where did my little HTML wakka things go?? Me == Lame, i better go eat cheezypoofs!


    How every version of MICROS~1 Windows(TM) comes to exist.
  • Are double pumped, hyperpipelined, low latency designs the only future for x86?

    And what's wrong with double pumped, hyperpipelined, low latency x86 designs? Though I must admit that the i686 is getting a little old, so let's perform the aforementioned optimizations to that architecture.

  • The bulk of gcc development is done on the x86 portion of its capabilities. The performance between a 400MHz G3 and a 400MHz Pentium running various Linux apps tends to come out about even. Sure, the G3 has things going for it that should make it perform better, but gcc doesn't optimize as well as it could for it. Now if gcc generated highly optimized G3 code...

  • I don't think that I'll "cache" in for another "fiscal year" or two. My "Katmai" should hold out until then
  • I understand that x86 architectures run at high MHz because the instructions are decomposed into smaller ones that fit tighter clock periods (thnx SIC prof.). This means more in-flight data that isn't yet finished, but it frees the processing units for more part-of-a-complete-run ops. All this boils down to a processor that executes many unfinished steps per time unit without actually having done anything useful (Watts!!!) until either everything is discarded or the solution is finally available... Motorola does it differently... slower MHz, true, but more solutions per time unit (feel free to correct me), and also cheaper on power and fabrication tech...
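
    A quick back-of-the-envelope sketch of that tradeoff (all numbers invented for illustration, not measured figures for any real chip): a deeper pipeline buys clock speed, but every branch mispredict flushes more of that in-flight, "unfinished" work.

    ```python
    def effective_mips(clock_mhz, base_ipc, mispredict_rate, flush_penalty):
        """Instructions actually retired per microsecond."""
        # Average cycles per instruction = 1/IPC plus amortized flush cost.
        cpi = 1.0 / base_ipc + mispredict_rate * flush_penalty
        return clock_mhz / cpi

    # Hypothetical ~10-stage design vs a ~20-stage design that clocks 50% higher
    # but pays twice the flush penalty on 2% of instructions.
    short_pipe = effective_mips(1000, 1.5, 0.02, 10)
    deep_pipe = effective_mips(1500, 1.5, 0.02, 20)
    print(round(short_pipe), round(deep_pipe))  # ~1154 vs ~1406 with these numbers
    ```

    With these made-up numbers the deep pipe still wins; crank the mispredict rate or penalty up and it doesn't. Which side of that line the real chips land on is exactly the argument.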
  • If we can squeeze more out of a system and have it last 4x as long, then I would rather spend the money on the software that runs better and potentially has fewer bugs, with the programmers going through the code more times. Hardware costs more in the long run than any software. I prefer to upgrade when I want to, not when I want to run the newest version of something.
  • Comparing modern x86 desktop power consumption to Motorola's PPC offerings is just lame. If you're concerned about power, look at the laptop processors or some of the embedded x86 designs.
  • thunderbird? mustang? let's just hope AMD doesn't make a Yugo, or a Gremlin processor.

    Intel Corvair? Unsafe at any speed? :)
  • Uhh, you mean, like Merced/Itanium/Whateveritscallednow?
  • ...does quite nicely... Kernel compiles in about 3 minutes...

    Sure, instant gratification would be better ... but I'm still cheap, so I'll hold off on the multi-Xeon box for the time being...

  • All I care about... (Score:3)
    by Ron Harwood (harwoodr-AT-technologist.calm) on Monday October 23, @01:58PM EST (#3) (User #136613 Info) http://theGEEK.org
    ...is getting the maximum horsepower for the minimum price... everything else (with the exclusion of stability, of course) is a moot point.

    I don't care how they get me there... as long as it's cheap and damn fast.

    The above comment is pointless, clueless and adds nothing to discussing the technical merits of AMD's next generation chip design vs. Intel's. I don't understand how this completely irrelevant comment is (+3 insightful) instead of (-1 offtopic) in a technical discussion.

    Frankly, I have always been of the opinion that if you have nothing to say, then don't say anything instead of shooting your mouth off and having people doubt your intelligence. There is no one forcing you to post to slashdot, so there is no reason to post irrelevant crap to technical discussions simply because you have nothing technical to add.

    Just my $0.02

    Second Law of Blissful Ignorance

  • Who cares? After about 200 MHz, speed stopped mattering much

    No kidding. Our RAM crawls along at a tenth the speed of the CPU. If two boxes do a fetch from memory, and they stall for the same length of time while waiting on that request, then it doesn't matter if I have the 233 megahertz CPU and J.Random Jones has the 50 terahertz CPU.

    I'm a software guy. All you hardware guys: I'll make you a deal. You work on bus speed and faster RAM and leave the CPU alone, and I'll work on better compiler optimizations to schedule the instructions such that the stall time is less. Together we will bring world peace, or at least speed up Mozilla.
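
    The stall arithmetic above is easy to put in numbers with a crude model (the miss rate and memory latency below are invented for illustration): once stall time dominates, a faster core barely helps.

    ```python
    def runtime_us(instructions, clock_mhz, miss_rate, miss_penalty_ns):
        """Total runtime in microseconds, assuming 1 instruction/cycle when not stalled."""
        compute_us = instructions / clock_mhz
        stall_us = instructions * miss_rate * miss_penalty_ns / 1000.0
        return compute_us + stall_us

    # 1M instructions, 2% of which miss cache and wait ~100 ns on RAM.
    slow_cpu = runtime_us(1_000_000, 233, 0.02, 100)
    fast_cpu = runtime_us(1_000_000, 2330, 0.02, 100)
    print(round(slow_cpu), round(fast_cpu))  # 10x the clock buys only ~2.6x here
    ```

    That fixed 2000 microseconds of stall time is the wall: the second machine's compute time shrank tenfold, its runtime didn't.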

  • I was wrong about Goedel, but let's talk Turing.

    Remember, you can use a TM to make statements or prove/disprove theorems about TMs.

    Provided you have a longer tape. According to this page [nmia.com], a Turing machine has among its parts "a head that can read, write, and move along an infinitely long tape that is divided into cells" (emphasis mine). In the real world there is no such thing as infinite RAM. An emulator in general has to have more registers than the system it's emulating; otherwise, registers will spill to RAM, which is a Bad Thing for an emulator that runs at such a low level, unlike user-space game console emulators that can spare a few bytes.

    Oh, BTW, the Brainfuck language [muppetlabs.com] is one of the smallest (eight instructions) Turing-complete [everything2.com] languages in existence.
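
    For the curious, an interpreter for those eight instructions fits in a few lines. This is a sketch only: input (the ',' instruction) is omitted, and the tape is finite, so it's only Turing-complete in the usual "given unbounded memory" sense.

    ```python
    def bf(program, tape_len=30000):
        tape, ptr, pc, out = [0] * tape_len, 0, 0, []
        jumps, stack = {}, []
        for i, c in enumerate(program):  # pre-match the brackets
            if c == "[":
                stack.append(i)
            elif c == "]":
                j = stack.pop()
                jumps[i], jumps[j] = j, i
        while pc < len(program):
            c = program[pc]
            if c == ">": ptr += 1
            elif c == "<": ptr -= 1
            elif c == "+": tape[ptr] = (tape[ptr] + 1) % 256
            elif c == "-": tape[ptr] = (tape[ptr] - 1) % 256
            elif c == ".": out.append(chr(tape[ptr]))
            elif c == "[" and tape[ptr] == 0: pc = jumps[pc]
            elif c == "]" and tape[ptr] != 0: pc = jumps[pc]
            pc += 1
        return "".join(out)

    # "++[>+++<-]" leaves 6 in cell 1; 59 more '+' makes 65, and chr(65) is "A".
    print(bf("++[>+++<-]>" + "+" * 59 + "."))  # A
    ```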

  • by arcade ( 16638 )
    Personally, I'm a bit tired of the entire x86 architecture. It has had patch upon patch upon patch applied to it. It's a sucky design to start out with, and they're just adding more and more to it. I hope we'll see a more sensible CPU design in the future.

    If CPU designers were like webpage designers, they would make a CPU like this [tvu.ac.uk]. On the other hand, that is quite unlikely. Oh well, I hope they'll drop it in a while.


    --
  • What we have, then, is an editor that is not using its available resources wisely.

    Rendering will always be a time-consuming process. It won't get faster. Rendering a fixed-format, fixed-resolution, fixed-framerate video will get faster, of course... but as CPUs get faster, more CPU-intensive algorithms will appear, to get better resolution, higher quality, more effects, etc.

    So upping CPU speed will gain you nothing, in the end. As for the 5GHz needed to get a responsive app: the app is probably going about it incorrectly. Any video app should probably treat everything as a proxy for the final output, with none of the 'sources' touched until a 'render' cycle is committed. That way you are only working with low-memory, low-resolution, fast, responsive interfaces. And while you sleep, the rendering takes place, at infinite resolution, infinite quality, etc. At least, until you wake up!

    The nick is a joke! Really!
  • What amazes me is the number of people that have been snowed by Intel's marketing hype over clock-rates.

    Who gives a shit how fast the clock is? It doesn't matter!

    What actually matters is the amount of time it takes to get the processing done. The PPC does a VERY good job of crunching data, especially vector/FP data, in few clock cycles. This keeps the clock rates, and therefore the power consumption, down.

    If you took a G4 and clocked it at 1GHz, it would crush the x86 processors. But it's not designed to run at those speeds, so it crushes them anyway, at 500MHz.

  • by Syllepsis ( 196919 ) on Monday October 23, 2000 @10:08AM (#682587) Homepage
    I like how this article addresses the perception that the company not leading the market is only following. With huge high-tech firms like AMD and Intel, hundreds of incredibly intelligent people are put to work to solve a complex problem while following a carefully outlined strategy. In reality, corporate warfare is much like a chess game between grandmasters. IMO, each company's strategy is a strong one, and the winner will be decided by a variety of market forces, including which strategy works best for tomorrow's software (and who can tell now?). Both companies are taking masterful approaches to the problem of x86 design, and I think that as a so-called "learned layman" in the processor business, it is quite a bit of fun to sit back and watch.
  • Yes, we can get better performance by throwing more transistors at a problem, increasing power consumption and heat generation, and increasing the package size. Who cares? After about 200 MHz, speed stopped mattering much (I know, I know, someone always mentions games). I'd much prefer to see smaller CPUs, both in terms of physical size and power consumption. This pissing contest isn't proving anything.
  • Even on the SpecInt benchmark, the earliest PPro benchmarks I could find on SPEC's website show that while the PPro put in some respectable numbers, it was far from being the king of the SpecInt benchmark.

    This is wrong. The original PPro launched just before the EV56 revision of the (at that time) DEC Alpha (i.e. the Alpha 21164). The PPro's SPECint was better than the previous Alpha's, and worse than the new Alpha which launched a few months down the road. Thus, the PPro was, at the time, the absolute SPECintCPU95 king. Of course it was later eclipsed in performance by better and more expensive chips; the point is, at the time it was an extraordinarily innovative design and a miraculous surprise to the entire MPU community.

    For starters, SPEC is not a single benchmark, but rather a consortium that comes up with benchmarks, the most well recognized being their CPU benchmarks (colloquially referred to as SPEC benchmarks). These benchmarks, however, do not exclusively test a CPU, but rather a system as a whole, although they are designed to make the CPU the limiting factor (nonetheless, using bucketloads of RAM, fast disk controllers, and a huge external memory cache can have wonderful impacts on SPEC benchmarks). Typically these benchmarks have been divided into those that stress the integer unit (SpecInt) and those that stress the floating point unit (SpecFP). The Pentium Pro was the first x86 CPU to post respectable SpecFP benchmarks, but it still got its butt kicked all over the place by its RISC competition.

    Yes, the SPEC_CPU benchmarks test an entire platform--CPU, I/O, memory, and (very important) compiler--not just a CPU. Of course, this does nothing but put x86 chips at a major *disadvantage*, as they are only available in configurations which do not even approach the I/O and memory bandwidth available to the big RISC chips. (Of course, Intel has perhaps the best compiler group around, which mitigates this somewhat.) Still, the SPEC_CPU benchmarks are called SPEC_CPU for a reason, as they make every attempt to, as you say, make the CPU (and its attendant cache hierarchy) the limiting factor as much as possible in a suite of varied, meaningful, non-synthetic benchmarks. As for the other SPEC benchmarks (i.e. not SPEC_CPU), they are of course designed to benchmark the, well, non-CPU parts of the computer, and are thus irrelevant here. Again, when one refers to "SPEC scores" it is assumed that they mean SPEC_CPU; thus the original poster's usage was quite correct.

    As you point out, the PPro never challenged the Alpha's SPECfp lead, due to a major deficiency of the x86 architecture--backward compatibility with the ass-backwards x87 FP instruction set. Luckily, this is finally being phased out with the P4's SSE2 instructions; thus if Compaq fails to execute with the Alpha in the coming months, and if Intel's SSE2 compilers are good enough, the P4 may just win both SPEC_CPU benchmarks outright for a time--perhaps as much as a year or more. That ought to shut up the ignorant anti-x86 FUD on /.

    Yeah, right.
  • They fall into the 'satisfies the consumer' and doesn't play the pissing game.

    Not out of choice, mind you. I'm sure Apple would love to be able to out-piss Compaq, but for now, they are *forced* to settle for 400MHz, ATI video cards, etc.

    And for that, we get fanless, floppyless, serialless systems at 1/3 premium prices.

    The nick is a joke! Really!
  • As long as our favourite non-free x86-only OS rules the planet, we'll have x86.

    As soon as free OS's will gain a certain critical mass, competition between CPU instruction set designs will really take off.

    And x86 will lose any fair contest to architectures like ARM or MIPS.

    It's only the 'economy of scale' that's sustaining the design and production of enormous chips with huge ridiculous-looking cooling systems; if all software could easily be recompiled there would be no need for these monsters. I don't think that even 2GHz CPUs need big coolers if they have a sensible instruction set design.

    So, you see, Free Software not only gives you freedom, it will also save the environment ;-)

  • by yerricde ( 125198 ) on Monday October 23, 2000 @11:20AM (#682600) Homepage Journal
    Transmeta's Code Morphing technology is designed to emulate CISC architectures efficiently. Think about it: could you do an emulator for one Crusoe chip's internal architecture on another model of Crusoe chip? Goedel says it'd be quite tough.
  • by stilwebm ( 129567 ) on Monday October 23, 2000 @10:20AM (#682608)
    Tom's Hardware [tomshardware.com] dissects these terms a good bit, and compares the various processors that use these platforms. Be warned that Tom is a little biased, as an anti-Intel kind of guy.
  • My dual PIII-800MHz screams along just fine. Then again, so does my K6-2/350, and so does my roommate's P-150.

    Believe it or not, there was life before X and GNOME.
  • ****WAVES HAND!!!!****

    My laptop is still a 233 and my main home beast is a dual 333MHz, both PIIs.

    Personally, I would love a 600mhz Celeron, but I don't have the money currently. Ah, someday.
  • When the P6 was released, it was the fastest processor available in industry standard benchmarks (SPEC, including Alpha). Its design was highly original, and manages to keep the CISC nastiness contained to the first few stages of the pipe. Claiming that the P6 was not a world-class design when released is only a testament to your own ignorance.
  • A 400MHz PowerPC vs a 400MHz x86 is really irrelevant. It's the software written and optimized for each platform. Linux on a 400MHz x86 will blow away a 400MHz MacOS 8 system trying to run Apache, for example.

    It's the limitations of the OS; There are other limitations, too, of course, such as APIs, support, etc. Forget USB under NT, for example, or Firewire, unless you use Win2k. Linux doesn't do the game thing all that hot, not quite yet.

    The x86 design, despite being crappy, kludgy, and non-embedded, is still winning due to engineering, marketing, and distribution ^^

    The nick is a joke! Really!
  • As for a site, I couldn't provide you with one, however I can answer some of your questions.

    Socket 7 is what the Pentium used, as well as some Cyrix and AMD chips (ones that were comparable with the Pentium II class). You won't have to worry about this unless you're going to build a cheap, old computer.

    As for processors that work in dual configurations, just purchase two Pentium IIIs that run at the same clock speed (ie: 600 MHz, 700 MHz, etc) and you'll be fine. Just be sure that the motherboard you purchase will support the clock speed you purchase.

    With RAM, the speed depends on which processor you use on a motherboard. For example, I have a Dual Pentium III 450 motherboard which supports Pentium II and III CPUs from 300 to 600 MHz (remember, if you're going to go dual, the processors have to be the same). If I have a Pentium II processor, I have to use 66 MHz RAM. If I have a Pentium III processor in there, I have to use 100 MHz RAM. In the newer motherboards, it won't be Pentium II as opposed to Pentium III, but clock speeds (ie: 700 MHz must use PC133 RAM and below 700 must use PC100).
  • by Wakko Warner ( 324 ) on Monday October 23, 2000 @10:25AM (#682622) Homepage Journal
    The P6 (Pentium Pro) core was a masterpiece, and its performance sent shock waves throughout the industry.

    Calling anything from the x86 world a "masterpiece" seems, to me, like putting a gold star on the best-looking fingerpainting in the special-needs Kindergarten class.

    -A.P.

    --
    * CmdrTaco is an idiot.
