Pentium 4 Re-evaluated, Again (Again)

An unnamed correspondent writes: "It looks like Tom's Hardware Guide has been busy with the P4. This time a re-compiled version of the MPEG encoder (the same one they benchmarked with in the last article) shows the P4 doing really well. Also interesting is the performance boost that even the PIII and Athlon procs get from the Intel compiler. Take a look at the article here." Seems that as usual, benchmarks are what you make of them. The P4 apparently can perform much better than initial tests have shown. Tom Pabst makes some good (if fawning) points about the complexity and fairness of benchmarking in general, too.
This discussion has been archived. No new comments can be posted.

  • Back when OkieSU got their first Intel Hypercube (128? 286 CPUs in a box -- wow), I heard someone who wrote the benchmark code for Intel say that the benchmarks are the processing level "guaranteed to never be exceeded".
  • Programs using SSE2 instructions will need those instructions available when they run, elsewise bad things happen. But what a gain! If ever you get the chance to talk to anyone from Intel, say that you'd like to see more of this.

    Guess what? We already have. The 486 had instructions that were not available on the 386; programs that use them cannot be run on the 386.

    In short, the definition of x86 you seem to be using doesn't exist, and never has. You have never been able to use all the instructions of post-386 processors on 386es, any more than you could do so on the 286 or 8086. x86 compatibility has always run the other way, and it's still 100% the other way -- all the instructions for the 8086 are available on a P-IV.
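
The one-way compatibility described above is usually handled by checking which instruction-set extensions the CPU reports at run time and falling back to an older code path when a newer one is missing. A minimal sketch in Python, assuming a hypothetical set of feature flags and hypothetical code path names (this is not FlasK's or Intel's actual code):

```python
# Hypothetical illustration: choose a code path from a set of CPU feature flags.
# The flag names and path names are assumptions made for this sketch only.

def pick_code_path(cpu_features):
    """Return the most capable code path the given CPU can actually execute."""
    # Ordered from newest to oldest; the first supported entry wins.
    preferences = [
        ("sse2", "sse2_path"),
        ("sse", "sse_path"),
        ("mmx", "mmx_path"),
    ]
    for flag, path in preferences:
        if flag in cpu_features:
            return path
    # Plain x86/x87 code runs on every CPU in the family.
    return "baseline_x86_path"


print(pick_code_path({"mmx", "sse", "sse2"}))  # a P4-like feature set
print(pick_code_path({"mmx", "sse"}))          # a PIII-like feature set
print(pick_code_path(set()))                   # a 386-like feature set
```

A binary that skips this kind of check and executes SSE2 instructions unconditionally is the case where "bad things happen" on older chips.
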
  • Look, the x86 architecture is not going anywhere. Nor will it ever. Intel knows that, which is why it's not even bothering to attempt to extend it again with 64-bit extensions. AMD knows this too, which is why they ARE trying to add 64-bit extensions to it. Transmeta even knows this, since their first processor would be useless without x86 compatibility. They're all right. The world demands backwards compatibility. If you have a problem with that, don't buy x86. But don't yell at Intel for supplying what most people want and what they make the most of...
  • Hi folks,

    I think it's kind of ridiculous that most folks don't understand the concept of benchmarks. It's common knowledge among hackers that benchmarks test specific aspects of performance, and can be made to show better or worse performance depending on what the benchmark author wants to say. Unfortunately, many folks (maybe not you but many other folks like you do this) base purchasing decisions on benchmarks and spend hundreds of dollars more than necessary on hardware they don't really need.

    Being a programmer myself, I know just how flippin powerful even the "outdated" CPUs are. Recently, I have worked on the Pentium III, the Celeron, and an old 486 at 66 MHz. Most of my recent works are prototypes built for ease of maintenance and clarity rather than performance, and if I may say so myself, they do perform extremely well, even on the 486. I'm sure there are areas in computing where a powerful workhorse CPU like the Pentium III or 4 is needed, but what most readers probably don't know is that there are literally thousands of mission critical, real-time computer systems out there that run on 4- or 8-bit computers at speeds like 1 or 2 MHz, and they get the job done. Every user action is carried out instantaneously. The ridiculous part is that most folks out there don't understand that a newer CPU won't get them better overall performance. The user still needs to wait for the hard drive to churn, the network card to accept incoming packets, and a thousand other things; besides, it's really the software algorithms and implementation that cause the performance, or lack thereof. (These are the reasons I don't like Intel's claim that their newest CPU will give the user a better Internet experience.) The only place a faster CPU will get you performance is in tight code containing nothing but intense computations. Most folks will think of games when thinking of intense computations. In this case, I agree that it is critical to play Quake at 230 fps rather than 200. :)

    I apologise for being so blunt in my comment but I need to run out the door so I'm in a hurry, and I'm kind of frustrated at the things that happen because of marketing and "benchmarks" that don't really mean anything (at least to myself). I hope I was able to successfully convey my point without insulting anyone. Hopefully, someone can comment on this and either help me out or prove me wrong... I'm open to others' suggestions

    Kind regards,
    Nathaniel G H

  • Running real world programs like FlasK seems like a reasonable test. A highly CPU intensive application that any user could run if they wanted to. Not nearly as "elitist" as SPEC's series of benchmarks, which cost a fortune just to get access to the software they use.

    In the mean time, I think it's good that he's rerunning his tests as he gets new data. That's one of the things that's great about the web, nothing is static, and news can react as quickly as events change, instead of magazines which have press deadlines so they have to publish whatever they can by a certain date, only to print a tiny correction a few months later. Tom's being upfront about what's going on, and keeping all of us much more informed than we would have otherwise been...
  • How about if I run some legacy 286 benchmarks on any of the newer processors, and do an instructions-per-clock analysis? Then compare it with the 286 processor... I bet you'll get some very interesting results. You want compatibility, don't you? This is the ultimate compatibility test...

    Eventually, newly released code will contain the newer instructions. And eventually, people will upgrade their software if they think the increase in performance is worth it. Basically, this technology will be adopted eventually, no matter what any reviewers said. Remember when the Pentium Pro was released and ppl were complaining that it ran 16-bit code slower than the regular Pentium? It's happening again, this time it's SSE2 vs. non-SSE2 and whatnot. I see myself owning a P4 this time next year when they crank it up to 2 GHz and beyond... AMD will need to eventually come up with a new microarchitecture because the fact is the Athlon does not scale as well as the Pentium IV.
  • Quake benchmarks are useless except for video cards.

    And how is encoding video to MPEG-4 less useful than Photoshop filters and POV-Ray renders? Those three are all the exact same operations, basically. If anything, MPEG encoding puts more strain on the memory system than a Photoshop filter would, simply because you're dealing with 4 to 8 gigs of data running through the CPU rather than a 30 megabyte file.
  • 10 GHz?? P4?? They don't belong in the same sentence. Pentium 4 will be long gone before we even hit 5 GHz, let alone 10.
  • Speaking at least for me... I could care less that I have the Linux source code and all of these development tools installed on my system. I started using Linux because it was the most well supported Unix-like operating system for x86 PCs, as well as the cheapest. In the end, I'd have preferred OpenBSD, but Oracle won't run there, and that was one of my criteria when purchasing my new system.

    But I have no desire to compile all of my command line tools. Wow... I could eke out a little extra performance from grep and ls? I've an Athlon 700, so even if the software is horribly unoptimized, the machine more than makes up the difference.
  • by Chalst ( 57653 ) on Sunday November 26, 2000 @06:48AM (#601657) Homepage Journal
    The GPL requires that you be willing to give the source under a GPL license to anyone who receives the binary. Tom would therefore be entitled to the source. Unless you receive the binary, you would not.
  • whoa, I totally would have missed that had you not pointed it out, way to be on the ball!

    Question is, who should be notified of this problem?

    Also I read every now and then about how if you do not protect your IP rights, then they can be taken away? if so, does this apply here too?
  • The GPL requires that you be willing to give the source under a GPL license to anyone who receives the binary. Tom would therefore be entitled to the source. Unless you receive the binary, you would not.

    Tom would then be entitled to get out the source, since according to the GPL, Intel could not restrict him from doing so.

  • Well, let's make a few assumptions.

    Let's assume that the Pentium 4 (and its derivatives) will scale just like the P6 core - Pentium Pro/II/III.

    Let's also assume that the fab process technology will improve just like it did during the lifetime of the P6 core.

    And let's assume Moore's law is actually true -- it's a universal constant ;)

    The P6 core started at 150 MHz and it reached 1 GHz. The Pentium 4 started at 1.4 GHz, so based on the above assumptions, it will reach at least 9.33 GHz, and 10 GHz is not too far from there.

    This may or may not be reasonable, you decide. But it's just interesting to think about...
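
The 9.33 GHz figure in the comment above follows from applying the P6 core's start-to-finish clock ratio to the P4's launch speed. A minimal sketch of that arithmetic, using only the numbers given in the comment (the function name is made up for illustration, and the whole thing is a back-of-envelope extrapolation, not a prediction):

```python
# Back-of-envelope extrapolation using the figures quoted in the comment.
# extrapolate_clock is a hypothetical helper name chosen for this sketch.

def extrapolate_clock(old_start_mhz, old_end_mhz, new_start_mhz):
    """Scale a new core's launch clock by the ratio an older core achieved."""
    scale = old_end_mhz / old_start_mhz  # P6: 1000 / 150, roughly 6.67x
    return new_start_mhz * scale


p4_ceiling_mhz = extrapolate_clock(150, 1000, 1400)
print(f"{p4_ceiling_mhz / 1000:.2f} GHz")  # prints "9.33 GHz"
```
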
  • The recompiled benchmark (not the SSE2 optimised one) is about 25% faster on the Pentium 4 at 1500 MHz than on the Athlon at 1200 MHz. 1500 is 25% more than 1200.
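
Taking the two figures in this comment at face value, dividing the measured speedup by the clock-speed ratio gives the per-clock comparison. A small sketch, with a helper name invented for illustration:

```python
# Per-clock comparison from the numbers in the comment above.
# per_clock_ratio is a hypothetical helper name chosen for this sketch.

def per_clock_ratio(speedup, clock_a_mhz, clock_b_mhz):
    """Speedup of chip A over chip B, normalised by their clock-speed ratio."""
    return speedup / (clock_a_mhz / clock_b_mhz)


ratio = per_clock_ratio(1.25, 1500, 1200)
print(ratio)  # 1.0 means the two chips do the same work per clock cycle
```

A result of 1.0 means the whole advantage in that test is accounted for by the higher clock.
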

  • I haven't read the GPL myself, so I'll just comment based on what I've been told..

    the source must accompany the binary (if requested) - the Intel engineers only supplied Tom with the binary, so only Tom can ask for the source. I suspect nobody can force them to release the binary and/or code to anyone else.

  • Note to moderators: This is not trolling, flamebait, or anything of that nature.

    Like the post I saw earlier (I think it was #10) about the recount, it made me realize that the two situations were comparable. Look at it, you have two parties - Republican, Democrat, Intel, and AMD (you decide the comparison between those) - and when one issue comes up where either party can gain an advantage (Florida and FlasK) they dive on it with their spin-doctors to fit the results to their will.

    Neither one will give up, and (as far as I'm concerned) both are only focusing on money and not the best interests of the people or customers (as it should be).

    Oh, and a question for those more knowledgeable in programming (I'm only 1st year college C++): why would a simple recompile benefit the scores? Since it's an MPEG-4 encoder, shouldn't it already be more or less up to speed?
  • [Glove Slap]
    I demand satisfaction.
  • I would love to see a benchmark that compares Athlon 3DNow! optimization with P4 SSE optimization. I know such a benchmark is essentially as useless and biased as the rest, but then why not yet another one?
    Anyway, are those optimizations really that necessary? 99% of the software that people use does not need them. How many of your friends use video editing/compression software? And if they do, how often do they use it?
    But still, all the hype is kind of fun... or rather, life is kind of boring.
  • I purchased my last Intel CPU in 1996. A few decades ago in computer time. Because of other like minded people supporting competition I can now enjoy my shiny new Tbird 900 which cost me about $180. Intel stalwarts can also enjoy relatively low prices for strong performing products. If code can be easily "recompileded" to optimize for the P4 then it should be a success in time. Early adopters always get their pocketbooks burned. I don't hate Intel. They are just another big business in the comp/universe. If I have anything against them it is directly related to the high premiums I have paid for their products in the past. Solution: Support a clever competitor by buying and using their products...
  • but now I'm excited to see what this technology can bring. How difficult would this be to implement on the OS level? Would SSE2 even benefit OSes? How about this funky MMX? How does the 16-bit discrepancy between SSE2 and X87 FPU affect things? Do we need that precision for most things? Tom didn't mention much about how the quality of the resulting movies varied, either.

    Intel fan or not, a lot of people must at least be interested! I sure am.
  • If you spent money on an alpha but are using the gcc compiler, you are just throwing your money away. Compaqs compiler is an absolute must to attain any sort of advertised performance with the alpha. On the other hand, if you just wanted your linux box to be different, then disregard this post.
  • Tom would then be entitled to get out the source...

    Which is what the post said. It was only a couple of sentences. You really couldn't be bothered to read the whole thing?


  • A program compiled to be backwards-compatible right down to the 386 will NOT be able to use SSE2 instructions, nor any other fancy bells and whistles(like 3Dnow! and plain 'ol SSE)

    Right, but would you like to encode an MPEG-4 movie on a 386? You'd be twiddling your thumbs for a loooong time.

  • Whatever slashdot. (95% of) You people are hopeless.

    I can't believe I ever enjoyed this place. If this were a magazine I'd ask you to discontinue my subscription.

    seeya, suckers.

  • It's enough already. I'm not trolling here or trying to be flamebait but it's enough. You've published three, count 'em - three - conflicting reports out of Tom's Hardware praising, then criticizing, then fawning (your word) over the P4. Well, which is it? Why does Slashdot insist on posting redundant and conflicting articles? Just for the sake of posting something? If that's the case then expand the subject matter please.

    1. humor for the clinically insane []
  • oh please, enough with Tom's Hardware. Why would anyone believe anything published on Tom's? It used to be a good site, long time ago.
  • AMD would have got a flogging if they had released this chip. The Cyrix & AMD socket 7 series were written off because of their lack of FPU performance. Now Intel releases a processor that is of poor quality and people are recompiling applications in a vain attempt to get some benefit from these $1000 dogs. Like you can recompile all your Windows applications, not. We Linux people at least have the option, but for all those people still stuck with Windows these processors are going to suck chunks.

    BTW, does it amuse anyone else that all these benchmarks are being done with Windows 98? What sort of moron would buy one of these puppies, then run Lose 98? Their performance must _really_ suck under Lose 2000!
  • Whether or not you tweak this or that, it is all irrelevant. If a chip that costs 3 - 4 times as much as a fast P3 or Athlon cannot significantly outperform the current generation of processors REGARDLESS OF OPTIMIZATION, then it is not a worthwhile purchase. We shouldn't be saying, "Well, maybe it is faster in these areas, but we're not sure in these areas". We should be saying, "WOW! I've never seen Unreal at 250fps @1600x1200!" So, trying to figure out why the P4 is good at some things and bad at others is a waste of time. The fact that it doesn't obviously blow everything out of the water shows that Intel probably released the chip 6 months too early.
  • Nothing can improve *more* than linearly with clock speed (which I assume when you say "CPU power"). Linear increase is the upperbound. Often the increase is lower due to memory (and other) bottlenecks.

    When I said linear, I meant linear with a slope of 1. My mistake. (The slope is greater than 1 -- that is, a 10% increase in clock speed can give a slightly greater than 10% performance boost -- because of the constant overhead of Windows, device drivers, IDE CPU utilization, FlasK MPEG, etc., that is less relative to higher clock speeds.)

    The original purpose behind the P4 is to be able to crank up the clock speed. Looks like they have reached their goal and even increased the IPC by just a little. So it looks like this will be a win for Intel. When the Athlon reaches its max clock speed, and the P4 continues to crank up its speed all the way up to 10 GHz, you'll see.

    The original idea was to introduce a new architecture. Intel claimed that the Willamette (sp?) core would greatly increase the speed per clock cycle -- in other words, a 1.5 GHz P4 would be as fast as a faster (in GHz) Athlon. That's what we're paying a premium for. What this shows is that in the best case scenario, when something is specifically recompiled for the P4 and it's a task that it performs best, the P4's advantage is almost statistically unimportant.

    Again, its only real advantage here is the clock speed versus the Athlon. True, the P4 is very scalable, but so is AMD's next processor core. The P3 was also very scalable, but AMD ended up beating out Intel when they introduced the Thunderbird core revision. The same thing's likely to happen here -- Intel has the technical advantage here that they can crank up the clock speed in the future, but by the time they actually do it, AMD will have its next core ready, and they'll be able to do it too.

  • Except that this isn't just a recompilation. If it would have been just a recompilation, then SSE2 wouldn't have gotten in there (that has to be written in by hand) and the menu options to enable SSE2 couldn't be there. Even if Intel is right that they made very few modifications, they still made some, so they have to follow the GPL.

    From the GPL:

    6. Each time you redistribute the Program (or any work based on the Program), the recipient automatically receives a license from the original licensor to copy, distribute or modify the Program subject to these terms and conditions. You may not impose any further restrictions on the recipients' exercise of the rights granted herein. You are not responsible for enforcing compliance by third parties to this License.

    Intel's violating this section of the GPL by distributing a work based on FlasK MPEG and telling Tom that he can't redistribute the binary. The GPL prohibits them from telling Tom what he can do with the binary, or even the source for that matter.

  • I hate to agree, because (a) I don't like Intel [I think they're out to screw the consumer as much as possible, instead of providing good value] and (b) I just hate to see yet another freaking instruction set, forcing everyone and their dog to upgrade to overhyped, overpowered machines when, for 90% of people, a Pentium 120 would be just fine (wordprocessing, email, web browsing; not much good for games).

    AAAAAAAAnyway, I quote The Register:

    "Reader John Welter of North West Group, a Canadian Geomatics firm specialising in orthophotography - stretching accurate photographs of the Earth's surface over elevation models of the same area - volunteered us some interesting information on his company's experiences with an early P4 system.

    When using the original code, a P4 system took a glacial 19 hours compared with just under 13 hours for a 933MHz PIII. But with code recompiled to use SSE2, the P4 galloped through the test in a shade over seven and a half hours.

    Outperforming Alpha
    "A P4 at 1.5GHz is now faster when running optimised code than our Alpha production boxes by a sizable margin, where those same Alpha boxes outperformed all our P3 based systems.

    "Intel did not take the x87 FPU performance as a prime design goal in the P4. They focused on the SSE/SSE2 unit much more and made sacrifices to the X87 FPU side of things to gain more SSE2 performance. Some may argue this was a bad trade-off but the improvements they have managed on the SSE2 are very impressive.

    "Geomatics is extremely CPU intensive and pretty much 100 per cent bound by CPU performance. For this reason we obtained an early 1.5GHz P4 despite the inflated costs in an attempt to determine how much added performance it would give us in reducing our production times."


    The article then goes on to describe the sweet 'puter setup they use, describes how SSE/SSE2 are an advantage in this particular case, and describes how AMD also plans to support SSE/SSE2 and more.
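
The run times quoted from The Register translate into speedup ratios as sketched below. The quote only gives approximate times ("just under 13 hours", "a shade over seven and a half hours"), so the rounded values of 13.0 and 7.5 hours here are assumptions and the ratios are rough:

```python
# Rough speedups from the run times quoted above.
# 13.0 and 7.5 are rounded assumptions; the quote gives only approximate times.

def speedup(old_hours, new_hours):
    """How many times faster the new run is than the old one."""
    return old_hours / new_hours


p4_original_code = 19.0  # P4 1.5 GHz, original code
p3_933 = 13.0            # PIII 933 MHz, "just under 13 hours"
p4_sse2 = 7.5            # P4 1.5 GHz, recompiled for SSE2

print(f"P4 SSE2 vs P4 original code: {speedup(p4_original_code, p4_sse2):.2f}x")
print(f"P4 SSE2 vs PIII 933:         {speedup(p3_933, p4_sse2):.2f}x")
print(f"PIII 933 vs P4 original:     {speedup(p4_original_code, p3_933):.2f}x")
```
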

  • Intel must have signed up for more advertising....
  • As agreed on the phone please don't distribute this version of flask to anybody else. We still haven't got hold of the author of Flask and we don't want to distribute this version without permission.

    That's what the email from Hans & Christian (@Intel) says. So to me it definitely looks like they're planning on releasing the source code. They're only a bit late.

  • by coneKtor ( 258168 )
    no one can be told how fast the pentium 4 is. you have to test it for yourself
  • Seems like the usual story, as with any other processor: in some regards it benchmarks great and in others far less so. I think we should take this like any other innovation and just wait a bit. I've always been against buying stuff immediately on release as it always has kinks to work out (a la Playstation 2 :D)
    | aim: | bagel is back |
    | icq: | 158450 |
  • That's why they asked Tom not to release it.

    If he distributed it, then they would be obligated to provide the source.

    I think their goodwill is probably more important to Tom (and the community) in this case. If they default on that, then Tom might as well distribute the program.

    But until then, I'd rather he keep reviewing with their help.
    pb Reply or e-mail; don't vaguely moderate [].
  • Is that compiler optimisation is probably going to count for more and more as Intel wrings every last bit of performance out of x86. Linux distros like Mandrake, which currently comes in a version optimised for P5, could potentially have big performance benefits if you get the version compiled for your specific processor. Of course, the really good news is that Intel et al will have to take quite an active role in GCC development if they want to make their processors look great under free operating systems.

    Unfortunately, what it could potentially mean though is that if Intel were to do some sort of special deal with a proprietary OS maker (MS for example) they could make that OS run far faster than any others, simply because it'd be compiled with a better optimised compiler.

    -- Piracy is a victimless crime, like punching someone in the dark.
  • If these "benchmark" apps that people use were in Open Source form, people could recompile and optimize however they want. Every test Tom runs could show optimized-for-Athlon numbers, then optimized-for-Pentium numbers. That would be a great thing - people could then just recompile binaries on their systems (if they care) with their own CPU-specific flags. I wonder what the Shovelware from Redmond is compiled for? A 386? An original Pentium? There must be some spec that big vendors use - the average home PC processor?
  • First Tom decided one way. Then Tom decided another day. Now Tom is undecided. Where have I seen this before? Ahh yes, the US Presidential election. Is Tom from Florida? :-)
  • That's exactly my point, I don't recall it has ever been so apparent. SSE2 obviously improves performance tremendously, and I hope people realize that if we drop "x86" all together, we could have a nice little leap in performance :)


    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  • A recompile can improve performance because it allows the code to make use of the SSE2 instructions. SSE2 is a kind of half-assed vector processing scheme. A conventional processor is a scalar processor. It performs one instruction on one datum. A vector processor can apply one instruction to 2, 4, etc. data items. Vector processing used to be relegated to the lofty realm of supercomputing, but Intel introduced it to the mainstream with MMX, then SSE, and now SSE2. It can really accelerate certain types of code. I say half-assed because of AltiVec, which is how vector processing should be added to a mainstream processor. 128 bit precision, baby.
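
The scalar versus vector distinction in the comment above can be illustrated by counting how many "instructions" each style needs for the same element-wise addition. This is a conceptual sketch in plain Python, not real SIMD code; the lane count of 4 is an assumption chosen for illustration (a 128-bit SSE2 register holds, for example, four 32-bit integers or two double-precision floats):

```python
# Conceptual model only: count operations for scalar vs. vector-style addition.
# No real SIMD instructions are issued; lanes=4 is an illustrative assumption.

def scalar_add(a, b):
    """One 'instruction' per element pair."""
    out, ops = [], 0
    for x, y in zip(a, b):
        out.append(x + y)
        ops += 1
    return out, ops


def vector_add(a, b, lanes=4):
    """One 'instruction' per group of `lanes` element pairs."""
    out, ops = [], 0
    for i in range(0, len(a), lanes):
        out.extend(x + y for x, y in zip(a[i:i + lanes], b[i:i + lanes]))
        ops += 1
    return out, ops


a = [1, 2, 3, 4, 5, 6, 7, 8]
b = [10, 20, 30, 40, 50, 60, 70, 80]
print(scalar_add(a, b))  # same sums, 8 operations
print(vector_add(a, b))  # same sums, 2 operations
```

Both functions produce the same sums; only the operation count differs, which is where the gain on data-parallel code such as video encoding comes from.
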
  • First we have the presidential election all screwed up, now we can't even get a verdict on which processor to use! What is going on here?!
  • Just how complex benchmarking can get. It's horses for courses. In my home I have an SGI (200 MHz R5000), a few PCs <= 233 MHz and a G3/400 Mac notebook - and you know what, I couldn't tell you which one is best to use. I write this comment on the SGI which I "think" is my favourite machine, but the G3/400 would surely beat it in any known benchmark (maybe short of gfx). The PC has a SCSI disk in it which would beat the Mac's ATA drive, etc. etc. No set of benchmarks can possibly capture all aspects of a computer for valid comparison with another. You can only compare individual components (disk, CPU, memory, etc.) - I am sure you could produce two benchmarks between Alpha and Intel P3/P4 that would show one beating the other and then the other way around (FP, Seti =), integer ops, etc.). Surely the only valid benchmark is the one that pertains to your usage?
  • How lame.
  • I haven't got time to read either the article or any of these comments. However, compilers are getting more important with newer CPU designs. Also, Intel is always going to have the best compiler for x86 chips. And Compaq's compiler is the only one to consider for an Alpha. There should really be a campaign for the CPU vendors to open-source their compilers so that everyone can get the benefit and enhance them etc. Like, what have they to lose? They will only sell more chips.
  • The problem with that argument is that you're comparing apples and oranges here. If Tom's Hardware had done a review like this on the Transmeta chip, then maybe you'd have a point. You may be able to make a point that ZDNet or whoever is biased against Transmeta, but when you stretch this conclusion to cover all reviewers and yell "CONSPIRACY!! FAVORITISM!!", it looks a bit ridiculous to say the least. Personally, I'm looking forward to reading a review by Tom on the Transmeta chip. I'm sure he'll give it a fair shake, just like he's done for Intel, AMD and Cyrix in the past.
  • Well, I'll take your word on it. I've never had a rectal surgeon get his head stuck up my ass(or have his head in my ass, even without it getting stuck). Thank you for sharing your experience, such that I might know exactly how much to avoid it in the future. :)


    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  • The P4 is pretty mediocre at current clock speeds, but a big advantage of its design is its apparent ability to handle high clock speeds that the P3 can't. So the P4 may not look too hot now, but Intel is expecting to get it up to 2 GHz by the end of 2001; if they can do that, they have a winner.

    Most people don't seem very adept at the whole "long-term thinking" thing, particularly with technology product releases .. but people should really try to look ahead a little on this one.

    I pretty much agree with your third paragraph .. there just isn't the time to spend hand-crafting P4 assembler optimizations for 3d gfx (which I do for a living btw .. ) .. but the compiler should at least be making some attempt to use those instructions, if asked. Of course, we use MSVC, MS tends to not be very leading-edge in this regard.

  • Intel aren't distributing their modified version in any form, so they're not obliged to release the source code. Read the GPL.
  • Of course they read Slashdot - but do they actually listen to what the people here have to say, seriously? Maybe, but not as seriously as someone making a point to find an Intel engineer and then saying, "x86 sucks. Drop it, and make a real consumer-grade, non-x86 CPU. And I don't mean the Itanium, either."

    In the first case, it'll go into the statistics - 300,000 people for a new architecture, 460,000 against. In the second case, it'll be, "Well, I met this guy and he was really pissed off that we're still using x86. He seemed to know what he was talking about, and he understood the difficulties involved. However, he really thinks that the sacrifices would be worth it."

    Now, which will hold more weight?


    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  • I'm afraid that I have neither the time nor the bandwidth to explain. Go to Ace's Hardware, and read up on all the processor/architecture reviews. That's a good starting point.

    Fact is, Intel and AMD abandoned x86 to get real work done a long time ago. x86 is emulated on a modern processor, but at the hardware level. The core of the processor itself uses a different instruction set and format.

    And like most things, x86 is just behind the times. Like all technology, tradeoffs had to be made. x86 was introduced way back when with the 386. It was designed to solve a specific set of problems in a certain way. Today we have a different set of problems, that also need to be solved in a different way. It's not that x86 was never good - it was very good at the time, and as evidenced by its long use, it had quite a bit of life in it.

    Unfortunately, the great lengths that all the major chip manufacturers go to in order to pander to the x86 instruction set really cut down on performance :(


    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  • Legally, yes. But notice how it was a few Intel hackers who politely asked Tom not to redistribute it, and not a few Intel lawyers demanding it.
  • by ion++ ( 134665 ) on Sunday November 26, 2000 @12:33AM (#601702)
    The real problem is that the CPU makers don't give away a compiler for their CPU, or work with free compilers to create good support for their CPU. Why aren't these optimisations in gcc??

    If regular users cannot get hold of binaries compiled with good compilers, or the good compilers to compile their own stuff, then their real life usage of the CPU will look worse than the one in the review. The reviews shouldn't be done with special equipment, be it hardware or software, or with the aid of engineers who know one side only. They should be done with standard equipment so we, the normal users, would know what to expect.

    Intel, AMD, and other cpu makers, that being x86 or not, give away the compilers, and see your hardware shine, or help GCC getting good support for your CPU, which we, the normal users can benefit from.

    ps: there might be other free compilers than GCC
  • x86 comprises the instruction set used on x86 processors such as AMDs and Intels. It hasn't been abandoned. I don't see how using 'mov' instead of 'move' found on another processor makes a big difference. Sure, there are some irregularities, for instance 'imul' only takes one argument and assumes the source and dest. to be AX, but they're understandable when you know the history behind the processor (in this case, Intel literally couldn't fit a source operand into the instruction). I really don't see how changing 'x86' is going to do a bit of good. No matter what instruction set you use, all you're going to end up doing is juggling registers using different ops. Maybe I'm just missing something, maybe by 'x86' you mean the whole architecture and not the instruction set (though you say SSE2 will move us away from 'x86', so I assume you mean the instruction set). If you do mean the architecture, then, yes, we could do better, but I'm not going to sacrifice compatibility just because someone thinks they know a 'cleaner' way of running a pc. Right now we see one solution to extending the somewhat limited interrupt space of the x86 architecture, that solution being USB. If you mean changing the instruction set, well, I'd like to know a good reason why. You said x86 was introduced in the 386 line, no, x86 is anything 'x86' (thus the reasoning behind the 'x' :P), this included the 8086 (not the 80186, really, though because intel decided to break compatiblity, and got screwed because of it). Also, you said that it was meant to solve certain problems. I'm not real sure how an instruction set 'solves' problems. All a processor can do is shuttle around bits of data, it has no means to even recognize code from data (ok, if you use segments correctly it can, but most people set up two overlapping 4 GB segments for their code and data when doing OS programming). 
A microprocessor is designed to be generic in nature, thus overhauling the instruction set for a specific purpose defeats the original purpose behind the PC microprocessor. What Intel and AMD are doing with extensions such as SSE2, MMX, 3DNow! and the like is the right step to take to optimize certain processes which occur on the processor, while at the same time keeping the processor generic. Maybe if I knew what kind of 'new' problems we're facing now that would warrant an overhaul of a perfectly good instruction set I'd be able to better understand your reasoning. (Or maybe you meant the architecture, in which case everything I just said was moot.)
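
To make the 'imul' irregularity mentioned above concrete, here is a minimal sketch of how the original one-operand 16-bit form behaves: the single operand is multiplied by AX, and the signed 32-bit product lands in the DX:AX register pair. The function names and the Python modelling are illustrative assumptions, not anything taken from the comment.

```python
def to_signed16(value):
    """Interpret the low 16 bits of value as a signed two's-complement number."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value


def imul16(ax, operand):
    """Model the one-operand 16-bit IMUL: DX:AX = AX * operand (signed).

    Returns (dx, ax) as unsigned 16-bit register values.
    """
    product = to_signed16(ax) * to_signed16(operand)
    product &= 0xFFFFFFFF  # keep the 32-bit two's-complement result
    return (product >> 16) & 0xFFFF, product & 0xFFFF


if __name__ == "__main__":
    dx, ax = imul16(0x0003, 0xFFFE)  # 3 * -2
    print(f"DX:AX = {dx:04X}:{ax:04X}")
```

The point of the sketch is only that one register pair is implied by the instruction itself, which is why the programmer ends up shuffling values through AX before multiplying.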

  • Ok. You benchmark the Transmeta, a neat processor. It is light years ahead of the "magical underclocker" technology from Intel (slowstep ;)). They are targeting lower power consumption in your laptop while in Word and other apps (which Win9x doesn't do because of not HLTing the processor). It's designed so that the only significant draw of power is the LCD (the HD spins down while idle, and the proc is self-reconfiguring for greater efficiency).

    Naturally, since it's not targeting performance, it benchmarks poorly. Do they (the various Quake 3 monkeys) rerun the benchmarks? No.

    The Pentium IV comes out. People plug 'em in, benchmark them. They also suck. They benchmark them again, showing the suck by a larger margin. Then they benchmark them again, showing it's actually not such a bad suckage after all.

    Isn't that just a bit of Intel favouritism?

  • I think Intel asked Tom not to redistribute it, rather than requiring him to sign agreements to not do so, or anything similar to that.

    Here's a direct quote:

    "As agreed on the phone please don't distribute this version of flask to anybody else. We still haven't got hold on the author of Flask and we don't want to distribute this version without permission."

    It's a little fuzzy, but it seems to me that Tom only agreed, and doesn't necessarily feel legally bound. I guess it all depends on what was said on the phone. I'd like to think Intel only said "we don't like the code as it stands right now, so please understand that we wouldn't like to see it distributed", and that Tom said "OK.".
  • Intel claimed that the Willamette (sp?) core would greatly increase the speed per clock cycle

    Increasing IPC is a direct tradeoff against increasing clock rate (or the ability to scale the clock rate up). You may be able to increase both to a certain extent through tricks here and there, but in the end, one of them will have to give in to the other. Intel's strategy (and that of pretty much most CPU designers) is to increase clock rate, because that is easier to do than increasing IPC. Increasing clock rate is pretty much a brute-force strategy to increase performance, and increasing IPC actually needs some brains. Intel managed to squeeze only a little more out of the P4 by using the trace cache and some other stuff like double-pumping their ALUs. No other processor in existence today has a trace cache. Before the P4, the trace cache was purely a concept on paper. Also, I think they were smart to optimize for SSE2 and not for x87 FPU instructions. Why? x87 instructions are legacy code, and are slow by nature. Why not design a separate floating point instruction set that is not "hacked on"? Remember that the x87 is pretty much a hack on top of the 386 (integrated on the 486), and is pretty slow compared to how floating point is done in RISC processors like the Alpha or Sparc. With SSE, Intel is planning to become more on par with those RISC processors (though it will never be as good IMHO). The Athlon may execute x87 instructions well, but let me assure you, it is still not as fast as it could be. And getting rid of x87 is a much better idea, but I doubt that will happen since everyone wants backward compatibility. I guess this is something we all have to live with for the rest of x86's life and our lives.

    AMD ended up beating out Intel when they introduced the Thunderbird core revision

    I wouldn't say they beat out Intel. AMD still only has 20% of the market share. Any way you look at it, Intel is still the winner. I'm not being biased, but being objective. AMD may have a better processor, but in the end, Intel still sold more processors. This may be attributed more to their marketing prowess and people's brand name loyalty rather than any great engineering feats. But still that statement you made is wrong... The Athlon didn't win... it lost in fact. If it had won, AMD would have more than 50% market share. Of course, AMD did reach the goals they set, so it's a win from their and their supporters' perspective.

    I wonder who's gonna read this since this thread/topic is already pretty old.
  • by Anonymous Coward
    Dropping backwards compatibility would solve a ton of technical problems for the Intel processor family... but then there'd be a whole raft of other problems to consider--like the performance hit of emulating x86 instructions. Essentially, Intel doesn't want to take the continuity hit and loss of momentum that plagued the PowerPC Macintoshes--that little speedbump on the road split the Mac camp in two; no longer was there a clear continuous architecture from 128K through the latest and greatest. While the Mac always had architectural integrity (or "proprietary system" to you Mac detractors) as a selling point, the change would be a 'hit' to any company trying to make such a big break with such a big installed base.

    See all the discussions of that set of problems when Itanium finally ships.

  • 1. I don't know. All I know is that he has enough repute to be linked to on slashdot, which should mean something.

    2. Yes. I spoke a little too quickly and it came out a little ambiguous. They optimized the decoder. It's possible that the optimizations, or even just a simple retargeted compile, could cause a difference in the raw output to the encoder. This would, of course, cause the output file to change. The same thing would be true if Tom was right and they actually had retargeted/optimized the encoder, which they didn't. Such an encoder could very well produce different output. So Tom's assertion that the output file would be "obviously" identical is clueless. Since Tom has said that the output files were identical, we can unsafely assume that the Intel'ized decoders didn't actually cause any difference in the intermediate data. Of course, given his display of stupidity, I wouldn't be surprised if he didn't even do a checksum. I'm not an AVI expert, but I wouldn't be surprised if the file sizes came out identical due to padding or hard bitrate limits on the encoder. All issues which never appear to even have crossed his mind.
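
For what it's worth, the checksum comparison mentioned above is only a few lines. This is a minimal sketch, assuming the two encoder outputs are ordinary files on disk; the function names are made up for illustration, and SHA-256 is just one reasonable choice of hash.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_output(path_a, path_b):
    """True only if both files have matching digests (identical bytes)."""
    return file_digest(path_a) == file_digest(path_b)
```

Two files of identical size can still differ in content, which is why comparing sizes alone would not settle the question.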

    Or seem to be relevant to slashdot moderators.

  • by Anonymous Coward
    If you read the previous Tom's Hardware articles, none of them gave the new P7 architecture a shining review as you would imply. The first post said that its performance was actually relatively poor. It seemed to perform really well only on Quake 3. He did say that new optimization could possibly help out performance a lot. But he still recommended a high end Thunderbird instead as a more balanced system. The second article showed that the MPEG compression performance could possibly be even worse than the first showed. The third was possibly the most positive of all. I think Tom is doing an excellent job considering the complexity of the issue. I would be extremely skeptical of anyone reviewing something this difficult to benchmark with any less deliberation than Mr. Pabst has shown us.
  • It reminds me of when software and hardware was going from the 486 level to the 586.

    For most regular software, it did not mean squat. It only mattered for software that was specifically set up to take advantage of the hardware features.

    There were certain Photoshop filters that were fantastic at certain settings, but choked when you used others. This is typical of new features.

    I was always amused by comparing processors running at the same clock speed. Typically, when you do that, the gain in performance is usually about 15% to 25% (YMMV) before you add in the difference in clock speed. All too often the clock speed is a huge factor in the performance boost, not just the design changes.

    I think I'll go have another beer....

  • Is there an easy way to make use of the fact that both MPEG-2 and MPEG-4 use square blocks of pixels? That both use the DCT? Maybe a dedicated MPEG-2 to MPEG-4 converter wouldn't have to do IDCT first followed by FDCT? Is it possible to work on quantized coefficients instead of pixels? Can information from the MPEG-2 encoding process (e.g. the direction of movement between two frames) be re-used? AFAIK, a lot of MPEG-2 encoding time goes into finding out what parts of a frame are moved in what direction, trying out various rotation angles and contrast / brightness adjustments... It seems to be a waste to throw that information away.
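
A small sketch of why the IDCT-then-FDCT question is a reasonable one: in exact arithmetic the forward DCT simply undoes the inverse DCT, so if nothing touches the pixels in between, the coefficients come back unchanged. This uses a one-dimensional, 8-sample orthonormal DCT as a stand-in for a real 8x8 block transform; it is an assumption-laden illustration, not how any actual MPEG-2 or MPEG-4 codec is implemented, and real decoders round and clamp pixel values, which this ignores.

```python
import math


def dct(block):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(block)
    out = []
    for k in range(n):
        scale = math.sqrt((1 if k == 0 else 2) / n)
        total = sum(x * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                    for i, x in enumerate(block))
        out.append(scale * total)
    return out


def idct(coeffs):
    """Inverse of dct() (orthonormal DCT-III)."""
    n = len(coeffs)
    out = []
    for i in range(n):
        total = 0.0
        for k, c in enumerate(coeffs):
            scale = math.sqrt((1 if k == 0 else 2) / n)
            total += scale * c * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
        out.append(total)
    return out


if __name__ == "__main__":
    coeffs = [100.0, -12.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0]
    pixels = idct(coeffs)       # what a decoder would reconstruct
    round_trip = dct(pixels)    # what an encoder would then recompute
    print(max(abs(a - b) for a, b in zip(coeffs, round_trip)))
```

Whether a real converter could skip the two steps would still depend on the block grids lining up and on no scaling or cropping happening in the pixel domain.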

    OTOH, although I know JPEG's internals pretty well, I'm not sure about MPEG ;-)
  • Come on, of all the complaints Tom brought up in his review, why do you think they picked the MPEG4 compression one? Simple: pick one task the P4's good at and let it speak for the processor as a whole.

    Before anybody shoots this down as anti-Intel flamebait, think about it. Why else would Intel risk the bad publicity of using an "illegal" program for benchmarking purposes? I mean, why not recompile gcc or OpenUT for P4 optimizations? Because they knew the P4 would do well at MPEG4 compression.

    That's not to say that it would do any better than the Athlon. Some of the speed difference could be attributed to the P4's higher clock speed, but a lot of it is because of the Intel-compiled FlasK's SSE2 support. Had AMD recompiled FlasK to support the Athlon's 3DNow+ MPEG extensions, I'm sure the Athlon would have gained a lot of ground too.

    The fact that it made such massive gains on all processors just speaks to that theory. I mean, think about it -- if Intel's compiler could magically make your average program run twice as fast, like it did for FlasK MPEG, developers would be lining up outside of Intel's offices for copies. This means that either (a) Intel changed the internals of the program to "cheat" by lowering quality and skipping the parts that the P4 did poorly at, or (b) FlasK MPEG is a special case.

    And the numbers aren't quite as amazing as Tom suggests. If you take the x87 version and assume for simplification that FPS scales up linearly as MHz increases (which is not true -- if anything, something as CPU intensive as MPEG compression will probably improve significantly better than linearly with CPU power) then virtually all of the speed is accounted for by the fact that the P4 is 1.5 GHz -- the Athlon gets 9.28 frames per GHz, whereas the P4 gets 9.35 frames per GHz.

    I know this is a terrible method for comparing CPU power, but it shows the basic idea here -- that most of the P4's advantage is due to its speed in MHz, not the architecture. While this is all well and good for Intel if all programs are recompiled overnight for the P4 and Intel can continue to out-clock AMD (given Intel's recent history that's not too likely), in the real world AMD still has the advantage.
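
The arithmetic behind that claim can be spelled out as a rough sketch. It uses the per-GHz figures and the 1.5 GHz P4 clock quoted above; the 1.2 GHz Athlon clock is taken from elsewhere in this thread, and the variable and function names are illustrative only.

```python
# Figures quoted in the discussion; names are illustrative assumptions.
ATHLON_GHZ, P4_GHZ = 1.2, 1.5
ATHLON_FRAMES_PER_GHZ, P4_FRAMES_PER_GHZ = 9.28, 9.35


def speedup_breakdown(clock_a, per_ghz_a, clock_b, per_ghz_b):
    """Split b's speedup over a into (clock factor, per-clock factor, total)."""
    clock_factor = clock_b / clock_a
    per_clock_factor = per_ghz_b / per_ghz_a
    return clock_factor, per_clock_factor, clock_factor * per_clock_factor


if __name__ == "__main__":
    clock, per_clock, total = speedup_breakdown(
        ATHLON_GHZ, ATHLON_FRAMES_PER_GHZ, P4_GHZ, P4_FRAMES_PER_GHZ)
    print(f"clock factor:     {clock:.3f}")
    print(f"per-clock factor: {per_clock:.3f}")
    print(f"total speedup:    {total:.3f}")
```

Under those numbers the clock factor is about 1.25 while the per-clock factor is under 1.01, which is the sense in which nearly all of the gap comes from MHz.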

    So what about those SSE2 scores? Tom glosses over the fact that all but the lowest score are for lower-quality encoding. So yay, we're getting 22.85 FPS . . . in the lowest-quality setting. The high-quality SSE2 setting gives a not-so-stunning 4 FPS boost over the x87 version, which isn't that great of a boost, considering that an Intel engineer hand-optimized the program to work better with P4s. (Well, Intel claims they just recompiled it. But I have yet to see a compiler that adds SSE2 instructions on its own, let alone one that can add new options to a dialog box with SSE2 features.)

    At any rate, none of this helps the Office benchmarks, or the UT benchmarks, or the 3DS Max benchmarks . . . or any other benchmark that reflects performance computer users might get except in special cases. Not every computer user can use exclusively GPL software that he/she can recompile at will to support his/her new processor. Besides, that would take a while, since the gcc scores aren't very good either. :)

    (And am I the only one that noticed that Intel modified a GPLed program and refused to distribute the source or allow Tom to redistribute it? Isn't that illegal?)

  • Integer performance is a lot more important for compilers, etc rather than floating point.

    OSes are unlikely to benefit much from SSE optimization. And Athlon is quite a bit faster in integer performance.

    It seems that the only well-designed unit in the P4 is its SSE engine. On the other hand it might be more related to the high latency of the Rambus memory than to processor design.

  • I did read the article. The performance increase for the Athlon and P3 is just standard compiler optimization, but I wanted to explain what SSE2 was, and Intel did change the code so that it took advantage of SSE2.
  • At some point, recompilation doesn't cut it. Yes, recompilation produced almost a four fold increase in efficiency, bringing the Pentium IV to a level comparable with the Athlon (which benefited from a (nearly) increase in efficiency), but the SSE2 improvement is even more dramatic. And one can't simply recompile to use SSE2.

  • That's the problem with real world benchmarks. MPEG-4 encoding is associated (by some) with piracy. On the other hand, the publication of (original) digital video might well become a common pastime in the next few years. In the mean time, it does provide a sort of real world application that (some) can appreciate.

    I'm not sure that SPEC2000 is an appropriate solution. Most people don't care about the performance of a "quantum-chromodynamics" simulation, and are not involved in computational fluid dynamics. The integer simulations are a little closer to home (word processing, chess playing, perl...) but unless your "real world" approximates the "real world" the benchmarks are trying to simulate, the results of such benchmarks are difficult to appreciate.

    I suspect that to many people, a Quake/Unreal benchmark is much more valuable than SPEC2000 results.
  • All things Multimedia. HDTV will offer much higher resolution than DVD; at 1080i, the highest spec, I saw claims 6 months ago that the then highest end x86 chips could not do software decoding. Also, look at mp3 ripping and digital camera usage. These are all things that Joe Consumer is interested in. Given the incredible advances in digital camera technology, a multimegapixel camera that stores hundreds of images may need significant CPU horsepower to convert to jpg or png, or just edit.

    DVDs have been around for about 3 years now, and yet DVD decoding chips aren't standardized on motherboards. We can expect the same for HDTV. Software decoding is going to remain pretty popular, as DVD + big mhz/ghz sells in CompUSA whereas selling 400mhz + decoder card = educating consumers = good luck.
  • Does your opinion affect tides and harvests also?
  • Try Mathematica and Matlab! You will see the difference. I am sure AMD CPUs will not run as well at the same CPU speed.

    i. There is no spoon.

  • It seems a little too convenient to me that Tom posts a review totally slamming the P4 and then re-does it using a "new compiler" and all of a sudden the P4 looks better. Sounds like another dirty trick that Intel is trying to pull... They know we read these sites and it's important to them that they get good reviews or else they don't stand a chance in hell of getting sales. AMD on the other hand is just purely into the market to please customers and produce quality hardware. Look at the new Athlons and the 760i chipset! Intel isn't going to increase their bus speed to 266MHz until long after their 64-bit processors hit the market. AMD on the other hand has been running a 200MHz bus speed ever since their 500s. Intel is just using their name to maintain a foothold on their market share, but people are starting to wake up and realize that they are full of crap and it's running out their ears. Even OEMs like Gateway see the light and now, as a result, Gateway computers are 10 times better than they ever were! I personally wouldn't touch an OEM machine with a 1000' pole, but if I had to, I'd pick one that at least uses AMD and good solid hardware. Just my 2 cents, like it really matters anyhow, but what the heck, maybe some of you can see Intel for what they really are.
  • I seem to be missing something here. Would you please explain why x86 "sucks"?

  • If I am buying a new machine for gaming and DVD, I would probably choose an AMD Athlon or something. But after all I will still buy an Intel P4 since it's good for business applications and development tools. Also mathematical and scientific stuff.

    i. There is no spoon.

  • I have been a long time skeptic of any CISC arch. Lately, instead of people designing programs around a computer, Intel has been designing a computer around the code. The Pentium 4 cores were being designed WHILE DivX got its time in the spotlight. SSE2 was designed to work on DivX, and anything else Intel predicted would become big during the Pentium 4's lifetime. When the next new widespread method of compression comes out, we will have to upgrade to Pentium 5s!

    RISC archs on the other hand seem to be geared more towards a general purpose processor, not just what Intel thinks computing should be like. When MP3s became big, it was a nightmare to create a fast iDCT on the x86 archs. On RISCs it was (and still is) a piece of cake. Simple programs take forever to execute on current x86s, but their RISC counterparts can blow through anything with almost no effort. RISC processors were designed for "computers", while CISC processors were designed mostly for machine controllers (anyone remember the stop-light tutorial on the early Intel CPUs!) and other things doing predictable repeating tasks.

    Specialized vector processors are the other main type (in my eyes) of processor on the market. They are extremely single purpose, and are extremely fast. Someone could create a DivX-on-a-chip that could process a frame per clock cycle, run it at 10MHz, and blow every other CPU out of the water with >"real time" multichannel DivX encoding on one chip! This is where I am afraid Intel is going. Their chips can still run normal "computer" instructions at a small fraction of the speed of a RISC CPU at half the clock speed, but can run at an acceptable speed with most "consumer" low-grade multimedia (don't get me started on my MMX is evil speech that everyone hears me give).

    I would really like to see this SSE2 code that was "hacked" into FlasK. Even more interesting might be the disassembled code that Intel's compiler produced. For all I know Intel could have done something as simple as recompiling FlasK in their standardized compiler, instead of something Microsoft produced.

    If everyone was running LinuxPPC \ Linux/Alpha I would have no need to write this ;)

    BTW - Anyone working on a Linux-FPGA project?
    Recompile your kernel and write it to your processors microcode-ROM. hehehe
  • Why Win98? Because many, many people who buy high-end systems tend to be hardcore gamers. Most hardcore gamers don't play under Win2K, they play under Win98. Thus, Win98 is used. (Also, it's the most popular OS)
  • No, it's stupid. He isn't distributing the binary either; they're probably hiding something. Maybe there isn't really an optimized FlasK MPEG?
  • It IS interesting to note that the recompiled version also helped the P3 and Athlon. It does seem to indicate the original was not compiled optimally.

    If you divide the benchmarks by the clock rate, you get a more objective view of the processors, independent of clock rate. By that measure, the P4 is turning out about .00935 per cycle, while the Athlon is just behind it at .00928, and the P3 behind at .00803. This makes sense - the P4 isn't really much faster than the existing technology (Athlon), but it does allow much higher clock speeds, which was Intel's goal all along. Bully for them, they got what they aimed for.

    The small problem of course is that you don't get any of that bang with existing apps, which will undoubtedly hurt P4 sales for a while. Strictly looking at bang-for-the-buck, the P4 is a poor choice at the current price points, but most new processors are. If AMD can get the Palomino (smaller, faster Athlon) out the door quickly, with SMP support, they could take a bite out of the server market that would have Intel rubbing their bottoms for some time.

  • This isn't necessarily EVER going to be any good at rendering. It's nowhere near general-purpose enough. Ironically, people doing heavy duty imageprocessing or rendering or raytracing are the ones who need a superfast processor most- but I doubt the P4 will _ever_ be any good at that. There will be some ultrafast codecs and that's it- and that's a really tiny percentage of what a rendering farm does- and the people working in these fields _never_ have enough time or enough CPU, making it not an abstract problem.

    If anybody tries to build render farms off P4s they will literally go out of business due to missing deadlines. The P4 pipeline is too deep, the whole thing is geared towards flashy performance on consumer codecs and it's not general-purpose enough to perform realworld EFX tasks to deadlines. The high end market will be avoiding this one- it's just too easy to push it out of its 'high performance zone' and make it bog down.

  • Just out of curiosity -- since you own so much stock in AMD, and love their processors so much, how do you feel about their use of SSE2 in the Hammer series of processors? They are using it, you know. They are licensing from Intel.
  • Dude, what the fuck good would a "Linux" FPGA do for anyone? Ohhh, I get DSP performance out of my kernel? Big fucking deal. It's the 3D rendering or Quake3 I want DSP performance out of. And more to the guts of your post: most modern RISC processors out now have a large amount of specialization either in their instructions or processing units. MIPS binaries don't run on SPARCs now, do they? RISC is a good chip architecture but it is no reason to thrash CISC merely because one CISC implementer is adding beaucoup instructions to their chips. Extending your instructions is a good way for people to get high level optimization out of a processor. If certain functions that are popular among an entire class of products are turned into chip level instructions, anyone writing code in that category gets to replace a large chunk of code with a single instruction. This is good for programmers as they have finite time to complete a project and writing/debugging a chunk of code takes a lot longer than using some hardware optimization.
  • by Fervent ( 178271 ) on Sunday November 26, 2000 @01:58AM (#601734)
    I found the Athlon vs. P4 test, using the recompiled app, to be quite interesting. The cost/performance ratio isn't nearly as great as I thought it would be. True, the P4 1.5 GHz performs better, but not a whole lot better than the Athlon 1.2 GHz. See the picture [].

    I'm not trying to knock Intel per se. My main machine is a P3 (Dell laptop, runs like a dream). But you have to wonder if the cost warrants, in this case, the extra 3 fps in compression.

  • My apologies. Here's the link [].
  • Intel, AMD, and other cpu makers, that being x86 or not, give away the compilers, and see your hardware shine, or help GCC getting good support for your CPU, which we, the normal users can benefit from.

    Intel does offer their VTune compilers for sale, as they must in order to legally use them in the SPEC benchmarks where they perform so well. Unfortunately, there are widespread complaints and accusations that they are buggy and temperamental and fail to compile much code that works just fine with gcc, VS, etc. The charge that Intel gets its SPEC scores with compilers which are so optimized that they aren't robust enough for everyday use has tarnished Intel's very impressive SPEC scores among some. I haven't ever tried to use VTune so I can't comment as to whether this is FUD or not. It is worth noting that VTune is much much faster than anything else in SPEC, yet rarely used in practice, so there must be something wrong with it.

    But Intel does also help other compiler makers incorporate optimizations. I know they specifically work with Cygnus to optimize gcc, and would assume they do the same with MS. AMD also works with compiler makers to get support for 3DNow. (For market reasons--i.e. they will always have smaller market share--AMD designed the Athlon to perform well on P3-optimized code, and thus there is not so much to be gained by including K7 optimizations over and above 3DNow. The P4, on the other hand, is very different from both of them and needs a recompile to perform well, as these numbers demonstrate.)
  • by Greyfox ( 87712 ) on Sunday November 26, 2000 @02:03AM (#601740) Homepage Journal
    Intel's made damn sure that a highly optimized compiler is available for the IA64 and if I recall correctly they've also put a bit of work into tweaking gcc/pgcc to make sure it optimizes pentium code very well.

    This is a danger to AMD, which traditionally has had very little to do in this arena. A properly optimizing compiler can make a huge difference and they need every edge they can get to stay on top. Intel understands this weakness of AMD's and will exploit it.

  • One of the stated goals of the P4, though, is to really whomp ass in terms of clock speed, which is something I think it can be agreed they've done. They needed to put something out that could do two things: 1) make the home PC buying public drool and drivel over the word gigahertz and 2) do things people wanted to do quickly. If you have a processor that really flies at video rendering and sound processing, Dell and Gateway stick FireWire ports in it and classify it a multimedia editing PC to get eyes turned away from iMacs. Non-techies see the numerical difference between 1.2 and 1.5GHz and their eyebrows go up because they understand that the higher number MUST be better. You're very correct, though, in saying render farms of these things will sink anyone. Raytracing and picture editing use algorithms over and over again and are best kept in the cache until served with a cold soup. Coppermines and Willamettes have tiny caches that make it difficult to keep more complex (read: larger) algorithms in the L2. When you buy an SGI Onyx system the processors have pretty heavy cache sizes which facilitate repetitious functions like this. If businesses want/need "high-performance" they ought to be buying Xeons if they want to stick with IA32 or maybe the UltraSPARC3. Quality hardware + quality software == justifiable cost.
  • The software they used was open source, dude.
  • So once again we receive the ultimate proof of what benchmarks count for... However independent or dependent the testers are, they can fall into crass errors if they risk their final word without weighing all factors. And even doing that does not save them from getting burned by some nightrunning hacker, a last-minute addition or a dumb tool.

    Sincerely, I think that we have had enough of these benchmark judgements. Playing the game of "the judge" is what benchmark tests should get rid of. Frankly, only after a set of benchmarks is run for some time and all levels tested/contacted/patched/retested should people pass judgement. Until then no benchmark can be taken as a verdict. So, every time someone tells about things like Linux suxx and everything else rulez, first check if the penguin horde shrinks, then read ZDNet for a month without missing a day, then check the mass media, benchmark sites, testers, then check if freshmeat's submissions lowered, then check what your friends/colleagues/neighbors say. If everyone says that Linux still suxx then you may take the first benchmark for granted. If not then the guys have gotten a check from M$. But until then don't forget to recompile the kernel so that it fits what you really have on your comp. That's the best test benchmark you may do for yourself...
  • Like you can recompile all your windows applications, not.

    You don't need to.

    For 90% of users out there who need the processing power at all, the only thing that matters is the graphics driver, because it's games that are sucking up the CPU time. Graphics driver upgrades are released fairly regularly.

    For the rest, it's MPEG CODECs. I'm sure if your favourite CODEC's site posted an update that ran 50% faster, you'd download it; thus, I don't think upgrading it will be a problem.

    The (relative) handful of people doing heavy-duty image processing or rendering will likewise be upgrading to the next version of their software package at some point, which will contain SSE2 code.

    The OS itself doesn't need a recompile. Neither do your office applications. Where is the vast pile of software that needs to be recompiled?
  • by Silvers ( 196372 ) on Saturday November 25, 2000 @09:06PM (#601745)
    nVidia has a press release saying the Detonator3 drivers are P4 and SSE2 optimized. That might account for the performance difference.


  • by Chris Johnson ( 580 ) on Saturday November 25, 2000 @09:41PM (#601747) Homepage Journal
    This is a very good point. _Too_ much of a disparity should be considered a red flag, a warning sign. Apparently the P4's performance is really brittle- if you pick just the right task to test it on it can scream, but hit it with something it doesn't like (something that beats on the excessively deep pipeline or doesn't logically break down into simple SSE stuff) and it bogs, it's an absolute pig.

    Logically, then, we must all avoid attempting any tasks that the P4 doesn't like ;P

    ...ok, maybe not. How about some benches on tasks like POV-Ray scenes, SQL operations, kernel compiles? Some of these map more closely to the sort of tasks you'd get if (for instance) you were an EFX house and wrote your own software to do some heinous particle system rendering task. Those are the people who need _real_ CPU speed, and it's really questionable whether P4 is going to be any use to them at all. When you're doing that sort of stuff you have no time to fuss around with wizzy CISC operations- you're writing hopefully bugfree code and running it against a deadline to produce footage that is needed right away if not sooner, and you frankly don't care how Flask does since you aren't using it. You're running fairly general purpose code- a LOT of it- and live or die by your ability to (a) write it as needed and (b) run it. There's no room for "it's 70% as fast as an Alpha but if we spend another month optimising it for SSE we might be able to run it three times as fast, or it might go splat again if we double the number of scene elements or change a line!" no no no...

  • In the real world, how many of us need this kind of speed? For games.. raw CPU speed doesn't factor in that much (unless, god forbid, you use software rendering).. for the web you can do that just fine on a 90MHz P1 (fine, you can take a 200MHz P2 so Java and Flash will be nice and purdy).. for DVDs.. get a f*cking standalone DVD player and a TV for god's sake... or at least a hardware-based MPEG decoder

    but, you people love big numbers..

  • by Kris Warkentin ( 15136 ) on Saturday November 25, 2000 @07:51PM (#601757) Homepage
    is that we've got so many different CPUs and branches within types (how many x86 extensions are there now?) that compiler optimizations don't come close to keeping up. It's all well and good to have these fancy chips but if you don't have compilers to take advantage of the big registers and special opcodes, what good does it do? Linux is ported to lots of different architectures but doesn't necessarily take full advantage of them... take a look at Linux distros for UltraSparc - they're still basically 32-bit in userland. *sigh* Competition may be good for consumers but it's not so great for developers who have to support 10 million diverging platforms.
  • Sure, it's nice to compile the kernel faster, but how many people compile the kernel, and how often? SQL? Just throw in a mainframe.

    But how about 3D games? That's the mass market where performance really matters. Invent a better, cheaper BSP-tree processor and you'll corner the CPU market.

  • What a wonderful example of "keep running the benchmark until it looks fast." Or, consider this snippet of dialog:

    Readers: Tom, how fast is the new Pentium 4?

    Tom: How fast do you want it to be?

    Call me a cynic, but I doubt that even this obvious debacle will convince anyone of the pointlessness of benchmarks. (Five years of Apple saying the PPC is "twice as fast as the Pentium" sure hasn't.) Let's go back to using MIPS; at least everyone knew that stood for "Meaningless Indication of Processor Speed".
  • by dbarclay10 ( 70443 ) on Saturday November 25, 2000 @07:58PM (#601774)
    First of all, I'd like to congratulate the author of Flask (the MPEG4 encoder used in the benchmark). You can't buy publicity like this, and I bet your app just got a whole lot better ;)

    I also think we should take note of something. *THIS* is the promise of moving to a new architecture, beyond x86. A program compiled to be backwards-compatible right down to the 386 will NOT be able to use SSE2 instructions, nor any other fancy bells and whistles (like 3DNow! and plain ol' SSE). At least, as far as I can tell (I think pgcc can use more advanced instructions and still run on older CPUs).

    In essence, Intel is moving away from x86, albeit slowly and painfully. SSE2 is obviously a good technology, but an incompatible one. Programs using SSE2 instructions will need those instructions available when they run, elsewise bad things happen. But what a gain!

    If ever you get the chance to talk to anyone from Intel, say that you'd like to see more of this.
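
    One common way to get the gain without the "bad things" is to ship both code paths and choose between them at run time, based on what the CPU reports. What follows is only a minimal Python sketch of that dispatch idea, not how Flask or any Intel compiler actually does it. The function names and the feature strings ("sse2", "mmx") are made up for illustration, and the "fast" path is a stand-in that does the same arithmetic as the baseline.

```python
# Hypothetical sketch of run-time dispatch. Every name here is an
# illustrative assumption, not part of Flask or any real encoder.

def scale_baseline(samples, factor):
    """Generic path: safe on any CPU, handles one sample at a time."""
    return [s * factor for s in samples]


def scale_sse2(samples, factor):
    """Stand-in for an SSE2 path.

    A real version would process several samples per instruction.
    Here it only mirrors the baseline so both paths can be compared.
    """
    return [s * factor for s in samples]


def pick_scaler(cpu_features):
    """Return the fastest routine the reported CPU features allow.

    cpu_features is assumed to be a set of lowercase flag names
    supplied by the caller, e.g. {"mmx", "sse", "sse2"}.
    """
    if "sse2" in cpu_features:
        return scale_sse2
    # No SSE2 reported: fall back instead of hitting an illegal instruction.
    return scale_baseline


# The choice is made once, then the program calls whichever routine it got.
scaler_on_p4 = pick_scaler({"mmx", "sse", "sse2"})
scaler_on_386 = pick_scaler(set())
```

    The price is a bigger binary and two code paths to test, which is presumably part of why not every program bothers.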


    Barclay family motto:
    Aut agere aut mori.
    (Either action or death.)
  • by EoRaptor ( 4083 ) on Saturday November 25, 2000 @10:14PM (#601776)

    FlaskMPEG is a project written by Alberto Vigata, and its source code is available under the GPL.

    (As an aside, Alberto has been extremely busy of late, and the project has gone a little stale, but it is by no means abandoned, and he has collaborated with several authors to forward the development of FlaskMPEG, though it is slow going.)

    Intel has taken this source code, produced a modified binary, and distributed that binary to a third party (Dr. Thomas Pabst). Now, the question is, where is the source code? They are obligated under the terms of the GPL to release it, and so far they haven't. Additionally, they hint that they don't want it distributed, by asking Dr. Pabst not to make the recompiled version of FlaskMPEG available. Is this a violation of the GPL? Probably. Will Intel get away with it? I'd like to see them not, but they probably will.

    I'm surprised no one commented on this before; Slashdot goers are usually more on the up than this.

    P.S. The DiVX codec is *not* SSE/SSE2 or 3DNow! optimized, though it does have MMX optimizations. How do I know this for sure? Because DiVX is just a copy of the Microsoft MPEG4v3 codec that has been modified with an assembler/debugger to allow the playback of MPEG4v3 streams inside an AVI, and to stamp streams it creates with the FourCC code of DIV3 instead of MPG4. It wouldn't have been needed at all if Microsoft hadn't artificially restricted the codec from creating or playing back AVI files, instead tying it to the ASF format, and therefore to a Windows-only platform. Can we say 'Embrace and Extend (just enough to break compatibility)'? I knew you could...

  • ...if anything, something as CPU intensive as MPEG compression will probably improve significantly better than linearly with CPU power)

    Nothing can improve *more* than linearly with clock speed (which I assume is what you mean by "CPU power"). A linear increase is the upper bound. Often the increase is lower due to memory (and other) bottlenecks.

    the Athlon gets 9.28 frames per GHz, whereas the P4 gets 9.35 frames per GHz.
    but it shows the basic idea here -- that most of the P4's advantage is due to its speed in MHz, not the architecture

    The P4 is faster (but not by much), normalized, as you said yourself, so why is the advantage due to its speed in MHz? You defeated your own argument.

    The original purpose behind the P4 is to be able to crank up the clock speed. Looks like they have reached their goal and even increased the IPC by just a little. So it looks like this will be a win for Intel. When the Athlon reaches its max clock speed, and the P4 continues to crank up its speed all the way up to 10GHz, you'll see.
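
    For anyone who wants to check the arithmetic in this exchange, here is a rough Python sketch that splits the gap into a per-clock part and a clock-speed part. The two frames-per-GHz figures are the ones quoted above. The clock speeds (1.5 GHz for the P4, 1.2 GHz for the Athlon) do not appear in this thread and are assumptions added only to make the example concrete.

```python
# Split a benchmark gap into a per-clock part and a clock-speed part.

def relative_gain(a, b):
    """Fractional advantage of a over b (0.10 means 10% more)."""
    return a / b - 1.0


def frames_per_second(frames_per_ghz, clock_ghz):
    """Scale a per-GHz score back up to a raw frame rate."""
    return frames_per_ghz * clock_ghz


# Figures quoted in the comment above.
P4_FRAMES_PER_GHZ = 9.35
ATHLON_FRAMES_PER_GHZ = 9.28

# ASSUMPTION: clock speeds are not stated in this thread.
P4_CLOCK_GHZ = 1.5
ATHLON_CLOCK_GHZ = 1.2

# Advantage that remains after dividing out the clock.
per_clock_gain = relative_gain(P4_FRAMES_PER_GHZ, ATHLON_FRAMES_PER_GHZ)

# Advantage that comes from the clock alone.
clock_gain = relative_gain(P4_CLOCK_GHZ, ATHLON_CLOCK_GHZ)

# Overall advantage in raw frame rate under the assumed clocks.
total_gain = relative_gain(
    frames_per_second(P4_FRAMES_PER_GHZ, P4_CLOCK_GHZ),
    frames_per_second(ATHLON_FRAMES_PER_GHZ, ATHLON_CLOCK_GHZ),
)

print(f"per-clock gain: {per_clock_gain:.2%}")
print(f"clock gain:     {clock_gain:.2%}")
print(f"total gain:     {total_gain:.2%}")
```

    On the quoted figures the per-clock edge is under one percent, so with clocks like the assumed ones nearly all of the raw gap would come from MHz. Both statements hold together: the P4 is slightly ahead per clock, and most of its lead is still clock speed.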
  • by smack_attack ( 171144 ) on Saturday November 25, 2000 @08:00PM (#601781) Homepage
    AMD was notably slower in New Mexico as well, in fact it was really too close to call.

    Meanwhile, AMD officials were quick to point out that the tests had already proven the "will of the program", which showed AMD ahead by a large margin. Intel was smug as they responded, "We will wait for the final benchmark, which will show that we are the faster x86".

    Transmeta was unavailable for comment.

"The number of Unix installations has grown to 10, with more expected." -- The Unix Programmer's Manual, 2nd Edition, June, 1972