Intel Accused of Inflating Over 2,600 CPU Benchmark Results (pcworld.com)

An anonymous reader shared this report from PCWorld: The Standard Performance Evaluation Corporation, better known as SPEC, has invalidated over 2,600 of its own results testing Xeon processors in the 2022 and 2023 versions of its popular industrial SPEC CPU 2017 test. After investigating, SPEC found that Intel had used compilers that were, quote, "performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge of the SPEC code and dataset to perform a transformation that has narrow applicability."

In layman's terms, SPEC is accusing Intel of optimizing the compiler specifically for its benchmark, which means the results weren't indicative of how end users could expect to see performance in the real world. Intel's custom compiler might have been inflating the relevant results of the SPEC test by up to 9%...
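
In spirit, a transformation that leans on a priori knowledge of a benchmark's dataset looks something like the minimal sketch below. It is purely illustrative, not Intel's actual compiler behavior, and every name, digest, and return value in it is made up:

    # cheat_sketch.py - hypothetical illustration of a benchmark-keyed fast path.
    import hashlib

    # Made-up digest standing in for a priori knowledge of the SPEC workload.
    KNOWN_WORKLOAD_DIGEST = "0" * 16

    def transform(document: bytes) -> int:
        # Recognize the exact benchmark input and take a pre-tuned shortcut.
        if hashlib.sha256(document).hexdigest()[:16] == KNOWN_WORKLOAD_DIGEST:
            return _canned_result(document)   # no real work; right only here
        return _general_transform(document)   # honest path for everyone else

    def _canned_result(document: bytes) -> int:
        # Stand-in for a precomputed answer valid only for that one input.
        return 42

    def _general_transform(document: bytes) -> int:
        # Stand-in for the general XML-to-HTML work xalancbmk actually does.
        return sum(document) % 1000

    if __name__ == "__main__":
        print(transform(b"<?xml version='1.0'?><doc/>"))

The point is that the fast path is only ever taken, and only ever correct, for the one input the benchmark ships with; any other workload falls through to the general code.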

Slightly newer versions of the compilers used with the latest industrial Xeon processors, the 5th-gen Emerald Rapids series, do not include these benchmark-specific optimizations. I'll point out that both the Xeon processors and the SPEC 2017 test are high-end gear meant for "big iron" industrial and educational applications, and aren't especially relevant for the consumer market we typically cover.

More info at ServeTheHome, Phoronix, and Tom's Hardware.
This discussion has been archived. No new comments can be posted.

  • 2600 counts of fraud (Score:4, Interesting)

    by Njovich ( 553857 ) on Saturday February 17, 2024 @04:40PM (#64248080)

    Remember when companies like Boeing and Intel had a proud engineering culture?

    Yeah, me neither; that was a long time ago. Intel has been caught doing this type of stuff for decades.

    • If you count each of those 2,600 inflations as "fraud," you'd have to count every advertisement ever as fraud.

  • board member one: How are we going to beat AMD?
    board member two: We are going to cheat and let our brand name make up for any ties.
    board member three: That's right, we are going to cheat and let our bonuses go high!

      board member one: How are we going to beat AMD? board member two: We are going to cheat and let our brand name make up for any ties. board member three: That's right, we are going to cheat and let our bonuses go high!

      Optimized or not, if all it takes to feed millions in revenue directly into executive pockets is selling a "benchmark" report full of shit that few would ever replicate, then I put the blame more on the suckers falling for executive lies. It's not that they're brilliant at sales. It's that most consumers are that gullible.

  • How are the test results not relevant if the core designs are basically the same as in desktop CPUs? Only the number of cores in the SoC and the interface are different.

    • The biggest difference is that the Xeons have access to more RAM than the "desktop" versions of the CPUs; some of the benchmarks in the SPEC suite simply benefit from more available RAM. (Also from faster RAM, but that shouldn't matter much within the same core generation.)

      But let's not forget: SPEC numbers only tell you something about a specific computer model with specific hardware, a specific compiler, and specific settings. That's basically why they invalidated the results for 2,600 machines and not for a couple of CPU models.

  • by kenh ( 9056 )

    So let me get this straight...

    SPEC designed a benchmark to represent a real-world workload, then INTEL optimized their compiler to maximize performance in that benchmark, and now SPEC is saying that, by optimizing for their (real-world simulation) benchmark, users of INTEL processors aren't going to see real-world performance that matches the benchmark results?

    Sounds like INTEL optimized for what SPEC considered real-world workloads, and now SPEC is saying their benchmarks don't actually predict real-world performance?

    • Sounds like INTEL optimized for what SPEC considered real-world workloads

      No, that's simply not how "representation" works. When something is representative of something else, it doesn't mean that cheating on it makes your case applicable to the other thing. Intel is not optimising for real-world conditions. They are optimising specifically for something that is *not* the real world.

      • by guruevi ( 827432 )

        SPEC benchmarks are pretty close to the real world. That's the entire raison d'être of SPEC. If I want to know which CPU is best at, e.g., fluid dynamics, I go to SPEC and see which CPU is best for the price, what optimizations were made, which programming languages and compilers were used to get a certain result, etc. I don't go to SPEC to see an overall useless number like the PassMark scoring system.

    • Both (sub-) benchmarks are real world code used in real programs doing real tasks. So yeah, try again and get this straight.
      • The benchmark is a representative example of a real calculation workload, not an exhaustive list of all workloads.

        The compiler spitting out hand-tuned machine code when it recognizes the benchmark is somewhat, but not completely, unlike the Dieselgate scandal of cheating on emissions tests.

        • Exactly my point, but let's make clear that if you change the workload (IOW the input file) of the benchmark, you can no longer compare it to any other published results, only to those made with the same workload. Again: that workload is part of the benchmark.

          But if a specific compiler optimization only gives a notable improvement for a benchmark with the default workload, but not with (most) others, there is something fishy going on.

        • by guruevi ( 827432 )

          That's not how code/benchmarks work. You compile the code, then give it the data; the compiler cannot predict, when you compile a ray-tracing Fortran or C program, that you will then feed it a specific workload for benchmarking purposes.

          • But that's the simplest way to cheat at benchmarks, at least for a well-known benchmark that's always the same: you enable the special cheating codepath when you detect the benchmark. You wouldn't believe how many benchmark cheaters have been caught by someone simply changing a filename and watching the numbers drop.
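            A minimal, hypothetical harness for that trick might look like the sketch below. The binary name and input files are made up, and a real SPEC run is scored differently; the point is just to compare timings when the workload is renamed or cosmetically perturbed:

                # detect_cheat.py - if the score collapses when the input is
                # renamed or trivially tweaked, something was keying on the
                # benchmark itself rather than doing general-purpose work.
                import shutil, subprocess, time

                def timed_run(binary, input_file):
                    t0 = time.perf_counter()
                    subprocess.run([binary, input_file], check=True,
                                   stdout=subprocess.DEVNULL)
                    return time.perf_counter() - t0

                # Made-up binary and workload names.
                default = timed_run("./bench", "ref_input.xml")

                # Same bytes, different filename: defeats filename detection.
                shutil.copy("ref_input.xml", "renamed.xml")
                renamed = timed_run("./bench", "renamed.xml")

                # Append an XML comment after the root element: same work,
                # different content fingerprint.
                with open("ref_input.xml") as f:
                    data = f.read()
                with open("tweaked.xml", "w") as f:
                    f.write(data + "\n<!-- tweak -->")
                tweaked = timed_run("./bench", "tweaked.xml")

                print(f"default {default:.2f}s  renamed {renamed:.2f}s  "
                      f"tweaked {tweaked:.2f}s")

            If the renamed or tweaked runs come in noticeably slower than the default one, something in the toolchain was recognizing the benchmark rather than doing general-purpose work.
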
            • by guruevi ( 827432 )

              Sure, but those are gaming benchmarks.

              SPEC is a totally different beast. They give reference code for classic computer science problems in languages like C, Fortran, etc., and then you can build your own HPC code on that reference code. That's why people and vendors use SPEC to benchmark systems and not just the CPU: the same CPU in a different build (e.g., Dell vs SuperMicro, 1U vs 2U, etc.) can have significant differences due to things like heat management in the chassis.

              What Intel did here is optimize for the benchmark code itself.

              • So why did Intel remove those "optimizations" from their compiler if they worked in general cases? And why did they add them in the first place?
    • by sjames ( 1099 )

      No. Imagine a test track for an autonomous vehicle. It is a standard track meant to be representative of city traffic but it plays out the same way every time. So instead of a system that truly drives the car autonomously, you simply build a clockwork that always operates the car the same way with no awareness of the surroundings.

      You'll get a 100% in the test and fail miserably in the real world.

      Similarly, Intel used a compiler rigged to do especially well on the benchmark and only the benchmark.

  • Is anyone surprised? (Score:5, Informative)

    by UnknowingFool ( 672806 ) on Saturday February 17, 2024 @05:33PM (#64248182)

    Intel used to be the king of CPUs, but these last 6 years have seen them lose out in a number of areas. AMD has beaten them handily in performance, efficiency, and sometimes cost since the launch of Ryzen. During that time, they lost Apple as a major customer because Apple would not wait year after year for chips that were no better than the previous generation. ARM-based CPUs are the de facto CPUs in smartphones and tablets.

    Incidentally, there were shades of this cheating when Intel unveiled their "Go PC" campaign. After Apple launched their ARM-based M1 computers, Intel thought it would be a good idea to go after a former customer. Some people called out some of their points as misleading or dishonest. For example, benchmarking Intel CPUs vs Apple M1 CPUs by using beta software on the Mac vs released software on Intel. Or comparing performance and power efficiency between Intel and Apple processors while obscuring that they cherry-picked different Intel processors for different tests vs a single M1 processor. Or scoring the Mac's "performance" in some games as zero when those games were not available for the M1.

    • Intel used to be the king of CPUs, but these last 6 years have seen them lose out in a number of areas.

      Intel is the "Boeing" of semiconductors.

      • by gweihir ( 88907 )

        Yep, pretty much. Gigantic egos, fundamentally deficient skills. And a lot of "useful idiot" fanbois.

    • by mjwx ( 966435 )

      Intel used to be the king of CPUs, but these last 6 years have seen them lose out in a number of areas. AMD has beaten them handily in performance, efficiency, and sometimes cost since the launch of Ryzen. During that time, they lost Apple as a major customer because Apple would not wait year after year for chips that were no better than the previous generation. ARM-based CPUs are the de facto CPUs in smartphones and tablets.

      Apple were a rounding error. Sorry, fanboys, but no-one's missed you. It's the same as when Apple left PPC for Intel: IBM had so many other customers (all three of the console manufacturers) that they couldn't get Apple out the door fast enough.

      You're right about AMD though, as they've been making inroads into the two markets Intel dominated: laptops and servers. AMD has been eating Intel's lunch on the desktop for years, for most of the time since the Athlon64 was released, but laptops eclipsed desktop sales years ago.

      • Apple were a rounding error. Sorry, fanboys, but no-one's missed you. It's the same as when Apple left PPC for Intel: IBM had so many other customers (all three of the console manufacturers) that they couldn't get Apple out the door fast enough.

        If Apple was a rounding error, why was Intel so butthurt that they left? Apple may not have bought as many CPUs as Dell, but behind the scenes Apple was contributing to Intel. For example, Apple worked with Intel on Thunderbolt 1, including their recommendation of using Mini DisplayPort as the connector. Intel had proposed Light Peak years earlier, but working with Apple they released a specification that laptop makers could use. Apple was always pushing the edge of lightness and thinness in ways that influenced Intel's designs.

  • by kenh ( 9056 ) on Saturday February 17, 2024 @05:37PM (#64248186) Homepage Journal

    ... then you have to be able to explain how these specific benchmark values influence your personal purchasing decisions.

    After investigating, SPEC found that Intel had used compilers that were, quote, "performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge of the SPEC code and dataset to perform a transformation that has narrow applicability."

    What the hell do the 523.xalancbmk and 623.xalancbmk benchmarks measure?

    https://www.spec.org/cpu2017/D... [spec.org]

    https://www.spec.org/cpu2017/D... [spec.org]

    Apparently they benchmark XML-to-HTML translation via the Xalan-C++ XSLT processor.

    • by kenh ( 9056 )

      Sorry, one link was bad - here's the correct link:

      https://www.spec.org/cpu2017/D... [spec.org]

    • So let's assume the individual compiler developer is using existing benchmarks both to run his test code and to confirm that his latest optimization push did not make the code base slower. If optimization for the specialized case gives a 9% increase, green light; but if upon further review the specialized case exists in 1% of the code, does one really roll back the specialized case? The developer will just not bother pushing his code to the approval process until he has 20 of those, or until just before the bonus window.
    • Ohh, so now you know those benchmarks are real-world programs, and you've switched to claiming cheating isn't a problem. Anything to defend Intel, eh?
    • by Knightman ( 142928 ) on Saturday February 17, 2024 @06:37PM (#64248284)

      The first thing you are missing here: Intel had prior knowledge of the benchmark code and their competition didn't, so they optimized the code using that prior knowledge, aka cheated.

      The second thing you are missing here is that CPUs destined for data centers are a $300+ billion market; if Intel can cheat to increase their market share by 0.5%, for example, that translates to a revenue increase on the order of a billion dollars (0.5% of $300 billion is $1.5 billion).

      That's why it's a big deal, and all this is par for the course for Intel, since they have a long history of sleaziness when it comes to benchmarks, especially when the competition is taking market share from them.

  • by az-saguaro ( 1231754 ) on Saturday February 17, 2024 @06:03PM (#64248226)

    What is the purpose of the benchmark tests? Do they validate raw processor performance, or do they validate performance in a software task-oriented environment?
    Whatever anyone did in this story, it was not a hardware tweak (from what I can see reading the articles and other links).

    If Intel programmers could wrangle better performance out of a testing regime by writing a better compiler to produce more algorithmically compact and efficient machine code, then doesn't that mean that there is room for improvement in how existing compilers are written? What would happen if the benchmarks were coded directly in assembler? If the benchmark tests then ran better or faster, doesn't this just mean that the existing non-Intel compilers used by the testing agency are un-optimized?

    I can see where Intel or any company might use obfuscated results and numbers to their advantage, but I don't see how this impugns the Intel processors per se.

    • What is the purpose of the benchmark tests? Do they validate raw processor performance, or do they validate performance in a software task-oriented environment?
      Whatever anyone did in this story, it was not a hardware tweak (from what I can see reading the articles and other links).

      If Intel programmers could wrangle better performance out of a testing regime by writing a better compiler to produce more algorithmically compact and efficient machine code, then doesn't that mean that there is room for improvement in how existing compilers are written? What would happen if the benchmarks were coded directly in assembler? If the benchmark tests then ran better or faster, doesn't this just mean that the existing non-Intel compilers used by the testing agency are un-optimized?

      I can see where Intel or any company might use obfuscated results and numbers to their advantage, but I don't see how this impugns the Intel processors per se.

      FTA:

      SPEC has ruled that the compiler used for this result was performing a compilation that specifically improves the performance of the 523.xalancbmk_r / 623.xalancbmk_s benchmarks using a priori knowledge of the SPEC code and dataset to perform a transformation that has narrow applicability.

      Some optimization is about finding a better route from A to B.

      Some optimization is about the fact that people are usually going from A to B, and very rarely to C, so you can give them a shortcut from A to B at the cost of the route to C.

  • In layman's terms, SPEC is accusing Intel of optimizing the compiler specifically for its benchmark, which means the results weren't indicative of how end users could expect to see performance in the real world. Intel's custom compiler might have been inflating the relevant results of the SPEC test by up to 9%...

    No, in layman's terms, it is called cheating, plain and simple.

    This is no different from a teacher who, knowing what questions were on the coming exam, used those same questions as "examples" when teaching his or her students in class. Doing that would be called cheating, just like what Intel did.

    Still want to argue? Imagine it were some Chinese chip company doing this instead of Intel; would you still continue to defend this practice?
