
What Makes a Valid Benchmark?

Posted by timothy on Friday June 23, 2006 @06:46PM from the a-nice-chisel dept.

An anonymous reader writes "Benchmarks can make a big difference if they are accurate in predicting performance. That's simple enough to describe; it's not nearly so simple to implement. Benchmarks can be an excellent tool for predicting performance and estimating requirements, but they can also be misleading, possibly catastrophically so. This article looks at benchmarks: the good, the bad and the ugly."



The Best Benchmarks Lead (Score:4, Informative)

by Jherek Carnelian ( 831679 ) writes: on Friday June 23, 2006 @06:57PM (#15593217)

The best benchmarks are those that lead their respective industries.

As manufacturers seek to maximize benchmark scores, they end up improving their products in ways that make the product more useful to the consumers.

One example of a bad benchmark: for the longest time, CPU frequency has been a sort of benchmark easily understood by the buying public. But it was a very poor one, leading Intel to maximize CPU frequency at the cost of almost all else -- actual computational performance fell behind, and power efficiency became even worse, with chips becoming mini-furnaces.

- doubling times are slow for Desktop Systems (Score:1)
  
  by 80 85 83 83 89 33 ( 819873 ) writes:
  
  it has taken almost five years to double performance on the SysMark 2004 SE Office Productivity: CPUs are just now doubling the 2.0GHz P4 from 2001!!!!!!!!
  
  i tracked Content Creation Winstone 2000 scores up until the first P4s came out, and scores were taking about 30 months (two and a half years) to double.
  
  Mathematica 5.x doubling time? About 27 months.
  - Re:doubling times are slow for Desktop Systems (Score:2, Funny)
    
    by NeilTheStupidHead ( 963719 ) writes:
    
    Everyone knows that to get a real increase in performance, you need to paint the case red and put a big flame decal on the side. And fins, lots of fins to reduce drag.
    - Re:doubling times are slow for Desktop Systems (Score:2)
      
      by Some_Llama ( 763766 ) writes:
      
      Don't forget the R-type sticker...
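
To put the doubling times quoted in this thread on a common scale, a doubling time can be converted into an implied yearly speedup. The following is a rough sketch, not something any poster supplied: the function name is made up for illustration, and "almost five years" is rounded to 60 months as an assumption.

```python
def annual_growth_factor(doubling_months: float) -> float:
    """Yearly performance multiplier implied by a doubling time in months.

    If performance doubles every `doubling_months`, then after 12 months
    it has grown by a factor of 2 ** (12 / doubling_months).
    """
    if doubling_months <= 0:
        raise ValueError("doubling time must be positive")
    return 2.0 ** (12.0 / doubling_months)


# Doubling times as quoted in the comment above.
# 60 months is an assumed rounding of "almost five years".
quoted = [
    ("SysMark 2004 SE Office Productivity", 60),
    ("Content Creation Winstone 2000", 30),
    ("Mathematica 5.x", 27),
]

for name, months in quoted:
    factor = annual_growth_factor(months)
    print(f"{name}: doubling in {months} months is about "
          f"{(factor - 1) * 100:.0f}% faster per year")
```

Read this way, a 30-month doubling time works out to roughly a third more performance per year, while a 60-month doubling time is closer to 15% per year.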
A valid benchmark is... (Score:2)

by taskforce ( 866056 ) writes:

Aside from the usual High School "Scientifically fair test" basic stuff (like-for-like testing with control options, and keeping other non-calculated factors constant), the best benchmark is the application you plan to run with the hardware, or something very close to it. Nothing can tell you exactly how well it will perform except the actual app you are running, because nowadays setups are so complex, and individual hardware and software components rely on so many different factors, that it's easy to produce a test which look
- Re:A valid benchmark is... (Score:1)
  
  by NeilTheStupidHead ( 963719 ) writes:
  
  Yes, the real world is the only place that performance matters. If I date myself and go back to the first desktop and laptop I bought with my own money, I had an AMD 450 and a P166 respectively. Certain programs (like Firefox, for example, because I still use both occasionally) load faster on the Pentium laptop, others on the desktop. It's all about how the code is written and how it integrates with the hardware. With that said, the only real benchmark involves the software and other hardware you will be using. If we
The best benchmarks (Score:5, Insightful)

by Clockwurk ( 577966 ) * writes: on Friday June 23, 2006 @07:35PM (#15593394) Homepage

are real world apps that your audience will be using. Whenever I read a review of a new CPU or graphics card, I always skip synthetic benchmarks (PCMark, 3Dmark, etc.) and go straight to the real world stuff like media encoding and gaming benchmarks. Synthetic benchmarks tend to be little more than dick waving contests and have little bearing on the real world. If I see 4000 3Dmarks, it's a meaningless number. If I see 58 fps in F.E.A.R. or 45 seconds in Photoshop, I immediately have a decent idea of how the computer is going to perform in real world use.

- Re:The best benchmarks (Score:2)
  
  by OECD ( 639690 ) writes:
  
  are real world apps that your audience will be using
  Wish I had mod points, that's +5 insightful.
  The short answer to this (and any process question, really) is: ask the next guy down the line. If you're a designer, ask the prepress guy what he wants. If you're prepress, ask the pressman. Rinse, repeat.
  If you're a developer, ask the fscking user. Your gaussian blur might be teh shit, but if your app takes five minutes to load, nobody's going to bother with it.
- Re:The best benchmarks (Score:2, Interesting)
  
  by Anonymous Coward writes:
  
  are real world apps that your audience will be using.
  
  Interesting, but metrics such as FPS and time to render a frame are equally meaningless for lots of other reasons. For example, you can "cheat" a real-world benchmark by changing the way some routines are drawn. Heck, some graphics card makers have been known to optimize their code for a particular game or testing routine. A while back some tests would measure the time it took to run particular Photoshop filters. This was also vulnerable to cheats beca
- Re:The best benchmarks (Score:1)
  
  by Golthar ( 162696 ) writes:
  
  I agree and this is also my experience creating enterprise applications.
  
  Whenever I need to benchmark databases or enterprise applications I use the same method.
  Often you start by taking a sample of the average usage (if the system is not live yet, you need to come up with the different user scenarios that are most likely to be used) and mimic the load based on that. My metric would be user sessions per minute/hour/day.
  Different parts of the industry, but it is a good method.
  Surely I could also go and insert, sel
- Re:The best benchmarks (Score:1)
  
  by Emetophobe ( 878584 ) writes:
  
  I look at benchmarks the same way. The only thing that is important to me is how the cpu/video card/[insert hardware] performs in real world situations. I'm a gamer so I tend to look at real game benchmarks and avoid crap like 3Dmarks. The people that tweak their systems every last bit so they can get an extra 100 score in 3dmarks are just dick-wavers IMO.
Speaking of ugly... (Score:2)

by SanityInAnarchy ( 655584 ) writes:

Let's not forget the "optimizations" that both ATI and nVidia engaged in. The ones where you'd take quake3.exe and rename it quack3.exe and your framerate would suddenly drop by 20fps. You see, Quake 3 was the de facto benchmark of the era...

Interestingly, despite this special treatment, the game ran better under Wine, and still better as a native Linux program.
- Re:Speaking of ugly... (Score:2)
  
  by aquabat ( 724032 ) writes:
  
  What was really cool about that hack was that you could rename any other game to "quake3.exe", and it would run 20 fps faster.
Three things. (Score:3, Insightful)

by aquabat ( 724032 ) writes: on Friday June 23, 2006 @08:27PM (#15593673) Journal

Scope, repeatability and transparency.
Scope means defining clearly and specifically what your benchmark measures and what it does not measure.
Repeatability means being able to run the benchmark many times under the same conditions and getting statistically consistent results.
Transparency means having the details of the mechanics of the benchmark, so that the results can be completely analyzed and understood.
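
The repeatability point can be made concrete with a small timing harness. This is only a minimal sketch under stated assumptions: `run_benchmark` and `example_workload` are hypothetical names invented here, the workload is a stand-in rather than a real application, and the coefficient of variation is just one possible way to judge whether results are "statistically consistent".

```python
import statistics
import time


def run_benchmark(workload, repeats=10, warmup=2):
    """Time `workload` several times and summarize the spread.

    `workload` is any zero-argument callable. Warm-up runs are discarded
    so that one-time costs (caches, lazy imports) do not skew the samples.
    """
    for _ in range(warmup):
        workload()

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        workload()
        samples.append(time.perf_counter() - start)

    mean = statistics.mean(samples)
    # Sample standard deviation needs at least two data points.
    stdev = statistics.stdev(samples) if len(samples) > 1 else 0.0
    # Coefficient of variation: spread relative to the mean.
    cv = stdev / mean if mean > 0 else float("inf")
    return {"samples": samples, "mean": mean, "stdev": stdev, "cv": cv}


def example_workload():
    """Stand-in workload (an assumption): sum of squares in pure Python."""
    return sum(i * i for i in range(100_000))


if __name__ == "__main__":
    result = run_benchmark(example_workload)
    print(f"mean  {result['mean'] * 1000:.2f} ms")
    print(f"stdev {result['stdev'] * 1000:.2f} ms")
    print(f"cv    {result['cv'] * 100:.1f}%")
```

Reporting the individual samples and the spread alongside the mean also serves the transparency point: a reader can see how noisy the measurement was instead of trusting a single number.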

Apples and Oranges (Score:3, Interesting)

by Jzor ( 982679 ) writes: on Friday June 23, 2006 @08:48PM (#15593769)

This reminds me of a comparison I saw in Circuit City once... (Warning: I'm not going to talk about a computer hardware benchmark.) They were trying to sell the insanely expensive Monster video cables by comparing the Monster cables to standard cables on identical TVs. The screen with the Monster cables looked hella better than the other one. The difference was so astounding that I just had to look at the back of the TV... The Monster TV was hooked up with an HDMI cable..... The other with a FRIGGIN UNSHIELDED COMPOSITE VIDEO CABLE. Apples and oranges, apples and oranges...

The Best Benchmark of All (Score:2, Funny)

by TwilightSentry ( 956837 ) writes:

Why, everyone knows that the only valid benchmark is bogomips!!!
Benchmarks are great for marketing... (Score:2)

by WoTG ( 610710 ) writes:

As long as your piece of hardware is faster in some subset of some test in some benchmark, you'll be able to advertise "xx% faster". It's not limited to computer gear either, every car is "best" in this or "first" in that...
