What Makes a Valid Benchmark?
An anonymous reader writes "Benchmarks can make a big difference if they are accurate in predicting performance. That's simple enough to describe; it's not nearly so simple to implement. Benchmarks can be an excellent tool for predicting performance and estimating requirements, but they can also be misleading, possibly catastrophically so. This article looks at benchmarks: the good, the bad, and the ugly."
The Best Benchmarks Lead (Score:4, Informative)
As manufacturers seek to maximize benchmark scores, they end up improving their products in ways that make the product more useful to the consumers.
One example of a bad benchmark: for the longest time, CPU frequency was a sort of benchmark easily understood by the buying public. But it was a very poor one, leading Intel to maximize clock frequency at the cost of almost everything else -- actual computational performance fell behind, and power efficiency got so bad that the chips became mini-furnaces.
doubling times are slow for Desktop Systems (Score:1)
I tracked Content Creation Winstone 2000 scores up until the first P4s came out, and scores were taking about 30 months (two and a half years) to double.
Mathematica 5.x doubling time? About 27 months.
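The doubling-time figures above follow from assuming roughly exponential score growth; given two scores and the months between them, the implied doubling time falls out of a log ratio. A minimal sketch (the scores here are hypothetical, not actual Winstone numbers):

```python
import math

def doubling_time_months(old_score, new_score, months_elapsed):
    """Implied doubling time, assuming exponential score growth."""
    growth = math.log(new_score / old_score)
    return months_elapsed * math.log(2) / growth

# Hypothetical scores: 25 -> 50 over 30 months is exactly one doubling.
print(doubling_time_months(25.0, 50.0, 30))   # 30.0
# 10 -> 40 over 54 months is two doublings, so 27 months each.
print(doubling_time_months(10.0, 40.0, 54))   # 27.0
```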
Re:doubling times are slow for Desktop Systems (Score:2, Funny)
Re:doubling times are slow for Desktop Systems (Score:2)
A valid benchmark is... (Score:2)
Re:A valid benchmark is... (Score:1)
The best benchmarks (Score:5, Insightful)
Re:The best benchmarks (Score:2)
are real world apps that your audience will be using
Wish I had mod points, that's +5 insightful.
The short answer to this (and any process question, really) is: ask the next guy down the line. If you're a designer, ask the prepress guy what he wants. If you're prepress, ask the pressman. Rinse, repeat.
If you're a developer, ask the fscking user. Your gaussian blur might be teh shit, but if your app takes five minutes to load, nobody's going to bother with it.
Re:The best benchmarks (Score:2, Interesting)
Interesting, but metrics such as FPS and time to render a frame are equally meaningless for lots of other reasons. For example, you can "cheat" a real-world benchmark by changing the way some routines are drawn. Heck, some graphics card makers have been known to optimize their code for a particular game or testing routine. A while back some tests would measure the time it took to run particular Photoshop filters. This was also vulnerable to cheats beca
Re:The best benchmarks (Score:1)
Whenever I need to benchmark databases or enterprise applications I use the same method.
Often you start by taking a sample of the average usage (if the system is not live yet, you need to come up with the different user scenarios that will most likely be used) and mimic the load based on that. My metric would be user sessions per minute/hour/day.
Different parts of the industry, but it is a good method.
Surely I could also go and insert, sel
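The session-replay method described above can be sketched in a few lines: script a representative user session, run it repeatedly, and report sessions per minute. Everything here is a hypothetical stand-in (the `fake_session` body substitutes for a real sequence of queries against a real system):

```python
import time

def run_sessions(session_fn, n_sessions):
    """Run n scripted user sessions back-to-back; return sessions/minute."""
    start = time.perf_counter()
    for _ in range(n_sessions):
        session_fn()
    elapsed = time.perf_counter() - start
    return n_sessions / (elapsed / 60.0)

def fake_session():
    # Stand-in for a scripted mix of inserts/selects against the real app.
    sum(i * i for i in range(10_000))

rate = run_sessions(fake_session, 50)
print(f"{rate:.0f} sessions/minute")
```

In practice the session mix would be derived from the sampled live usage (or the agreed-upon user scenarios), not a single synthetic loop.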
Re:The best benchmarks (Score:1)
Speaking of ugly... (Score:2)
Let's not forget the "optimizations" that both ATI and nVidia engaged in. The ones where you'd take quake3.exe, rename it quack3.exe, and your framerate would suddenly drop by 20fps. You see, Quake 3 was the de facto benchmark of the era...
Interestingly, despite this special treatment, the game ran better under Wine, and still better as a native Linux program.
Re:Speaking of ugly... (Score:2)
Three things. (Score:3, Insightful)
Scope means defining clearly and specifically what your benchmark measures and what it does not measure.
Repeatability means being able to run the benchmark many times under the same conditions and getting statistically consistent results.
Transparency means having the details of the mechanics of the benchmark, so that the results can be completely analyzed and understood.
Apples and Oranges (Score:3, Interesting)
The Best Benchmark of All (Score:2, Funny)
Benchmarks are great for marketing... (Score:2)