Stupid headline and stupid statistics (Score 5, Insightful)
A 36.1% pass rate would be worrying if this were a qualification test of things it needs to be able to do. It's not. This is a benchmark, and it SHOULD have a low pass rate. A low pass rate leaves headroom, and that's how you know if you're making improvements.
We could quite easily create a different benchmark where it passes 99.9% of the time. That wouldn't mean the device being tested is good. It would just mean we have a useless benchmark.
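To make that concrete, here is a toy sketch in Python. Everything in it is made up for illustration: the skill scores, the task difficulties, and the rule that a system passes a task when its skill meets the difficulty. None of it comes from the benchmark in the story. It only shows why a benchmark that everything passes can't tell two systems apart.

```python
def pass_rate(skill, difficulties):
    """Fraction of tasks whose difficulty is at or below the system's skill."""
    passed = sum(1 for d in difficulties if skill >= d)
    return passed / len(difficulties)

# Hypothetical skill scores on a 0-100 scale (assumed, not measured).
old_system, new_system = 30, 50

# Hypothetical benchmarks: lists of task difficulties on the same scale.
easy_benchmark = list(range(1, 21))    # every task has difficulty 1..20
hard_benchmark = list(range(1, 101))   # tasks span difficulty 1..100

easy_old = pass_rate(old_system, easy_benchmark)
easy_new = pass_rate(new_system, easy_benchmark)
hard_old = pass_rate(old_system, hard_benchmark)
hard_new = pass_rate(new_system, hard_benchmark)

print(f"easy benchmark: {easy_old:.0%} vs {easy_new:.0%}")  # 100% vs 100%
print(f"hard benchmark: {hard_old:.0%} vs {hard_new:.0%}")  # 30% vs 50%
```

On the easy benchmark both systems score 100%, so the improvement is invisible. On the hard one the same improvement shows up as 30% versus 50%.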
I have no opinion on whether AI is good or bad for this use case. I just hate when statistics are used to mislead people.