If the coder struggled but the result turned out great, the method will still flag the code as likely to be bad. The method will also completely miss buggy code that results from a programmer not realizing a problem is tricky and going for an overly simple solution.
I agree that these factors mean the test can't reliably be used on its own to identify potentially dangerous parts of the code. But I think the results could still reveal some interesting information about the programmer.
As you said - if we have data showing that a developer struggled with a particular area of code, but that area ends up being high quality - then the developer likely has great attention to detail and is thorough in their design and testing. That's good information to know about a developer.
The opposite case is also telling: the developer moved quickly through an area of code without any sign of stress, but the resulting quality is crap. This suggests the developer is likely sloppy, lazy, or just not very good. That may point to a coaching opportunity for a newer developer - or simply identify developers whose work we can't trust. The sketch below lays out the full set of combinations.
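To make the combinations concrete, here's a minimal sketch of the four cases. Everything here is hypothetical: it assumes we already have some per-area "struggled" signal (e.g. from activity or biometric data) and a separate quality assessment (e.g. from review outcomes or defect counts), both of which are exactly the hard parts.

```python
from enum import Enum

class DeveloperSignal(Enum):
    THOROUGH = "struggled but delivered quality: likely careful and detail-oriented"
    NEEDS_SUPPORT = "struggled and quality suffered: may need mentoring or a pairing partner"
    UNRELIABLE = "breezed through but quality suffered: possibly sloppy or overconfident"
    NO_SIGNAL = "breezed through and quality held up: nothing notable either way"

def infer_developer_signal(struggled: bool, high_quality: bool) -> DeveloperSignal:
    """Cross the (hypothetical) struggle signal with the quality outcome.

    Note this says something about the *developer*, not about which
    code is dangerous - as discussed above, the struggle signal alone
    is unreliable for that.
    """
    if struggled and high_quality:
        return DeveloperSignal.THOROUGH
    if struggled and not high_quality:
        return DeveloperSignal.NEEDS_SUPPORT
    if not struggled and not high_quality:
        return DeveloperSignal.UNRELIABLE
    return DeveloperSignal.NO_SIGNAL

# Example: sailed through the work, but the code turned out badly.
print(infer_developer_signal(struggled=False, high_quality=False).value)
```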
Now, I'd agree that in terms of actionable data, this isn't as valuable as a reliable direct indicator of code quality would be. But it's still something interesting to consider.