Comment Testing against "New England Journal of Medicine" (Score 1) 70
Testing against the "New England Journal of Medicine" is a very poor test, since the LLMs were likely trained on that data. It would be much more interesting to test on diagnosed patients: have the LLMs and a doctor each diagnose them, then check which is correct.
Testing an LLM against training data is not a good test for the real world.