Comment Re:Does this mean it'll stop sucking? (Score 1) 23
I found GP2.5 to be great at academic-style research and writing; it was absolutely awful at writing code. So I would tell it to plan something for me and write the plan in a way that another agent (Claude Code) could use to build the code to do the thing. In this way, it has been great! I haven't yet attempted it with 3.
That said, I found GP3.0's page to be hilarious:
It demonstrates PhD-level reasoning with top scores on Humanity's Last Exam (37.5% without the usage of any tools) and GPQA Diamond (91.9%). It also sets a new standard for frontier models in mathematics, achieving a new state-of-the-art of 23.4% on MathArena Apex.
It then proceeds, lower down on the page, to give an example of what it can do by showing off "Our Family Recipes". If there's anything that touts PhD-level reasoning and writing, it's a recipe book.