OpenAI Unveils o3, a Smarter AI Model With Improved Reasoning Skills (openai.com)
OpenAI has unveiled a new AI model that it says takes longer to solve problems but gets better results, following Google's similar announcement a day earlier. The model, called o3, succeeds o1, released in September, and spends extra time working through questions that need step-by-step reasoning.
It scores three times higher than o1 on ARC-AGI, a test measuring how well AI handles complex math and logic problems it hasn't seen before. "This is the beginning of the next phase of AI," CEO Sam Altman said during a livestream Friday.
The Microsoft-backed startup is keeping o3 under wraps for now but plans to let outside researchers test it.
takes longer to solve problems (Score:2)
But does it matter whether you get the answer in a microsecond rather than a millisecond, as long as it's correct?
-- Manuel Garcia O'Kelly Davis
Sam Altman (Score:1)
Re: (Score:2)
Altman is like the scummy, slimy version of BG. And BG is pretty repulsive in what he did and who he is already.
Great, more lies (Score:2, Insightful)
Still no "reasoning skills" in LLMs, no matter how much they lie about it. And hence no "smart" either. A very fundamental breakthrough would be required, but there is none. Not really surprising with this old tech that was just scaled up, trained with a massive piracy campaign, and had its interface prettified with decidedly non-intelligent NLP.
It is a mystery to me why so many people fall for these lies. Are people just too shallow to actually see what is going on? To me, whenever I ask AI som
Re: (Score:3)
Re: (Score:2, Troll)
Exactly the other way round. Search is one of the few things they actually can do somewhat well. As to the "bottom quartile of programmers", you realize these people have massive _negative_ productivity, right? And so do LLMs.
Re: (Score:3)
Re: (Score:2)
They don't have negative productivity.
Oh yes, they do.
Re: (Score:2)
Re: (Score:2)
I don't know... lately I've been asking ChatGPT and Gemini quite a few things which would have required me to spend hours looking up. Yes, glorified search, but with summarization and near-real time search.
A very recent example, from a few days ago, when Slashdot threw a hissy fit at adblockers: I asked ChatGPT for "10 tech news from last 24 hours", and it provided a list with summaries, much like Slashdot. Then I asked it to expand on item #3, I believe, and it did, then I asked it to provide me with URLs
Re: (Score:2)
I don't know... lately I've been asking ChatGPT and Gemini quite a few things which would have required me to spend hours looking up. Yes, glorified search, but with summarization and near-real time search.
Sure. But the claim here is "reasoning" and that is just a direct lie, nothing else.
That doesn't make me stupid. Of course, I could have done all that myself, but at 100x the time spent, which I would rather spend doing something more productive. Convenience is a big feature of those tools.
Sure again. Just be aware that you may miss something you would otherwise have gotten. One thing is the search skill itself. Another is the information you usually find in the context of what you are looking for. If you are not careful, you can cripple your skills, make yourself dependent, and seriously limit your view of things. That does not mean you should always do it yourself. Just do it occasionally to make sure you still can
Re: (Score:2)
Yes, we're totally in agreement here.
I'm old enough to remember the times when phone numbers were memorized, and any average person could probably recite 7 to 10 phone numbers by heart. Nowadays, there are plenty of people who haven't even memorized their own phone number.
Now, I try not to be the "get off my lawn" guy, but I think it was a useful exercise for the mind. Maybe having to memorize phone numbers was replaced by something similar (I have done that), but it's not "another thing everyone mu
Re: Great, more lies (Score:1)
WELL WELL WELL. We meet again, my favorite CS department member. Alright, here is an essay for you. It suggests o3 is qualitatively different from older-generation LLMs. I look forward to your take on it.
https://arcprize.org/blog/oai-... [arcprize.org]
Re: (Score:2)
I do not think you have delivered any credible evidence, and in particular no supporting explanation. Belief makes you dumb. And benchmarks? I know all about benchmarks. You, apparently, do not.
Re: (Score:1)
Did you actually read the essay?
As one person who likes thinking to another, I think you'd enjoy it. And after you read it, you would have much more specific things to insult me over. In my experience, precise, targeted, and entirely true insults are far more gratifying.
Impressive but limited (Score:3)
Re: (Score:2)
Generally speaking, this is something that pisses me off.
X unveiled AI tool A - but it's not available yet.
Y unveiled AI tool B - but it's not available yet.
Z unveiled AI tool C - but, you guessed it, it's not available yet.
I could "unveil" anything too, with a couple of pretty pictures and some curated examples, but as long as the product is not available, it's worth nothing.