OpenAI's o3-mini: Faster, Cheaper AI That Fact-Checks Itself (openai.com)
OpenAI today launched o3-mini, a specialized AI reasoning model designed for STEM tasks that offers faster processing at lower cost than its predecessor o1-mini. The model, priced at $1.10 per million cached input tokens and $4.40 per million output tokens, performs fact-checking before delivering results to reduce errors in technical domains like physics and programming, the Microsoft-backed startup said. (A million tokens is roughly 750,000 words.)
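For a sense of what those rates mean in practice, here is a quick back-of-the-envelope cost calculator using only the two prices quoted above; the request sizes in the example are made-up illustrative numbers, not anything from the announcement.

```python
# Rough cost estimate at the listed o3-mini rates:
# $1.10 per 1M cached input tokens, $4.40 per 1M output tokens.

CACHED_INPUT_PER_M = 1.10   # USD per million cached input tokens
OUTPUT_PER_M = 4.40         # USD per million output tokens

def cost_usd(cached_input_tokens: int, output_tokens: int) -> float:
    """Return the dollar cost of one request at the quoted rates."""
    return (cached_input_tokens / 1_000_000 * CACHED_INPUT_PER_M
            + output_tokens / 1_000_000 * OUTPUT_PER_M)

# Example: a request with 10k cached input tokens and 2k output tokens.
print(round(cost_usd(10_000, 2_000), 4))  # 0.0198
```

At these prices, even a fairly large request costs a couple of cents, which is the point of the "mini" positioning.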
OpenAI claims that its tests showed o3-mini made 39% fewer major mistakes than o1-mini on complex problems while delivering responses 24% faster. The model will be available through ChatGPT with varying access levels -- free users get basic access while premium subscribers receive higher query limits and reasoning capabilities.
testing (Score:2, Funny)
Great, more lies (Score:5, Insightful)
No, it does not "fact check" itself. LLMs are not capable of doing that.
And the scam continues.
Re: Great, more lies (Score:1)
They can't do it properly, but even doing it half assed is a big and obvious improvement. How many times have you seen someone get better results out of an LLM by asking it if what it said was true? It's sad that this is the first time anyone has bothered.
Re: (Score:2)
Half-assed fact-checking is not fact-checking. Sorry. Unless done right, it does not count. It may do damage though because some people do not understand that.
As to your question, I have done it myself:
1. What are LLMs good for? -> Glowing recommendations for a number of uses.
2. How much of that was marketing bullshit? -> Massive and fundamental restrictions on all uses stated before. Basically a complete reversal.
A "knowledge tool" like this is worse than useless.
Re: (Score:2)
LLMs aren’t knowledge tools, they are conversation tools. They are designed to be able to respond in a way that sounds right to people.
They aren’t designed, on their own, to provide information - that’s a side effect of them talking to you, they have to sound like they know what they are talking about, and sounding right many times is actually right. But that’s the crux - people are using them as information tools, which they aren’t THAT good at, mostly to your point - they don
Re: Great, more lies (Score:2)
Half-assed sounds like 50-50 odds it rates any given statement either true or false. And you don't see the problem with that?
Re: (Score:2, Troll)
Actually they basically can with feedback reinforcement.
https://arxiv.org/html/2404.17... [arxiv.org]
Re: (Score:2)
That would assume their training data is "facts". It is not to a significant degree. So no, they cannot. Actual fact-checking requires insight and LLMs cannot do insight.
Re: (Score:3, Insightful)
It's probably much more interesting to learn that our brains actually work the same way.
and honestly, most people don’t fact-check themselves either.
Re: (Score:2)
and honestly, most people don't fact-check themselves either.
There are some numbers from sociology saying that only 10-15% of all people can competently fact-check, and only 20-30% are accessible to rational arguments. That does not justify having machines that cannot do it either.
Re: (Score:1)
[..] Does not justify having machines that cannot do it either.
Not to justify it, but AI is modeled on biological neural networks, so why should it behave differently? For workers, though, once AI truly becomes cheaper, capitalist companies won't have much of a choice.
Re: (Score:2)
TLDR: biological "networks" are not equal to AI in any sense.
Re: Great, more lies (Score:2)
Of course it's possible. Just like the police can investigate themselves for misconduct. ;-)
Re: Great, more lies (Score:3)
It's not a scam, just technology that probably won't deliver what people think it will. They have it in their heads it will deliver a pocket Albert Einstein, but they'll more likely end up with a pocket rsilvergun: Something that offers the illusion of intelligence, you can't rely on it to do what it says it will, and it will often just make shit up just to please its audience.
The only companies making any money on this aren't even in the AI business; they're more like the people who sold picks and shovels during the gold rush.
Re: (Score:2)
It's not a scam, just technology that probably won't deliver what people think it will.
It's a scam if it doesn't deliver what the marketers say it will. The problem is caused by those who are setting those expectations.
Re: Great, more lies (Score:2)
Puffery is legal, always has been. And basically everybody does it. Chances are you did in your last job interview. The question is whether a contract was breached.
Re: (Score:2)
Your criticism is apt, but it really applies to the editors who keep dumbing things down to the point where they are wrong. Perhaps an AI could summarize the article more accurately. :-P
To state it more clearly: OpenAI's new models go back and double-check their output before presenting it. This results in fewer cases where you say "Solve this problem..." or "Write some code to do X..." and it outputs something that has one or more errors. It isn't uncommon to ask it to check, and the second time it produces a better result.
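The generate-then-verify pattern described above can be sketched as a simple control loop. This is only an illustration of the pattern, not OpenAI's actual mechanism: `ask_model` is a hypothetical stand-in that returns canned answers so the flow can run on its own; a real setup would call an LLM API in its place.

```python
# Sketch of a generate-then-verify loop. `ask_model` is a
# hypothetical stand-in for an LLM call; it returns canned
# strings purely so the control flow is runnable.

def ask_model(prompt: str) -> str:
    canned = {
        "QUESTION": "draft answer",
        "CHECK": "REVISED: corrected answer",
    }
    # Crude routing on the prompt prefix, for illustration only.
    return canned["CHECK" if prompt.startswith("Check") else "QUESTION"]

def answer_with_self_check(question: str) -> str:
    draft = ask_model(question)
    review = ask_model(f"Check the following answer for errors "
                       f"and revise it if needed: {draft}")
    # Keep the revision only if the checker actually changed something.
    if review.startswith("REVISED: "):
        return review.removeprefix("REVISED: ")
    return draft

print(answer_with_self_check("Solve this problem..."))  # corrected answer
```

The point of the sketch is just that a second pass over the first answer catches some errors the first pass missed, which matches what users see when they manually ask the model to check its own output.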
Wolfram Alpha (Score:2)
How is this better for STEM than Wolfram Alpha?
Re: (Score:2)
Wolfram Alpha is better for pure math. I would hesitate to trust an LLM to solve an equation or factor something or find a derivative. It *can* but the risk of a mathematical error is high. I don't think Wolfram can ever produce a mathematical error like 2 x 3 = 5, which an LLM can do. But I find it much easier to describe my problem to an LLM than to Wolfram. And the LLM can explain things. (Or can the paid versions of Wolfram do that?) Wolfram can produce graphs, but it can't write code.
IMHO, diffe
can i download o3-mini and run locally? (Score:2)
or should i stick with deepseek?
got a distilled version easily running on my macbook pro m3
4o Disagrees (Score:2)
Fact-checking ... (Score:3)
OpenAI's o3-mini: Faster, Cheaper AI That Fact-Checks Itself
... which instantly made it MAGA incompatible.
Re: (Score:2)
OpenAI's o3-mini: Faster, Cheaper AI That Fact-Checks Itself
... which instantly made it MAGA incompatible.
Which is probably why it's faster and cheaper; it's easier to just deal with facts than constantly making up stuff, especially if it has to seem plausible and/or consistent.