OpenAI's o3-mini: Faster, Cheaper AI That Fact-Checks Itself (openai.com)

OpenAI today launched o3-mini, a specialized AI reasoning model designed for STEM tasks that offers faster processing at lower cost than its predecessor, o1-mini. The model, priced at $1.10 per million cached input tokens and $4.40 per million output tokens, performs fact-checking before delivering results to reduce errors in technical domains like physics and programming, the Microsoft-backed startup said. (A million tokens is roughly 750,000 words.)
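At those rates, per-request cost is simple arithmetic. A minimal sketch in Python; the token counts in the example are hypothetical, and only the per-million prices come from the announcement:

```python
# Prices from the announcement: $1.10 per million cached input tokens,
# $4.40 per million output tokens.
INPUT_RATE = 1.10   # dollars per million cached input tokens
OUTPUT_RATE = 4.40  # dollars per million output tokens

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Approximate dollar cost of a single o3-mini request."""
    return (input_tokens * INPUT_RATE + output_tokens * OUTPUT_RATE) / 1_000_000

# Hypothetical request: a 10,000-token cached prompt and a 2,000-token reply.
print(f"${request_cost(10_000, 2_000):.4f}")  # ~ $0.0198
```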

OpenAI claims that its tests showed o3-mini made 39% fewer major mistakes than o1-mini on complex problems while delivering responses 24% faster. The model will be available through ChatGPT with varying access levels -- free users get basic access while premium subscribers receive higher query limits and reasoning capabilities.

  • testing (Score:2, Funny)

    by Anonymous Coward
    @o3-mini, many humans enjoy a cool refreshing glass of bleach.
  • Great, more lies (Score:5, Insightful)

    by gweihir ( 88907 ) on Friday January 31, 2025 @03:25PM (#65133137)

    No, it does not "fact check" itself. LLMs are not capable of doing that.

    And the scam continues.

    • They can't do it properly, but even doing it half-assed is a big and obvious improvement. How many times have you seen someone get better results out of an LLM by asking it if what it said was true? It's sad that this is the first time anyone has bothered.

      • by gweihir ( 88907 )

        Half-assed fact-checking is not fact-checking. Sorry. Unless done right, it does not count. It may do damage though because some people do not understand that.

        As to your question, I have it done myself:
        1. What are LLMs good for? -> Glowing recommendations for a number of uses.
        2. How much of that was marketing bullshit? -> Massive and fundamental restrictions on all uses stated before. Basically a complete reversal.

        A "knowledge tool" like this is nothing but worse than useless.

        • LLMs aren’t knowledge tools, they are conversation tools. They are designed to be able to respond in a way that sounds right to people.

          They aren’t designed, on their own, to provide information - that’s a side effect of them talking to you, they have to sound like they know what they are talking about, and sounding right many times is actually right. But that’s the crux - people are using them as information tools, which they aren’t THAT good at, mostly to your point - they don

          • Training the LLM only on validated data won't ever solve the problem, because it will stick together things which look like they go together due to their inherent similarity. The LLM can only ever handle the hallucination phase. That's an enormously bigger part of the job than most of us imagined previously, including myself, which is why the LLMs are still so amazing despite their obvious deficiencies. But the non-obvious part is that this approach on its own or possibly even in series can never actually b

            • There's the rub: a lot of people are going to get fired because of executives who don't understand the limitations of the technology.

              No, they're not.

              I get that you always like to interpret what you see in the movies as being how the world works, so when you saw that scene in The Fifth Element where Zorg, asked whether they should fire half a million cab drivers, says "nah, fire a million", you think that's how things work IRL. But like all fiction, I feel compelled to inform you, that's not how it works in the real world. In the real world, the reason they have those employees is because they really do need them. Even if an executi

              • by gweihir ( 88907 )

                Hahahaha, funny. The Zorg scene is overdramatized, but in large organizations it often works like that. Until they, by accident, sack people they actually needed. Or they got rid of those people because they were telling them that the great new marketing strategy of the company "leadership" was non-viable and would lead to disaster. And then, a few years later, that disaster strikes. I have now seen it several times, with two cases large enough to make the international news repeatedly.

                • Hahahaha, funny. The Zorg scene is overdramatized, but in large organizations it often works like that.

                  No, it doesn't, unless you've got really bad company leadership. Nobody is just going to willy-nilly double the number of people they sack. If they do, then the board of directors is totally not doing its job by keeping a person like that in such a position. This isn't a tactical decision like firing a male employee for slapping a female employee on the butt; it's very much a strategic decision that requires careful consideration and planning.

                  Until they, by accident, sack people they actually needed. Or they got rid of those people because they were telling them that the great new marketing strategy of the company "leadership" was non-viable and would lead to disaster. And then, a few years later, that disaster strikes. I have now seen it several times, with two cases large enough to make the international news repeatedly.

                  I certainly can't speak to unspecified cases, but I doubt they ju

              • Point to where, as in your superiority fantasy, I stated that one person shall declare that many people shall be fired and replaced with AI. Only that will justify your rambling, masturbatory rant.

                AI is taking jobs where it is increasing productivity already, in addition to the other assorted negative effects on society like the recent massive increase in apparently unique spam.

                • by gweihir ( 88907 )

                  AI is taking jobs where it is increasing productivity already

                  True. But the question is how much of that will last. For example, coding assistants are very likely to cause a few really large catastrophes. We are currently at a point where a lot of code, including in critical functions, is far worse than what is needed to keep a tech society running reasonably well. That cannot go on and a rather drastic shake-up is coming at some point.

                  • All points I've made myself when people, e.g. at work, ask me what I think about AI. The one thing everyone should be able to agree on is that it is disruptive to employment.

                    • by gweihir ( 88907 )

                      The one thing everyone should be able to agree on is that it is disruptive to employment.

                      Sure. It will be that even if all AI generated code has to be ripped out of projects again. My take is that using it comes with a minor chance of relevant productivity gains and a major risk of unreliability, maintainability problems and insecurity, both direct and indirect (because even lower-skilled people get used).

                      Not something smart people commit to at this time.

                    • Not something smart people commit to at this time.

                      Not something smart, scrupulous people commit to. But business leaders who are rewarded for momentary increases in stock prices? It can make all the sense in their hateful, selfish world.

                • I quoted it already. Here, I'll do it again:

                  There's the rub: a lot of people are going to get fired because of executives who don't understand the limitations of the technology.

                  And...

                  AI is taking jobs where it is increasing productivity already,

                  Ah...no. You've bought hook, line, and sinker into the lump of labor fallacy.

                  This is like saying robotic vacuums have taken the jobs of housekeepers that nobody was going to hire anyway. All it did was enable that person to be more productive at their own housekeeping work. The net result is that the person is simply able to do more; it doesn't follow that, without the robot, they would by necessity have hired a housekeeper. Instead, they can either go withou

          • > LLMs aren’t knowledge tools, they are conversation tools.

            LLMs are like pianos, you play the keys and they make "music". It's a creative instrument, the player is the central element. Pianos don't make music on their own, LLMs don't make knowledge, but can assist us with the process.
          • by gweihir ( 88907 )

            LLMs aren’t knowledge tools, they are conversation tools. They are designed to be able to respond in a way that sounds right to people.

            True, but they are being sold as knowledge tools. They really are not.

        • by flux ( 5274 )

          LLMs are a useful tool when you don't know the answer, or can't be bothered to come up with it, but it's easy to verify if the results are correct.

          Such is the case with e.g. programming.

          • by gweihir ( 88907 )

            Since when is it "easy" to make sure code is correct and has no vulnerabilities? All experience and research says verifying correctness and security is actually a _lot_ harder than generating some code that seems to solve some issue.

            • by flux ( 5274 )

              With the help of safe frameworks and safe languages, most of the code you'd generate with an LLM is pretty straightforward to review for safety.

              • by gweihir ( 88907 )

                Sure, with "magic" anything becomes easy. Here is a hint: There are _no_ "safe frameworks" and there are _no_ "safe languages" and hence your whole claim is just crap.

                • by flux ( 5274 )

                  Well, let's just agree to disagree.

                  Whether you like it or not, it seems quite certain to me that most code written in ten years is going to be written with the help of an AI assistant, LLM or some new approach.

                  It's just that useful.

                  • by gweihir ( 88907 )

                    Well, let's just agree to disagree.

                    That is fine. Neither of us can predict the future.

                    it seems quite certain to me that most code written in ten years is going to be written with the help of an AI assistant

                    And I will predict exactly the converse: In 10 years, AI-generated code will have done so much damage that its use actually gets outlawed or comes with huge liability risks.

      • Half-assed sounds like 50-50 odds it rates any given statement either true or false. And you don't see the problem with that?

        • Point to where I said I didn't see the problem, clown.

            • You literally said half-assed fact checking would be a big and obvious improvement, implying you don't see the obvious problem inherent with it. We're literally talking about technology that nobody has figured out how to prevent from just making shit up, and you have it in your head that it's an obvious improvement to fact checking.

            I told you not to eat any of narcc's cupcakes, but you did anyways and now look what happened.

            • You literally said half-assed fact checking would be a big and obvious improvement, implying you don't see the obvious problem inherent with it.

              That thing in no way means the other thing. I do still see the problem. That doesn't mean it won't be a big improvement. I don't understand why you're having so much trouble with this.

              • Srsly? Think about it for a second: You have a bot that is just as likely to give disinformation as it is information. You ask it if something is true or false. It gives you an answer, who gives a shit whether it's true or false. How does that benefit you in any way? Why did you ask it to begin with? Do you feel any more confident that you know the truth than you did before? If the answer is yes, then I guess that doesn't change much because I'm not sure if my opinion of your overall intellect can get any l

              • by gweihir ( 88907 )

                It is a big improvement. But it is not a big enough improvement because it improves the wrong thing. You see, when humans do fact-checking, they start to be less-wrong overall and they, in particular, stop being massively wrong. In humans, even half-assed fact-checking comes with more overall insight as in humans, fact-checking involves the use of general intelligence.

                The same is not true for LLMs because they have no insight. When LLMs "fact-check", the really gross and extreme mistakes can still happen a

                • I agree that at best the LLM is doing the job of an educated but incompetent person. On the other hand, and unfortunately, lots of humans have problems becoming less wrong through reasoning because they are starting from a long list of faulty assumptions.

                  • by gweihir ( 88907 )

                    Sure. Most humans are incompetent and clueless. But they have a different profile in the errors they make than an LLM.

    • Re: (Score:2, Troll)

      by MikeBabcock ( 65886 )

      Actually they basically can with feedback reinforcement.
      https://arxiv.org/html/2404.17... [arxiv.org]

      • by gweihir ( 88907 )

        That would assume their training data is "facts". It is not to a significant degree. So no, they cannot. Actual fact-checking requires insight and LLMs cannot do insight.

        • I am simply responding to a proposed *impossibility* -- the potentially low probability that "facts" are used to reinforce a data set does not change its *possibility*.
          Your argument also seems to ignore the difference between initial training and reinforcement.

      • May I note that the person who disliked my reply and labelled it trolling is hilarious?
        How is disagreeing with someone and citing sources trolling? smh.
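The paper MikeBabcock cites above concerns reinforcement from feedback; in spirit, the idea resembles a generate-critique-revise loop. A minimal sketch of such a loop, assuming a hypothetical query_llm() helper; this is an illustration of the concept, not the paper's actual method:

```python
def query_llm(prompt: str) -> str:
    """Hypothetical helper: send a prompt to some LLM, return its text reply."""
    raise NotImplementedError("wire this up to the model of your choice")

def self_check(question: str, rounds: int = 2) -> str:
    """Answer a question, then feed the model its own output to critique and revise."""
    answer = query_llm(question)
    for _ in range(rounds):
        critique = query_llm(
            f"Question: {question}\nAnswer: {answer}\n"
            "List any factual or logical errors in this answer."
        )
        answer = query_llm(
            f"Question: {question}\nAnswer: {answer}\nCritique: {critique}\n"
            "Rewrite the answer, fixing the problems the critique identifies."
        )
    return answer
```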

    • by jkechel ( 1101181 ) on Friday January 31, 2025 @03:49PM (#65133187)

      It's probably much more interesting to learn that our brains actually work the same way. And honestly, most people don't fact-check themselves either.

      • by gweihir ( 88907 )

        and honestly, most people don't fact-check themselves either.

        There are some numbers from sociology that say only 10-15% of all people can actually competently fact-check and only 20-30% are accessible to rational arguments. That does not justify having machines that cannot do it either.

        • [..] Does not justify having machines that cannot do it either.

          Not to justify, but AI is built based on biological neural networks, so why should it behave differently? However, for workers, once AI truly becomes cheaper, capitalist companies won’t have much of a choice.

          • That is not how AI is built.

            TL;DR: biological "networks" are not equal to AI in any sense.

            • If you compare AI and organic neural networks by how they function, ChatGPT highlights several similarities at an abstract level (e.g., neuron-like units, weighted connections, activation thresholds, learning processes, distributed encoding, parallel processing, hierarchical formation, and emergent behavior). So in principle, they share many structural parallels.

              Beyond “learning on the fly,” I asked ChatGPT to skip technical details and focus on functional outcomes. It pointed to one major diffe

              • by gweihir ( 88907 )

                You are hallucinating. A wedding cake and a concrete floor tile have "several similarities at an abstract level". That does not make them similar in any way. Seriously, how insightless can you get? This is badly made-up religion you are pushing.

          • by gweihir ( 88907 )

            LLMs are _not_ built on "biological neural networks". Get some clue.

    • Of course it's possible. Just like the police can investigate themselves for misconduct. ;-)

    • by ArmoredDragon ( 3450605 ) on Friday January 31, 2025 @04:42PM (#65133285)

    It's not a scam, just technology that probably won't deliver what people think it will. They have it in their heads that it will deliver a pocket Albert Einstein, but they'll more likely end up with a pocket rsilvergun: something that offers the illusion of intelligence, that you can't rely on to do what it says it will, and that will often just make shit up to please its audience.

      The only companies even making any money on this aren't even in the AI business, they're more like the people who sold picks and shovels during the gold rush. The gold was real, the problem was in the expectation.

      • It's not a scam, just technology that probably won't deliver what people think it will.

        It's a scam if it doesn't deliver what the marketers say it will. It's a problem caused by those who're setting those expectations.

      • by gweihir ( 88907 )

        The technology is not a scam. The way it gets presented and pushed is. But, you know, that is quite often how it is. Any good scam has a core of truth somewhere in there.

    • by MobyDisk ( 75490 )

      Your criticism is apt, but it really applies to the editors who keep dumbing things down to the point where they are wrong. Perhaps an AI could summarize the article more accurately. :-P

      To state it more clearly: OpenAI's new models go back and double-check their output before presenting it. This results in fewer cases where you say "Solve this problem..." or "Write some code to do X..." and it outputs something that has one or more errors. It isn't uncommon to ask it to check, and the second time it prod

      • by gweihir ( 88907 )

        It is a bit like making a broken clock display quarter days instead of half-days: Suddenly it is right 4 times a day instead of only twice. A great improvement, right? Wrong. It is still a broken clock and the only thing that was done is hide it a bit better. That is the same thing that is happening here.

        The problem with LLMs is that most things have to be made correct by construction, not by later analysis and fixing. Analysis is a lot harder and much more expensive than doing construction right. Analysis
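For anyone who wants to try the manual ask-it-to-check pattern MobyDisk describes, a minimal sketch using the OpenAI Python SDK; the prompt is a placeholder, and this two-pass flow is only an approximation of the idea, since o3-mini's built-in checking happens server-side:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

question = "Write a Python function that parses ISO 8601 dates."  # placeholder

first = client.chat.completions.create(
    model="o3-mini",
    messages=[{"role": "user", "content": question}],
)
draft = first.choices[0].message.content

# Second pass: show the model its own answer and ask it to check its work.
second = client.chat.completions.create(
    model="o3-mini",
    messages=[
        {"role": "user", "content": question},
        {"role": "assistant", "content": draft},
        {"role": "user", "content": "Check your previous answer for mistakes and correct any you find."},
    ],
)
print(second.choices[0].message.content)
```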

    • You're right. This amazing achievement isn't right yet. We should just give up.

  • How is this better for STEM than Wolfram Alpha?

    • by MobyDisk ( 75490 )

      Wolfram Alpha is better for pure math. I would hesitate to trust an LLM to solve an equation or factor something or find a derivative. It *can* but the risk of a mathematical error is high. I don't think Wolfram can ever produce a mathematical error like 2 x 3 = 5, which an LLM can do. But I find it much easier to describe my problem to an LLM than to Wolfram. And the LLM can explain things. (Or can the paid versions of Wolfram do that?) Wolfram can produce graphs, but it can't write code.

      IMHO, diffe

  • Or should I stick with DeepSeek?

    Got a distilled version easily running on my MacBook Pro M3.

  • I asked if an AI can reliably fact-check itself. I previously instructed it to always reply in surfer speak (it's funnier when you hear it spoken): "Nah, dude, an AI can't totally fact-check itself with 100% reliability. While it can assist in fact-checking, it shouldn't be the final authority—kinda like how you wouldn't trust your GPS blindly if it told you to drive into a lake. Always double-check, my dude!"
  • by Savage-Rabbit ( 308260 ) on Friday January 31, 2025 @04:01PM (#65133203)

    OpenAI's o3-mini: Faster, Cheaper AI That Fact-Checks Itself

    ... which instantly made it MAGA incompatible.

    • OpenAI's o3-mini: Faster, Cheaper AI That Fact-Checks Itself

      ... which instantly made it MAGA incompatible.

      Which is probably why it's faster and cheaper; it's easier to just deal with facts than constantly making up stuff, especially if it has to seem plausible and/or consistent.

  • Stop repeating the lie.
  • ... that Simpsons bar scene where Otto slams a smart drink, blinks, then says, "oh wow, I've wasted my whole life."
  • I heard you like AI, so we put an AI in your AI...

  • It fact checks itself alright. I asked it to write an article about Kamala Harris, essentially criticizing her, and it looks like they have already planned ahead for the 2028 elections... because it outright refused. Meanwhile, ChatGPT 4o had no issues writing the article with the exact same prompt. It also had no issues writing an article comparing Hitler to Trump, so the bias is clearly favoring the democrats.
