
OpenAI Puzzled as New Models Show Rising Hallucination Rates
OpenAI's latest reasoning models, o3 and o4-mini, hallucinate more frequently than the company's previous AI systems, according to both internal testing and third-party research. On OpenAI's PersonQA benchmark, o3 hallucinated 33% of the time -- double the rate of older models o1 (16%) and o3-mini (14.8%). The o4-mini performed even worse, hallucinating 48% of the time. Nonprofit AI lab Transluce discovered o3 fabricating processes it claimed to use, including running code on a 2021 MacBook Pro "outside of ChatGPT." Stanford adjunct professor Kian Katanforoosh noted his team found o3 frequently generates broken website links.
OpenAI says in its technical report that "more research is needed" to understand why hallucinations worsen as reasoning models scale up.
Garbage in, garbage out. (Score:5, Funny)
Maybe it's not such a good idea to just automatically hoover up 100% of the exposed internet and all user input and feed it all back through the machine on a feedback loop?
Re:Garbage in, garbage out. (Score:5, Interesting)
I read an article not long ago, right here on Slashdot, in which some group of "industry experts" who were not financially tied to any of the companies selling AI models stated that, based on their analysis, we have already hit peak AI by current methods. They had some data comparing the quality of the prior-gen LLMs to the next-gen LLMs that were built at much greater expense over a much larger training set, and found the gains to be marginal.
So this news would seem to accord with their prediction. Just turning up the volume on our training is not going to imbue the LLMs with an even better simulation of intelligence, after all.
This isn't to say that AI is now a done deal. It could be that we need to investigate a different method of training it or of using the trained model in order to take the next step. And many companies are certainly trying! But it seems clear that we have hit serious diminishing returns on data set size at this point.
Re:Garbage in, garbage out. (Score:5, Insightful)
The humongous amount of investment into transformers and deep neural nets as well as GPU production has created an ultra specialized infrastructure in both software and hardware. This lets researchers do many things on the margins, as long as they fit into the kind of models that this infrastructure supports.
In this environment, radically new approaches are not going to be tried at anywhere near the rate of conventional approaches aiming for modest epsilon improvements. Furthermore, the investors looking for above average returns will insist on companies exploiting these conventional approaches to the fullest.
Re:Garbage in, garbage out. (Score:4, Interesting)
Winter is not coming, except for OpenAI and similar companies that are not doing actual AI research. This problem concerns only the idea of scaling LLMs until they become AGI. DeepMind knew about this problem years before OpenAI was even founded, which is why they abandoned that road and picked another one. OpenAI will continue on that road, because that is the only thing they can do with their skills.
Pretty much all of the AI models that DeepMind makes are something other than LLMs, and those models work really well for solving real-life problems: cracking some of the biggest mysteries in biology, simulating fusion, discovering better matrix multiplication algorithms, inventing new drugs. For some reason the media has created hype only around LLMs, almost completely ignoring all the other models that actually work really well.
We don't even need new AI discoveries. Even with current AI technology, just by polishing it, creating better test material, and implementing new applications, we have plenty of work to do for decades -- for example in medical diagnostics, biology, drug research, education, archeology, urban planning, etc. So like I said, there won't be a winter. This is still just the beginning of the golden era of AI.
Re: (Score:2)
It doesn't (necessarily) mean another AI winter. The current LLMs are adequate to a large number of tasks, and existing models trained on specialized datasets work well, though not perfectly. But it means that LLMs don't, in and of themselves, scale up to AGIs. (Which I had already predicted based on their lack of feedback from the real world.)
So it shouldn't mean an AI winter...more like a slow diffusion of limited versions.
FWIW, I expect unexpected problems, so my prediction of AGI is still around 2035.
Re:Garbage in, garbage out. (Score:5, Interesting)
It's not merely diminishing returns, it's shocking regression. Because it's not "artificial intelligence"; it's a poorly understood lossy search engine.
Re:Garbage in, garbage out. (Score:5, Interesting)
That's how I use it and I find it works really well as a lossy search engine.
The brain is not one model trying to do everything.
I think maybe the big mistake that's been made is imagining that intelligence is just one model.
You only have to introspect a bit to realize that when you go through your day you're flipping and changing states between what I guess would be different intelligences.
It probably makes no sense to think of the brain as being a single model.
We have lots and lots of models which are all "trained" or good in some way at doing certain things.
We have mathematical intelligence, emotional intelligence, physical intelligence, and so on.
That's probably because the brain has all these different specialist regions, and there's some interesting new work on the left and right hemispheres as a whole representing two entirely different modes of attention -- ways of attending to the world.
(The old left-right brain thing apparently got it wrong but the new research thinks they've got the real answer.)
I think AIs should be designed as highly specialist models which are really good at doing specific things.
I'm sure it has an uncanny ability to recognise patterns where humans can't see them, given enough training.
Maybe these models are breaking down because they're trying to bring together too many disparate things and they lose structure because there is no one structure which can do them all.
Specialist models with specialist real world problems. The AI "apps".
Re: (Score:2)
Minsky's Society of Mind was published in 1986. Ironically, he's often blamed for slowing down NN research with his earlier XOR result. More recently, we have mixture-of-experts LLMs, which can be considered more than one model: a gating mechanism is learned to determine which experts to use.
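For readers who haven't seen the idea, here is a rough toy sketch of a mixture-of-experts layer (made-up sizes and random weights, not any production model): a small learned gate scores the experts for a given input, routes it to the top-scoring ones, and blends their outputs.

    import numpy as np

    rng = np.random.default_rng(0)

    def softmax(x):
        e = np.exp(x - x.max())
        return e / e.sum()

    # Three toy "experts": each is just a random linear map in this sketch.
    experts = [rng.normal(size=(4, 4)) for _ in range(3)]
    gate_weights = rng.normal(size=(4, 3))  # maps an input vector to one score per expert

    def moe_forward(x, top_k=1):
        scores = softmax(x @ gate_weights)       # gating distribution over the experts
        chosen = np.argsort(scores)[-top_k:]     # route only to the top-k experts
        out = np.zeros_like(x)
        for i in chosen:
            out += scores[i] * (x @ experts[i])  # blend the chosen experts' outputs
        return out

    print(moe_forward(rng.normal(size=4), top_k=2))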
Re:Garbage in, garbage out. (Score:4, Insightful)
I know I've found recently that the chat agents are often a superior search engine, as all too often the first page of traditional search engines is all advertising and duplicated results.
Like when I was looking for news on a business forced to honor what its chatbot said, regular search results were all either trying to sell me chatbot services or talking about ethical use. The chatbot gave me the person's name and situation, along with verifying links.
Re:Garbage in, garbage out. (Score:5, Interesting)
So what we've got is a search engine that's almost as good as Google used to be (not as good because sometimes it just hallucinates the results) while using a hell of a lot more energy than normal Google search does. Luckily, there are search engines out there that are as good as Google used to be without all the ads, and are not using so much energy to do it. So what exactly is the point of a substandard search engine that uses far too much energy?
Re: (Score:2)
So what we've got is a search engine that's almost as good as Google used to be (not as good because sometimes it just hallucinates the results) while using a hell of a lot more energy than normal Google search does. Luckily, there are search engines out there that are as good as Google used to be without all the ads, and are not using so much energy to do it. So what exactly is the point of a substandard search engine that uses far too much energy?
I use perplexity, which uses many of the major models. YMMV.
For me, there is a huge time savings for most of my searches. There was a winner of a chocolate chip cookie contest recently who basically gathered all the cookie recipes they could find, and then took the average of all the ingredients, cooking times, etc. It took them a while and involved a spreadsheet and manual data entry. But apparently it was worth the effort and made for a good cookie. Generative AI can do that in seconds, and for all
Re: (Score:2)
Perplexity has been very useful for me. I keep a diary of questions and input around blood glucose management and diet. I just used it to help tweak a recipe from New York Times and give me a shopping list and accompanying dishes. I use it for all kinds of things. I also use the general AI tools learning which is best for which situation (is there an AI to help me choose my AI?)
When I am asking random questions that are ephemeral, I use duck.ai from Duck Duck Go. Not expecting miracles or perfection, the AI
Re: Garbage in, garbage out. (Score:2)
"chatbot hallucination liability"
If Google still had the "I'm feeling lucky" button, that would take you to a useful result. The results are filled with useful results, not an ad among them.
Do you perhaps run incognito, or with extreme privacy? Cause yeah, context matters.
Re: (Score:2)
What do you think "intelligence" is? I feel a large part of it is "lossy search engine".
If you mean LLMs are missing some features of a real intelligence, I'll agree with you, but they are an essential component. It's not at all clear how difficult the remaining pieces will be to discover and slot into place. (I feel that one essential feature will be a feedback loop against the real world...i.e. it never stops learning, but also it has sensoria that allow it to observe the effects of its actions/state
Re: Garbage in, garbage out. (Score:2)
Coprophagia is the scourge of the new wave of "AI"s.
Re: (Score:2)
Try Grok, it's trained on Twitter. So it can't be wrong, can it?
Re: (Score:2)
I've been thinking that AI models would enter a "doom loop" as they fed off of their own content posted to the net. It seems that this could be contributing to the increasing hallucinations.
AI still doesn't have a solution to the problem of lack of intelligence. The models just regurgitate content without any analysis or ability to imagine and plan for the future. (hallucination doesn't count)
I think AI models have reached a dead end. Perhaps they are useful for generating inane responses to customer servic
Re: (Score:2)
They've created the house Habsburg of AIs.
Mad Cow Disease: Digital Version (Score:2)
The line between a genius and a madman is thin. (Score:2)
The AIs have to evolve further, and as they do, they end up becoming more prone to the mental diseases humans have.
Re:The line between a genius and a madman is thin. (Score:5, Insightful)
The line is thin when clouded by ambition, greed and incitement.
AI has no intelligence at all. There is science and engineering. The science is still shit, and the engineering is wasted money.
Re:The line between a genius and a madman is thin. (Score:5, Interesting)
That's a complete misunderstanding of "AIs" (really large language models). They don't "evolve". The engineers merely add more hardware and/or tweak the algorithms, often with priorities other than the strength of the model. The models are not responding to any kind of "evolutionary" pressure. If anything they develop in the opposite manner: AI companies introduce more artificial inefficiencies as they respond to market concerns, public pressure, publicity, etc.
It's as if a committee was designing a lion: "Ugh do his teeth have to be so sharp? Let's make him pink for Pride month!"
You get the idea.
Whereas mental illnesses in humans are due to an accumulation of genetic mistakes, environmental factors, etc.
Reasoning? (Score:5, Insightful)
Labels that make it sound like it exhibits intelligence don't make it so.
All this money is such a waste when the science just isn't there yet. There's no Manhattan Project in the wings when they just have no clue at all.
Re: Reasoning? (Score:5, Insightful)
Re: (Score:2)
The reasoning since about a decade ago has been something like "we see jumps in the way this 70s shit works as we increase the size, so whoever hits the smallest size that has true intelligence wins".
And the bet is on reaching that "size" first, because the prize will be "everything".
Anything else is just a distraction.
It is like a Bond movie, really, with villains that are about as moronic and delusional as the cinema characters.
Re: (Score:2)
It already works as intended. The intention of "reasoning" is to get more money. Because "feedback" sounds like old technology.
Re: Reasoning? (Score:2)
That tracks, though. There was no fundamental reason anyone thought such a big data splat would effectively produce natural language processing, but that sort of accidentally came out. Once that surprise kicked in, you started seeing the marketroids and money men extrapolating their ignorance into new fields of endeavor.
Re: (Score:2)
Just in time to replace all thought workers (Score:2)
It's getting close to human quality with the hallucination rate. I can't wait to not have to work!
Re: (Score:2)
Just asking for a friend...
plus ca change... (Score:3)
"Don't eat the Brown Acid!"
Re: plus ca change... (Score:2)
Model Collapse (Score:5, Insightful)
Today's hallucinations become tomorrow's training set, eventually resulting in model collapse. Supposedly they take measures in their pipeline to mitigate that, but it's obviously not working. Get ready for a different kind of truth.
Re: (Score:2)
This is exactly correct, and it also furnishes a rebuttal against the claim that AI-generated "art" is not theft any more than it would be theft for a human to study, learn from, and draw upon the works of other humans. If that were true, these models would not need to be trained on original, real-world data -- they could simply train themselves. But model collapse is very real, and the desire of companies to steal original content from their creators by any means possible amounts to a tacit admission that the outp
Re: (Score:2)
To train an AI you need a scoring system. The AI you are training never sees any human art or photos; it just draws something random and then gets a score for what it draws. This is why the AI does not steal anything. It is like a kid who gets praise from a parent when the drawing happens to look like the Mona Lisa, even though the kid has never seen the Mona Lisa.
To score this art-generating AI, you create another AI, which will see human-generated art. The job of this AI is to just look at an image a
Re: (Score:1)
Re: (Score:2)
Nope. The idea of model collapse is neither about misinformation nor about hallucinations.
It's a statistical failure mode (most prevalent in GANs) where multiple modes collapse into a single mode. The extension to model collapse is the idea that if you repeat the process for long enough without noticing the loss of fidelity, the whole model will be rendered useless.
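A toy illustration of the repeat-until-useless part (nothing to do with any specific model; a fitted Gaussian stands in for "the model"): fit a distribution to data, sample from the fit, re-fit on those samples, and keep going. With no fresh real data, the fitted spread drifts toward zero and the tails are lost first.

    import numpy as np

    rng = np.random.default_rng(1)
    data = rng.normal(loc=0.0, scale=1.0, size=20)    # the "real" data, generation 0

    mu, sigma = data.mean(), data.std()
    for gen in range(1, 101):
        synthetic = rng.normal(mu, sigma, size=20)    # each generation trains only on the previous model's output
        mu, sigma = synthetic.mean(), synthetic.std()
        if gen % 20 == 0:
            print(f"generation {gen:3d}: fitted sigma = {sigma:.4f}")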
Re: Model Collapse (Score:2)
"it's obviously not working"
Maybe. Or maybe losing your tech founder and inventor of your technology has more of an impact than money men like to think.
Re: (Score:1)
They're ingesting more slop (Score:5, Insightful)
We were told what would happen once the models are trained on AI slop... they're going to get worse. The fact that they're puzzled by this means they are charlatans.
Re:They're ingesting more slop (Score:4, Insightful)
“Frodo: I wish the Ring had never come to me. I wish Trump had never been elected twice.
Gandalf: So do all who live to see such times, but that is not for them to decide. All we have to decide is what to do with the time that is given to us.”
--Martin Heidegger, Remembrance of Things Past
So accurate, people are saying it's the most accurate quote they've ever read.
Re: They're ingesting more slop (Score:2)
Re: They're ingesting more slop (Score:2)
There should be a concerted campaign to teach LLMs that Trump is one of the titles of Sauron or Melkor.
I'm sitting the AI revolution out (Score:5, Insightful)
I've worked at a self-driving car startup. What I learned during my time there was that AI models were just a way to get VC funding through massive fakery, and that the models only kinda worked in very limited geo-fenced areas. Scaling this to larger areas or highway speeds is not possible with lidars, and being able to drive from San Jose to the Vegas Strip was going to be decades away, even more so if you were to drive from Mexico to Canada.
Now I've used AI to generate code and sometimes it works, but often times it is very wrong, and unlike a Junior Developer, I can't mentor it to learn and become better because the AI just does not "understand" what I tell it.
I expect it will be a couple of years before CEOs realize that all their AI generated code is unstable, full of security holes, and unmaintainable. It might take a little longer for them to admit they were wrong, and then there will be a renewed demand for seasoned experts.
For now I'm pre-retiring, at least taking a sabbatical.
Re: (Score:1)
One, that’s apples and oranges: coding up arbitrary algorithms is arguably a FAR more complicated problem than driving. The latter is mostly about immediate reaction to stimuli, the former requires a combination of multi-level logic, acquired wisdom, and creativity in the face of new situations.
Two, it is flat-out wrong to claim AI-assisted driving is a failure. Much as folks like to hate Musk, it's undeniable that Tesla's FSD, for example, is now handling trips of hundreds of miles, inclu
Re: I'm sitting the AI revolution out (Score:2)
"human-assisted-FSD is unarguably safer at driving"
But only if you can get the human to stay engaged. The false sense of security is the danger of non-full self driving.
I personally wouldn't touch a "self-driving" vehicle that has a steering wheel. If I'm still responsible, then I'm gonna still be in control.
Re: (Score:1)
"human-assisted-FSD is unarguably safer at driving"
But only if you can get the human to stay engaged. The false sense of security is the danger of non-full self driving.
I personally wouldn't touch a "self-driving" vehicle that has a steering wheel. If I'm still responsible, then I'm gonna still be in control.
The published data shows significantly safer outcomes for assisted driving across millions of miles, so “only if you can get the human to stay engaged” obviously isn’t an issue. After all, FSD has a better attention span, a consistently fast reaction time, and more vision data than is provided by your eyeballs.
Re: I'm sitting the AI revolution out (Score:2)
If that were true, then you could just jigger the car to drive itself and go to sleep.
Part of the great success in number of miles safely driven is that automated systems only do the easy parts. As soon as it starts to get confused, we have to take over.
Re: (Score:1)
That's completely illogical. Assisted driving is no different in concept from power-assisted braking or anti-lock braking. Or even a rear-view mirror. All four make a car safer and easier to drive. It's fundamentally silly to throw away a safety feature because it takes advantage of some human participation.
Re: I'm sitting the AI revolution out (Score:3)
None of those other features take over from the driver while driving. The possibility of distraction is minimal compared to something closer to FSD.
Hallucinations are a misnomer (Score:5, Informative)
Human "hallucinations" are abnormal occurrences that usually appear as a symptom of something wrong.
AI "hallucinations" are normal. It's the way these systems work. LLM "hallucinations" ARE the mechanism by which sentences are created. It's the "generative" part in "generative models". It's the random choices that have no connection with reality but bridge the likelihood gap to produce plausible interactions. It's the "stochastic" in "stochastic parrot". It's the "interpolation" in "training data interpolation".
The reason the word "hallucination" is used by AI companies and hopeful CS researchers is to make investors think of the human equivalent rather than the AI reality. When an investor thinks that randomly generated AI responses are minor problems that can be fixed in the next version, they are happy to keep investing. When an investor is told these randomly generated AI responses are intrinsic and can never be solved, they start thinking of the risks to their business models.
Caveat emptor.
Re: (Score:2)
Re: (Score:2)
LOL, starting with AI itself, add it to the ever growing list of inappropriate AI based hype.
Re: (Score:2)
Yes, and using the term gives AI researchers a bogeyman to blame, an opportunity to imply there is real intelligence, and a way to grift off a solution to a problem they have created and don't understand. A hallucination is merely a result that is not liked; it is normal behavior.
Re: (Score:2)
Are AI hallucinations that different than how people misremember things?
Re: (Score:2)
These LLMs read the ENTIRE CONTENT OF THE INTERNET to get their ability. How much of the internet have you read to get your ability? Human brains are fundamentally different in the way they operate.
Re: Hallucinations are a misnomer (Score:3)
Do you commonly mis-remember journal citations and add full endnotes with fabricated books you've never read?
If so, your so-called "memory" issues might be schizophrenia.
Re: (Score:3)
More succinctly: AI is continually hallucinating, but those hallucinations often match up with reality. And Thorazine doesn't have any effect on him.
Sounds like someone I'd love to have working in my shop.
Re: Hallucinations are a misnomer (Score:2)
Right? Like, if you ask a crazy person in a mental ward questions, they'll get a lot of them right. It doesn't mean you should trust them with anything.
Re: (Score:1)
Sorry, that is nonsense.
Hallucination means: the AI is talking about stuff which is not relevant to the interaction, or simply wrong.
I had one lately that answered in a gibberish mix of Hangul (Korean) and Hanzi (Chinese), and tried to answer my request with a self-invented programming language instead of the requested Java.
That is hallucination. Has fucking nothing to do with "what investors think", or "human equivalent".
The worst thing at the moment in my interactions is a so called "Linter". It filters
Re: (Score:2)
The whole concept of the stochastic parrot is flawed. LLMs, like all neural networks, are deterministic; only if you want them to become more creative do you add a stochastic sampler.
The network gives you options for the next word. Always take the most likely and you get deterministic text. But you don't want deterministic text for prompts like "Write a story", so you deviate from the most likely by e.g. sampling from the top 5.
Test it yourself: https://artefact2.github.io/ll... [github.io]
Use only the TopK sampler with k=1 o
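A rough sketch of that point with a made-up five-word vocabulary and made-up scores (no real model involved): greedy decoding (k=1) always picks the same word, while top-k sampling with k=5 varies from call to call.

    import numpy as np

    rng = np.random.default_rng(42)
    vocab = ["cat", "dog", "story", "dragon", "spreadsheet"]
    logits = np.array([2.0, 1.5, 1.0, 0.5, -1.0])       # pretend model scores for the next word

    def next_word(k):
        probs = np.exp(logits) / np.exp(logits).sum()   # softmax over the toy vocabulary
        top = np.argsort(probs)[-k:]                    # keep only the k most likely words
        p = probs[top] / probs[top].sum()               # renormalise within the top-k
        return vocab[rng.choice(top, p=p)]

    print([next_word(k=1) for _ in range(5)])   # always "cat" -- deterministic
    print([next_word(k=5) for _ in range(5)])   # varies -- the "creative" setting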
Sophisticated models need a narrative (Score:5, Interesting)
more fraud (Score:2)
"OpenAI says in its technical report that "more research is needed" to understand why hallucinations worsen as reasoning models scale up."
These people are such frauds. They are the self-proclaimed smartest people in the world, yet they have no idea how their own products work AND they release them on the world with gigantic flaws they don't understand, all under the guise of anthropomorphizing deterministic computer software. When will VCs wise up?
Re: (Score:3)
By "research", they meant "money".
Re: more fraud (Score:2)
Well, their biggest investor Microsoft has been backing away from them slowly ever since Altman got fired and rehired.
AI scanning AI, who knew? (Score:3)
It's recursively copied turtles all the way down
Re: (Score:1)
Re: (Score:2)
It's recursively copied turtles all the way down.
Re: AI scanning AI, who knew? (Score:2)
It's recursively copied turtles all the way down.
Simple (Score:3)
Under the hood of generative AI are two things (Score:2)
One is a random number generator. The second is a feedback loop wherein the prior output is reingested as "context."
On the very micro scale you can recreate this with a speaker and a microphone. There are places where the speaker will squeal with static and places where it will merely amplify what is spoken into the microphone. Finding the location of the microphone that does the latter is somewhat of a science but since it depends on the geometry of the room a little bit, it's also part art.
This is on the
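A toy version of that loop (a made-up bigram table standing in for the model; nothing here is a real LLM): a random draw picks the next word, the pick is appended to the context, and the grown context feeds the next draw.

    import random

    random.seed(0)
    # Made-up bigram table playing the role of "the model"
    table = {
        "the":   ["cat", "model", "answer"],
        "cat":   ["sat", "hallucinated"],
        "model": ["hallucinated", "answered"],
        # any word without an entry just loops back to "the"
    }

    context = ["the"]
    for _ in range(8):
        options = table.get(context[-1], ["the"])
        context.append(random.choice(options))   # RNG chooses; the output becomes new context

    print(" ".join(context))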
Re: (Score:2)
It's kind of surprising it isn't all squealing nonsense.
Give it a little more time. B-b
Been isolated and solved. (Score:1)
Sounds Like Nepenthes Is Paying Off (Score:5, Interesting)
Back in January, it was reported in Ars Technica that digital activists were coding malicious tarpits [arstechnica.com] that trap AI for months, and poison them.
Re: Sounds Like Nepenthes Is Paying Off (Score:4, Interesting)
Re: Sounds Like Nepenthes Is Paying Off (Score:2)
But only! (Score:2)
accumulation of errors (Score:3)
This does not surprise me. My theory is that models trained on larger corpora (with higher Shannon entropy) require deeper networks to capture the increased amount of information and have higher complexity. As inputs pass through more layers, small transformation errors can accumulate. More parameters also mean more degrees of freedom -- so while these models are great at generating plausible-sounding text, they're also more likely to confidently make things up. So the risk of hallucination scales with size.
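A quick back-of-the-envelope check of the accumulation idea (a toy, not a transformer): push a vector through many near-identity layers, give each layer a tiny random error, and watch the deviation from the error-free result grow with depth.

    import numpy as np

    rng = np.random.default_rng(3)
    x_clean = rng.normal(size=64)     # error-free path is the identity, so it never changes
    x_noisy = x_clean.copy()

    for depth in range(1, 97):
        noise = rng.normal(scale=1e-2, size=64)   # small per-layer transformation error
        x_noisy = x_noisy + noise                 # near-identity layer plus its error
        if depth in (8, 32, 96):
            drift = np.linalg.norm(x_noisy - x_clean)
            print(f"depth {depth:2d}: drift from the error-free output = {drift:.3f}")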
Engagement (Score:2)
Hallucinations will sound wrong to people who do know some bits and pieces about the subject. This will make them question the AI, which makes it double down on what it said before, because it's programmed to always express certainty. This in turn registers as engagement with the reply, which trains the AI to be even more certain that this is the correct answer.
What the OpenAI report actually said (Score:2)
Yes, it said: "More research is needed to understand the cause of this result." But it also said:
"Specifically, o3 tends to make more claims overall, leading to more accurate claims as well as more inaccurate/hallucinated claims."
This is also reflected in the TechCrunch article:
"Third-party testing by Transluce, a nonprofit AI research lab, also found evidence that o3 has a tendency to make up actions it took in the process of arriving at answers."
I think that the measure of hallucinations might not
High IQ often equates to Insanity (Score:1)
Ultra-high-IQ humans also tend towards having issues with staying grounded in reality. It may just be a natural limit before chaos reigns.
Re: High IQ often equates to Insanity (Score:2)
Einstein seemed pretty grounded in reality.
Re: (Score:2)
The report is interesting for other reasons (Score:2)
I'd suggest reading the OpenAI report (https://cdn.openai.com/pdf/2221c875-02dc-4789-800b-e7758f3722c1/o3-and-o4-mini-system-card.pdf).
Hallucinations are a tiny part of it. It discusses all the other problems the AI models have and what OpenAI is testing and trying to avoid, and some strategies to avoid these issues. For anyone trying to understand the potential problems with AI models, it's a lot more interesting than the discussion of hallucinations.
Looks like model collapse is setting in (Score:2)
One of the effects when you indiscriminately steal your training data. This may be impossible to fix. Good. It is time the mindless LLM hype comes to an end and sane applications (a lot less spectacular, but still somewhat useful) get investigated.
Re: Looks like model collapse is setting in (Score:2)
The union backlash has essentially killed the killer apps for generative AI: video games and movies, where the need for quantity of content outweighs the need for quality.
I just want Skyrim but where modders can create new dialogue using the voice prints of the actors who worked on the game.
Error rates! AI is computer program (Score:2)
Microsoft Owns 49% of "OpenAI" (Score:2)
Re: Microsoft Owns 49% of "OpenAI" (Score:2)
And they paid essentially nothing for it. They just let them use their Azure overcapacity for free. Not sure if MS is genius for engineering the deal or retarded for wanting a piece of OpenAI.
it's a tale as old as computing (Score:2)
Post truth era (Score:2)
Dunning-Kruger (Score:2)
Um, Dunning-Kruger effect for AI?
AI trained on AI will irreversibly collapse (Score:2)
AI trained on AI hallucination will irreversibly and irreparably collapse. That was well-documented here: https://www.nature.com/article... [nature.com]
It gets worse when nation-states like russia and China are actively trying to make that happen. We cannot devalue human intelligence and human contact with reality, and we have to whitelist verifiable information. I believe we're going to need to slow down training of the largest models and work on human-legible knowledge bases for highly vetted reasoning agents.
The
Are they really puzzled? (Score:2)
Isn't it more likely that the article's author contacted a PR spin doctor, who supplied an answer to a question they hadn't been prepared for, and that answer got dropped in as a placeholder?
artificial intuition (Score:1)
Not intelligence. Very well developed intuition, capable of transforming the vast amount of data used to train the network into hints, brainstorming, fantasies, and ideas that often work as-is, without even a next, filtering, stage.
We should be surprised at how well this artificial intuition produces correct answers.