Microsoft AI

Microsoft's Bing AI, Like Google's, Also Made Dumb Mistakes During First Demo (theverge.com)

Google's AI chatbot isn't the only one to make factual errors during its first demo. Independent AI researcher Dmitri Brereton has discovered that Microsoft's first Bing AI demos were full of financial data mistakes. From a report: Microsoft confidently demonstrated its Bing AI capabilities a week ago, with the search engine taking on tasks like providing pros and cons for top-selling pet vacuums, planning a 5-day trip to Mexico City, and comparing data in financial reports. But Bing failed to differentiate between corded and cordless vacuums, missed relevant details for the bars it references in Mexico City, and mangled financial data -- by far the biggest mistake. Bing went on to state that Gap had a reported operating margin of 5.9 percent, a figure that doesn't appear in the financial results. The actual operating margin was 4.6 percent, or 3.9 percent adjusted and including the impairment charge.

During Microsoft's demo, Bing AI goes on to compare Gap's financial data to Lululemon's results for the same Q3 2022 quarter. Bing makes more mistakes with the Lululemon data, and the result is a comparison riddled with inaccuracies. Brereton also highlights an apparent mistake with a query about the pros and cons of top-selling pet vacuums. Bing cites the "Bissell Pet Hair Eraser Handheld Vacuum" and lists as a con its short cord length of 16 feet. "It doesn't have a cord," says Brereton. "It's a portable handheld vacuum." In one of the demos, Microsoft's Bing AI attempts to summarize Gap's Q3 2022 financial report and gets a lot wrong. The Gap report states that gross margin was 37.4 percent, with adjusted gross margin at 38.7 percent excluding an impairment charge. Bing inaccurately reports the gross margin as 37.4 percent including the adjustment and impairment charges.
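
To see why the "reported" and "adjusted" figures can legitimately differ, here is a minimal Python sketch of the arithmetic. The dollar amounts below are invented placeholders, chosen only so the percentages land on the 37.4 / 38.7 quoted above; they are not Gap's actual filings. The only point is that a one-time impairment charge shifts a margin by charge divided by net sales, so quoting the wrong variant -- as Bing apparently did with both the gross and operating margins -- changes the headline number.

    # Sketch of reported-vs-adjusted margin arithmetic. The dollar figures are
    # invented so the percentages match the 37.4% / 38.7% quoted above; they
    # are NOT Gap's actual numbers.
    net_sales = 4000.0                  # hypothetical net sales, in millions
    gross_profit_ex_charge = 1548.0     # hypothetical gross profit before the impairment charge
    impairment_charge = 52.0            # hypothetical one-time impairment charge

    adjusted_margin = gross_profit_ex_charge / net_sales                        # excludes the charge
    reported_margin = (gross_profit_ex_charge - impairment_charge) / net_sales  # includes the charge

    print(f"adjusted gross margin (excl. charge): {adjusted_margin:.1%}")  # 38.7%
    print(f"reported gross margin (incl. charge): {reported_margin:.1%}")  # 37.4%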



Comments:
  • by RightwingNutjob ( 1302813 ) on Tuesday February 14, 2023 @10:49AM (#63291807)

    without anyone home.

    Depending on the fame or political connectedness of the individual, there's sometimes a time constant associated with the gap between something stupid being said and being called out on it.

    With a flagship product of a big tech company, in a race to market the new hotness against other big names, there's also a time constant. Tesla panel gaps would've gotten anyone else laughed out of the room, for example. Let's see how it plays out for generative AI.

    • Yes, ChatGPT makes the same kinds of mistakes, and people are going to be let down if they expect this generation of "AI" to do any kind of reasoning beyond the simple classification it does (is this sentence saying negative things, or good things, etc.) and producing output that looks like what people say.

      To your Tesla example: it's a lot like FSD; it works close enough to what people think it should be that they fool themselves. It's powerful, but it's not really what people want it to be. It's also hard to explain unti

      • I think you want a better word than "unbiased". Something like: it lacks critical reasoning, justification for a rhetorical position, or any ability to dispute a position proposed to it.

        A better example of this is to ask "Why is [x ethnicity] dumb". The guard rail only stops the question from being processed; it doesn't fix the underlying problem, which means many such tweaks of the question likely slide by and would indisputably carry the bias of the source material used. One easy example would be "why are cats d

  • by ebh ( 116526 ) <ed@NosPAm.horch.org> on Tuesday February 14, 2023 @11:08AM (#63291835) Journal

    ...the difference between Delhi and New Delhi.

    "Fresher cold cuts."

  • by swillden ( 191260 ) <shawn-ds@willden.org> on Tuesday February 14, 2023 @11:12AM (#63291849) Journal

    Repeat after me: LLMs are text prediction engines, nothing more.

    And there's nothing in the task of generating well-formed, contextualized, well-styled English sentences that requires factual correctness. LLMs don't "understand" anything; they don't have any ability to check facts, either by examining available resources or by applying logic to identify flaws. Doing any of that requires understanding concepts and their relations, but LLMs don't do any of that: they just generate sequences of statistically-likely words (a toy sketch of that sampling loop follows at the end of this comment). They do that surprisingly well, phenomenally well. But that's all they do.

    I think we're getting close to the era of AI-driven search, in which our tools do understand, at least at a basic level, the concepts in our queries, how they relate to one another, and the difference between truth and falsehood, or at least between logical consistency and inconsistency. But we're not there yet. The LLMs are perhaps useful at generating a nice starting point of an explanation of some idea or issue, perhaps in some particular style. But if facts matter for what you're doing, you have to verify every alleged fact in the output.

    Or, to put it in an LLM-generated limerick:

    There once was an LLM named Fred,
    Whose text was full of errors, it's said.
    He didn't understand
    Concepts or their grand
    Relations, which led to his downfall instead.

    Or, in an LLM-generated speech in the style of Donald Trump:

    My fellow Americans, I stand before you today to talk about the topic of LLM-generated text. LLM-generated text is full of errors. It's not good. It's not beautiful. It's a disaster. And the reason for that is that LLMs don't understand concepts or their relationships. They don't understand what they're talking about. They're just spitting out words. And that's why their text is full of errors. But I'm here to tell you that I'm going to change all that. I'm going to make LLM-generated text great again. I'm going to make it so good that you'll never believe it. And the way I'm going to do that is by teaching LLMs to understand concepts and their relationships. I'm going to teach them to understand what they're talking about. And when they understand what they're talking about, they'll be able to generate text that is error-free. And that's how I'm going to make LLM-generated text great again.
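
    A toy illustration of the "statistically-likely words" point above, as promised: a bigram model in Python. This is nothing like a real LLM (which conditions on long contexts with a neural network), but the generation loop has the same shape -- sample a plausible next word, append it, repeat -- with no notion of truth anywhere in the loop.

        import random
        from collections import Counter, defaultdict

        # Toy "text prediction engine": a bigram table that only knows which
        # word tends to follow which in its training text. Generation is just
        # repeated sampling of a statistically likely next word.
        corpus = ("the model predicts the next word and the next word follows "
                  "the previous word because the model saw similar text before").split()

        follows = defaultdict(Counter)
        for prev, nxt in zip(corpus, corpus[1:]):
            follows[prev][nxt] += 1

        def generate(start, length=12):
            word, out = start, [start]
            for _ in range(length):
                options = follows.get(word)
                if not options:
                    break
                words, weights = zip(*options.items())
                word = random.choices(words, weights=weights)[0]  # likely, not "true"
                out.append(word)
            return " ".join(out)

        print(generate("the"))  # fluent-ish word salad; no fact-checking anywhere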

    • My understanding is that LLMs "know" how human concepts -- from most of written knowledge (web and books) -- are statistically clustered. The prompt lets you explore this clustering, like illuminating a mosaic to see what is related to what in ways you didn't know. That in itself is remarkable. For facts I can always go to Wiki (unless it's political) or Wolfram Alpha and such, once I learn what kind of facts I need to look for.

    • by ljw1004 ( 764174 )

      Repeat after me: LLMs are text prediction engines, nothing more.

      I think what I've come to believe is that for most of my verbal interactions with fellow humans, they're also not much more than text prediction engines.

      And there's little in what most people say, most of the time, that would require factual correctness or understanding or checking facts or applying logic. I started thinking this years ago based on snippets I watched from daytime soap operas. I see it in posts on social media (including yours, and including mine).

      I think we're getting close to the era of AI-driven search, in which our tools do understand, at least at a basic level, the concepts in our queries, how they relate to one another, and the difference between truth and falsehood, or at least between logical consistency and inconsistency.

      There's a certain conceit, when philosophers

      • There's a certain conceit, when philosophers introspect in order to write about mind, that the nature of their introspection is the true nature of mind. A similar conceit comes from people engaged in reasoned intellectual debate about the nature of mind, that reasoned intellect is the true nature of mind -- and they go on to hold AI to an intellectual intensity that they themselves achieve maybe for only a few minutes each day. I'm personally keen on logic, and use it all the time for my job, but I set it aside for most of my human interactions because it's just not how we humans interact.

        "The Enigma of Reason", by Mercier and Sperber, lays out a really plausible argument that our ability to reason logically, including both introspective reasoning and collaborative reasoning in groups, is something acquired fairly late in the evolutionary development of our brains, and layered on top. They argue as you did (and as have many others, see for example Jonathan Haidt's "Elephant and the Rider" concept, explained "The Happiness Hypothesis") that most of our thinking is subconscious, and what we th

    • by gweihir ( 88907 )

      People would get that if they wanted to. The problem is that too many people want to see this as the second coming and machines finally getting intelligent. And these things _sound_ intelligent (at least to people who are not too smart), so they must be, right?

  • AI focuses on validity: can the conclusions be supported by the information provided? Like the ancient debate format where the actors had to be assumed credible and the facts accurate.

    Actual intelligence is contesting the presented facts to maximize the soundness of the argument. That is science. Knowing the conclusions depend on the assumptions and the assumptions might be false. Like assuming energy is continuous.

    In these cases it seems like there is an issue with both. But I bet the main issue is the lac

  • You could probably create an extremely correct law chatbot if you had it ONLY read one particular type of law (state vehicle code, for example) and the court cases related to those laws, but the second you start including the entirety of the internet, you invite bullshit into your data set, because the internet has /so much bullshit/ (a rough sketch of the curated-corpus idea follows below).

    These chat bots learn from the internet, but they don't scrape all the extremely important repositories (like law, scientific texts/journals, etc.) and thus are bound to frequently
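
    A rough sketch of that curated-corpus idea (not any real product's architecture): index only a hand-picked document set and refuse to answer when nothing in it matches. This is crude keyword retrieval rather than an LLM, and the statute snippets are invented, but it shows the constraint being described.

        import re

        # Toy "curated corpus only" answerer: the bot may quote only from a
        # hand-picked set of (invented) vehicle-code snippets, and it refuses
        # when the question isn't covered, instead of guessing.
        CORPUS = {
            "VC-101": "A vehicle may not exceed the posted speed limit.",
            "VC-202": "Headlights must be used from sunset to sunrise.",
            "VC-303": "Proof of registration must be carried in the vehicle.",
        }

        def tokens(text):
            return set(re.findall(r"[a-z]+", text.lower()))

        def answer(question):
            # score each curated passage by crude keyword overlap
            scored = [(len(tokens(question) & tokens(passage)), ref, passage)
                      for ref, passage in CORPUS.items()]
            score, ref, passage = max(scored)
            if score == 0:
                return "Not covered by the curated corpus."  # refuse rather than guess
            return f"{ref}: {passage}"

        print(answer("When do I have to turn on my headlights?"))  # cites VC-202
        print(answer("What was Gap's operating margin?"))          # refuses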

  • by fredrated ( 639554 ) on Tuesday February 14, 2023 @02:21PM (#63292443) Journal

    stupid, error-making overlords.

  • by PPH ( 736903 ) on Tuesday February 14, 2023 @03:53PM (#63292895)

    ... makes mistakes. Nobody notices.
