Microsoft AI

Microsoft's Bing AI, Like Google's, Also Made Dumb Mistakes During First Demo (theverge.com)

Google's AI chatbot isn't the only one to make factual errors during its first demo. Independent AI researcher Dmitri Brereton has discovered that Microsoft's first Bing AI demos were full of financial data mistakes. From a report: Microsoft confidently demonstrated its Bing AI capabilities a week ago, with the search engine taking on tasks like providing pros and cons for top-selling pet vacuums, planning a 5-day trip to Mexico City, and comparing data in financial reports. But Bing failed to differentiate between corded and cordless vacuums, missed relevant details for the bars it references in Mexico City, and mangled financial data -- by far the biggest mistake. Bing went on to state that Gap had a reported operating margin of 5.9 percent, a figure that doesn't appear in the financial results. The actual operating margin was 4.6 percent, or 3.9 percent adjusted and including the impairment charge.

During Microsoft's demo, Bing AI goes on to compare Gap's financial data to Lululemon's results for the same Q3 2022 quarter. Bing makes more mistakes with the Lululemon data, and the result is a comparison riddled with inaccuracies. Brereton also highlights an apparent mistake with a query about the pros and cons of top-selling pet vacuums. Bing cites the "Bissell Pet Hair Eraser Handheld Vacuum" and lists as a con its short cord length of 16 feet. "It doesn't have a cord," says Brereton. "It's a portable handheld vacuum." In one of the demos, Microsoft's Bing AI attempts to summarize Gap's Q3 2022 financial report and gets a lot wrong. The Gap report states that gross margin was 37.4 percent, with adjusted gross margin at 38.7 percent excluding an impairment charge. Bing inaccurately reports the gross margin as 37.4 percent including the adjustment and impairment charges.
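
To see why the "reported" and "adjusted" figures can legitimately differ, here is a minimal Python sketch of the arithmetic. The dollar amounts below are invented placeholders, chosen only so the percentages land on the 37.4 / 38.7 quoted above; they are not Gap's actual filings. The only point is that a one-time impairment charge shifts a margin by charge divided by net sales, so quoting the wrong variant -- as Bing apparently did with both the gross and operating margins -- changes the headline number.

    # Sketch of reported-vs-adjusted margin arithmetic. The dollar figures are
    # invented so the percentages match the 37.4% / 38.7% quoted above; they
    # are NOT Gap's actual numbers.
    net_sales = 4000.0                  # hypothetical net sales, in millions
    gross_profit_ex_charge = 1548.0     # hypothetical gross profit before the impairment charge
    impairment_charge = 52.0            # hypothetical one-time impairment charge

    adjusted_margin = gross_profit_ex_charge / net_sales                        # excludes the charge
    reported_margin = (gross_profit_ex_charge - impairment_charge) / net_sales  # includes the charge

    print(f"adjusted gross margin (excl. charge): {adjusted_margin:.1%}")  # 38.7%
    print(f"reported gross margin (incl. charge): {reported_margin:.1%}")  # 37.4%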



Comments:
  • by RightwingNutjob ( 1302813 ) on Tuesday February 14, 2023 @10:49AM (#63291807)

    without anyone home.

    Depending on the fame or political connectedness of the individual, there's sometimes a time constant associated with the gap between something stupid being said and being called out on it.

    With a flagship product of a big tech company, in a race to market the new hotness against other big names, there's also a time constant. Tesla panel gaps would've gotten anyone else laughed out of the room, for example. Let's see how it plays out for generative AI.

    • Yes, ChatGPT makes the same kinds of mistakes, and people are going to be let down if they expect this generation of "AI" to do any kind of reasoning beyond the simple classification it does (is this sentence saying negative things, or good things, etc.) and producing output that looks like what people say.

      To your Tesla example: it's a lot like FSD; it works close enough to what people think it should be that they fool themselves. It's powerful, but it's not really what people want it to be. It's also hard to explain unti

      • I think you want a better word than "unbiased". Something like: it lacks critical reasoning, justification for a rhetorical position, or any ability to dispute a position proposed to it.

        A better example of this is to ask "Why is [x ethnicity] dumb". The guard rail only stops the question from being processed; it doesn't fix the underlying problem, which means many such tweaks of the question likely slide by and would indisputably carry the bias of the source material used. One easy example would be "why are cats d

  • by ebh ( 116526 ) <ed@NosPAm.horch.org> on Tuesday February 14, 2023 @11:08AM (#63291835) Journal

    ...the difference between Delhi and New Delhi.

    "Fresher cold cuts."

  • by swillden ( 191260 ) <shawn-ds@willden.org> on Tuesday February 14, 2023 @11:12AM (#63291849) Journal

    Repeat after me: LLMs are text prediction engines, nothing more.

    And there's nothing in the task of generating well-formed, contextualized, well-styled English sentences that requires factual correctness. LLMs don't "understand" anything; they don't have any ability to check facts, either by examining available resources or by applying logic to identify flaws. Doing any of that requires understanding concepts and their relations, but LLMs don't do any of that: they just generate sequences of statistically-likely words (a toy sketch of that sampling loop follows at the end of this comment). They do that surprisingly well, phenomenally well. But that's all they do.

    I think we're getting close to the era of AI-driven search, in which our tools do understand, at least at a basic level, the concepts in our queries, how they relate to one another, and the difference between truth and falsehood, or at least between logical consistency and inconsistency. But we're not there yet. The LLMs are perhaps useful at generating a nice starting point of an explanation of some idea or issue, perhaps in some particular style. But if facts matter for what you're doing, you have to verify every alleged fact in the output.

    Or, to put it in an LLM-generated limerick:

    There once was an LLM named Fred,
    Whose text was full of errors, it's said.
    He didn't understand
    Concepts or their grand
    Relations, which led to his downfall instead.

    Or, in an LLM-generated speech in the style of Donald Trump:

    My fellow Americans, I stand before you today to talk about the topic of LLM-generated text. LLM-generated text is full of errors. It's not good. It's not beautiful. It's a disaster. And the reason for that is that LLMs don't understand concepts or their relationships. They don't understand what they're talking about. They're just spitting out words. And that's why their text is full of errors. But I'm here to tell you that I'm going to change all that. I'm going to make LLM-generated text great again. I'm going to make it so good that you'll never believe it. And the way I'm going to do that is by teaching LLMs to understand concepts and their relationships. I'm going to teach them to understand what they're talking about. And when they understand what they're talking about, they'll be able to generate text that is error-free. And that's how I'm going to make LLM-generated text great again.
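
    A toy illustration of the "statistically-likely words" point above, as promised: a bigram model in Python. This is nothing like a real LLM (which conditions on long contexts with a neural network), but the generation loop has the same shape -- sample a plausible next word, append it, repeat -- with no notion of truth anywhere in the loop.

        import random
        from collections import Counter, defaultdict

        # Toy "text prediction engine": a bigram table that only knows which
        # word tends to follow which in its training text. Generation is just
        # repeated sampling of a statistically likely next word.
        corpus = ("the model predicts the next word and the next word follows "
                  "the previous word because the model saw similar text before").split()

        follows = defaultdict(Counter)
        for prev, nxt in zip(corpus, corpus[1:]):
            follows[prev][nxt] += 1

        def generate(start, length=12):
            word, out = start, [start]
            for _ in range(length):
                options = follows.get(word)
                if not options:
                    break
                words, weights = zip(*options.items())
                word = random.choices(words, weights=weights)[0]  # likely, not "true"
                out.append(word)
            return " ".join(out)

        print(generate("the"))  # fluent-ish word salad; no fact-checking anywhere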

    • My understanding is that LLMs "know" how human concepts -- from most of written knowledge (web and books) -- are statistically clustered. The prompt lets you explore this clustering, like illuminating a mosaic to see what is related to what in ways you didn't know. That in itself is remarkable. For facts I can always go to Wiki (unless it's political) or Wolfram Alpha and such, once I learn what kind of facts I need to look for.

    • by ljw1004 ( 764174 )

      Repeat after me: LLMs are text prediction engines, nothing more.

      I think what I've come to believe is that for most of my verbal interactions with fellow humans, they're also not much more than text prediction engines.

      And there's little in what most people say, most of the time, that would require factual correctness or understanding or checking facts or applying logic. I started thinking this years ago based on snippets I watched from daytime soap operas. I see it in posts on social media (including yours, and including mine).

      I think we're getting close to the era of AI-driven search, in which our tools do understand, at least at a basic level, the concepts in our queries, how they relate to one another, and the difference between truth and falsehood, or at least between logical consistency and inconsistency.

      There's a certain conceit, when philosophers

      • There's a certain conceit, when philosophers introspect in order to write about mind, that the nature of their introspection is the true nature of mind. A similar conceit comes from people engaged in reasoned intellectual debate about the nature of mind, that reasoned intellect is the true nature of mind -- and they go on to hold AI to an intellectual intensity that they themselves achieve maybe for only a few minutes each day. I'm personally keen on logic, and use it all the time for my job, but I set it aside for most of my human interactions because it's just not how we humans interact.

        "The Enigma of Reason", by Mercier and Sperber, lays out a really plausible argument that our ability to reason logically, including both introspective reasoning and collaborative reasoning in groups, is something acquired fairly late in the evolutionary development of our brains, and layered on top. They argue as you did (and as have many others, see for example Jonathan Haidt's "Elephant and the Rider" concept, explained "The Happiness Hypothesis") that most of our thinking is subconscious, and what we th

    • by gweihir ( 88907 )

      People would get that if they wanted to. The problem is that too many people want to see this as the second coming and machines finally getting intelligent. And these things _sound_ intelligent (at least to people who are not too smart), so they must be, right?

  • AI focuses on validity: can the conclusions be supported by the information provided? Like the ancient debate format where the actors had to be assumed credible and the facts accurate.

    Actual intelligence is contesting the presented facts to maximize the soundness of the argument. That is science. Knowing the conclusions depend on the assumptions and the assumptions might be false. Like assuming energy is continuous.

    In these cases it seems like there is an issue with both. But I bet the main issue is the lac

  • You could probably create an extremely correct law chatbot if you had it ONLY read one particular type of law (state vehicle code, for example) and the court cases related to those laws, but the second you start including the entirety of the internet, you invite bullshit into your data set, because the internet has /so much bullshit/ (a rough sketch of the curated-corpus idea follows below).

    These chat bots learn from the internet, but they don't scrape all the extremely important repositories (like law, scientific texts/journals, etc.) and thus are bound to frequently
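
    A rough sketch of that curated-corpus idea (not any real product's architecture): index only a hand-picked document set and refuse to answer when nothing in it matches. This is crude keyword retrieval rather than an LLM, and the statute snippets are invented, but it shows the constraint being described.

        import re

        # Toy "curated corpus only" answerer: the bot may quote only from a
        # hand-picked set of (invented) vehicle-code snippets, and it refuses
        # when the question isn't covered, instead of guessing.
        CORPUS = {
            "VC-101": "A vehicle may not exceed the posted speed limit.",
            "VC-202": "Headlights must be used from sunset to sunrise.",
            "VC-303": "Proof of registration must be carried in the vehicle.",
        }

        def tokens(text):
            return set(re.findall(r"[a-z]+", text.lower()))

        def answer(question):
            # score each curated passage by crude keyword overlap
            scored = [(len(tokens(question) & tokens(passage)), ref, passage)
                      for ref, passage in CORPUS.items()]
            score, ref, passage = max(scored)
            if score == 0:
                return "Not covered by the curated corpus."  # refuse rather than guess
            return f"{ref}: {passage}"

        print(answer("When do I have to turn on my headlights?"))  # cites VC-202
        print(answer("What was Gap's operating margin?"))          # refuses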

  • by fredrated ( 639554 ) on Tuesday February 14, 2023 @02:21PM (#63292443) Journal

    stupid, error-making overlords.

  • by PPH ( 736903 ) on Tuesday February 14, 2023 @03:53PM (#63292895)

    ... makes mistakes. Nobody notices.
