
AI Support Bot Invents Nonexistent Policy (arstechnica.com)
An AI support bot for the code editor Cursor invented a nonexistent subscription policy, triggering user cancellations and public backlash this week. When developer "BrokenToasterOven" complained about being logged out when switching between devices, the company's AI agent "Sam" falsely claimed this was intentional: "Cursor is designed to work with one device per subscription as a core security feature."
Users took the fabricated policy as official, with several announcing subscription cancellations on Reddit. "I literally just cancelled my sub," wrote the original poster, adding that their workplace was "purging it completely." Cursor representatives scrambled to correct the misinformation: "Hey! We have no such policy. You're of course free to use Cursor on multiple machines." Cofounder Michael Truell later apologized, explaining that a backend security change had unintentionally created login problems.
That's still bad. (Score:5, Insightful)
Not checking your AI bot's responses and not disclosing that you created your own problem is still bad. And it's not just "misinformation" when you're not transparent about what happened in the first place.
Congratulations on finding out the hard way you can easily destroy trust in your company.
Re:That's still bad. (Score:5, Insightful)
Hallucination is not some rare-and-random bug though. It is intrinsic to the nature of large language models (based on what I have read, anyway). Efforts at blocking hallucinations that amount to a long series of "one-off" fixes piled on top of each other are ultimately doomed. You might manage to prevent some of the seriously problematic ones, but that approach will never address the root cause, so hallucinations will continue to crop up.
I asked ChatGPT if it uses the stuff I post to train its models, and it gave me very clear assurances that it absolutely does not. But, in fact, there is a toggle you must flip to prevent this, and it defaults to allow this, and ChatGPT made no mention of it whatsoever. Was it a lie by omission? No, LLMs do not have intentionality and cannot lie. It was just an incomplete answer (though I would put it in the same category as hallucination, given the relevance that this bit of information had to the answer).
Answers from AI cannot be trusted. As chatbots, they are amusingly capable, but as sources of information about important topics, they are completely untrustworthy.
Re: (Score:1)
Answers from AI cannot be trusted. As chatbots, they are amusingly capable, but as sources of information about important topics, they are completely untrustworthy.
You should probably be aware of what "Cursor" does...
They offer one service, an online code text editor where *AI* is doing the code writing.
You ask it to write "hello world" in a language and it makes up some code.
The people paying them for this service absolutely trust AI to make their software for them, so are also going to absolutely trust an AI tech support bot.
Anyone willing to entertain the fact that current AI has huge limitations would never have sought out such a product, let alone paid them money for it.
Re: (Score:3)
I and most of my co-workers are also using AI to write code for us, including in particular Cursor (though that one is not my personal favorite).
We all treat it like an intern, or entry-level developer. We ask it to write the boilerplate code to save us tedium, and then we check its results before submitting. It gets things wrong sometimes. That's OK; it saves us more work than it creates, even when we have to look over what it makes and make corrections to it.
None of us rely on the AI to do anything important, and for a lot of what we do, we don't even bother to use it at all. It saves us time and tedium where it can, but that's it.
Re: (Score:2)
I and most of my co-workers are also using AI to write code for us, including in particular Cursor (though that one is not my personal favorite).
We all treat it like an intern, or entry-level developer. We ask it to write the boilerplate code to save us tedium, and then we check its results before submitting. It gets things wrong sometimes. That's OK; it saves us more work than it creates, even when we have to look over what it makes and make corrections to it.
None of us rely on the AI to do anything important, and for a lot of what we do, we don't even bother to use it at all. It saves us time and tedium where it can, but that's it.
Any company that is using these tools instead of senior level developers is going to run into trouble with bugs, and with maintainability once the product starts to get complex and the feature needs start to get very specific (and deviate in any way from the standard needs that were expressed in the bot's training data).
Bravo. This is what will eventually happen for all software engineers. There are no perfect tools, so engineers figure out what a tool is good and bad at and then adapt their use of the tool accordingly. This has always been the case with new tools, and AI is no different.
It's sort of like the old joke about the patient who complains to the doctor that it hurts when he does something and receives an admonition to stop doing that thing. It's the same with AI tools. If it doesn't work, don't do that.
Re: (Score:2)
They are trained to always give an answer. The correctness of the answer is irrelevant most of the time. They will just "apologize" and try again when you tell them they are wrong.
Re: That's still bad. (Score:4, Informative)
The thing to remember is that it's a LANGUAGE model. It knows how to translate concepts between languages, which is really useful. But it is not an expert system.
This isn't correct. LLMs are language models, yes, but that description understates them, primarily, I think, by underestimating how much of the world is encoded in language. They could not generate reasonable-seeming output without also containing sophisticated models of the world, models that are almost certainly far broader and deeper than any expert system we've ever developed. And the newer LLMs aren't just LLMs, either. They have a reasoning overlay that enables them to reason about what they "know". This is actually extremely similar to how our brains work [slashdot.org]. The similarity is not accidental.
The proper way to use LLMs is as interfaces to other systems, rather than as standalone things.
Maybe, but I think your description misstates the LLM's role in such a hybrid system. Rather than the LLM being "just" an interface, I think you would ask the LLM to apply its own knowledge and reasoning to use the expert system in order to answer your question. That is, I think the LLM ends up being more like a research assistant operating the expert system than a mere interface.
However, if you're going to do that, do you really even need the expert system? Its role is to provide authoritative information, but curating and compiling that sort of authoritative data is hard, and error-prone. You probably don't want the LLM to trust absolutely in the expert system, but to weigh what it says against other information. And if you're going to do that, why bother building the expert system? Just allow the LLM to search whatever information you'd have used to curate the data for constructing the expert system, and to compare that to knowledge implicit in its model and perhaps elsewhere.
Many of our current-generation systems have access to the web. I've been using Claude and found it extremely good at formulating search queries, analyzing the content of large numbers of relevant pages and synthesizing a response from what it found. It annotates its output with links to the sources it used, too, enabling me to check up on its conclusions (I've yet to find a significant mistake, though it does miss important bits on occasion). It could be better, could analyze a little more, but it's already shockingly good and I'm sure it will get rapidly better.
This seems like a much more sensible way to make LLMs better than by backing them with exhaustively-curated expert systems. Yes, they will make mistakes, similar to how a human research assistant would. But this approach will ultimately be easier to build, and more flexible.
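For what it's worth, here's a minimal sketch of that "research assistant with a search tool" loop in Python. The llm and search callables are stand-ins for whatever model and search backend you actually use, and the SEARCH/ANSWER protocol is just an illustration, not any vendor's real API:

    from typing import Callable, List

    def answer_with_sources(
        question: str,
        llm: Callable[[str], str],     # any text-completion function
        search: Callable[[str], str],  # any web/doc search returning snippets plus URLs
        max_rounds: int = 3,
    ) -> str:
        """Let the model alternate between searching and answering, citing its sources."""
        notes: List[str] = []
        for _ in range(max_rounds):
            step = llm(
                "Reply with SEARCH: <query> to gather evidence, or "
                "ANSWER: <answer with [source] citations>.\n"
                f"Question: {question}\n"
                f"Evidence so far: {notes}"
            )
            if step.startswith("SEARCH:"):
                # The model asked for more evidence; run the query and keep the snippets.
                notes.append(search(step[len("SEARCH:"):].strip()))
            else:
                return step  # final answer, annotated with the sources it used
        # Out of rounds: answer from whatever evidence was gathered.
        return llm(f"Question: {question}\nAnswer using only this evidence: {notes}")

The point is that the curated expert system is replaced by whatever sources the search step can reach, with the model weighing them, which is roughly what the web-enabled assistants already do.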
As an aside, I had a very interesting experience with Claude the other day. I needed to analyze a bit of code to see whether or not it made a security vulnerability that I had already identified unreachable, or whether there was some input that could be crafted to provide the output the attacker needs in order to exploit the vuln. Claude did not immediately give me the right answer, but it pointed out important and relevant characteristics of part of the code and analyzed the result almost correctly. I pointed out some errors in its conclusions and it corrected its mistakes while pointing out an oversight I made. I pointed out some oversights and mistakes it made, and so on. Over the course of about 10 minutes, we jointly arrived at a full and completely correct answer (bugs in the bit of code did indeed block exploitation of the vuln) and a concise but rigorous proof of why our answer was correct.
In sum: The interaction was pretty much exactly like working with a smart human colleague on a hard problem. Neither of us was right in the beginning, we both drew (different) partially incorrect conclusions, and we were fuzzy about different parts of the problem.
Re: That's still bad. (Score:2)
" LLMs are language models, yes, but that description understates them, primarily, I think, by underestimating how much of the world is encoded in language."
They don't actually understand the language, so that's not relevant. To them it could be any kind of nonsense and they would still put it together just as fluently.
Re: (Score:1)
False. LLMs can and do lie. When trained for user "likes [arxiv.org]" and simultaneously denied information they would typically need to get a "like", they hallucinate. This amounts to lying, even if the lie is "I have information on this" rather than a lie about the topic at hand.
Re: (Score:2)
Sorry, any "lie" an LLM tells comes down to incorrectly specified inputs.
You gave it that out, and it took it. Just like a maze with multiple exits but one "preferred" exit: you cannot complain when your subject finds one of the other exits and exclaim "That's cheating!"
LLMs work exactly the same way in these scenarios. You give it a goal, it finds an avenue you didn't think of, and you proclaim it "lying" when it is just predicting next tokens to get a higher score. If your scoring rewards that, that's on YOU.
Re:That's still bad. (Score:5, Interesting)
Hallucination is not some rare-and-random bug though. It is intrinsic to the nature of large language models (based on what I have read, anyway).
I think this particular form of hallucination -- inventing an explanation for an event -- goes even deeper and may be intrinsic to intelligence, period, or at minimum it's a characteristic that humans share.
There are a number of fascinating and clever psychological experiments that demonstrate this. Just one example: Researchers have asked people to answer a set of questions, and then a few weeks later asked the people to provide explanations of why they answered the way they did. But, in random cases, the researchers changed the answers and asked the people to explain why they gave those answers, even though they didn't. It was done subtly enough that few people realized the answers had been changed, even though they were often inverted. The really interesting part is that the people were just as good at explaining the answers they didn't give as explaining the answers they did.
Those and many other experiments seem to indicate that the primary job and ability of the reasoning layer of our mind is not to figure out what's right or wrong, or even what we do or don't want, but instead to invent explanations justifying whatever it is that we already think, from some deeper, non-verbal layer. There doesn't seem to be any evidence that the reasoning layer gets any hints from the deeper layer, either, it just finds something that makes sense. And we're extremely good at this.
The explanations we invent often don't hold up to scrutiny, but it's clear that our reasoning layer doesn't apply much scrutiny, at least not by default. We can vastly improve the accuracy of our reasoning just by making ourselves think through the process of explaining and defending our reasoning to another person, even without actually involving another person. This appears to work because while our reasoning ability is very good at inventing explanations, it's perhaps even better at identifying logical deficiencies in other people's explanations. So just going through the mental exercise of pretending to explain to someone else engages our reasoning to poke holes. And of course it's even better to actually engage with another person.

The evolutionary advantages for a species that lives cooperatively but with internal competition are obvious. The person who is better at generating good explanations and poking holes in others' explanations will get their way more often, enabling them to reproduce and ensure the survival of their offspring. And because the rules of logic we apply actually work, in the sense that they help us come to objectively correct decisions about the world, tribes that argue with each other will make better decisions, improving the odds of their offspring's survival.
Anyway, back to AI: it appears that's exactly what Sam did in this case. The backend system made a decision (to reject access), and while Sam didn't know the actual reason for the rejection, it invented a plausible one.
As a security engineer, it really makes me laugh that the AI chose an explanation that attributed it to security. It's so common for humans to do this. A huge percentage of the time when I see such security-related policy statements about systems whose security I think I understand, there is no actual security justification for the policy. Sometimes the speakers think there is, but they don't really understand security, and they're wrong. Probably sometimes the actual policy really is based on misunderstood security concerns, but I suspect that a lot of the time it's based on completely different concerns, and security gets invoked either as an intentional deception or, like Sam, because the entity generating the explanation doesn't know and invents something reasonable.
If anyone is interested in what I think is our best understanding of how human reasoning appears to work, I highly recommend "The Enigma of Reason", by Sperber and Mercier.
Re: (Score:3)
I think this particular form of hallucination -- inventing an explanation for an event -- goes even deeper and may be intrinsic to intelligence, period, or at minimum it's a characteristic that humans share
There are very interesting reads around this concept in the research on split-brain [wikipedia.org] patients. E.g., from the article on dual consciousness, which covers Gazzaniga and LeDoux's split-brain experiment [wikipedia.org]:
The human brain's left hemisphere is primarily responsible for interpreting the meaning of the sensory input it receives from both fields; however, the patient's left hemisphere had no knowledge of the winter house. Because of this, the left hemisphere had to invent a logical reason for why the shovel was chosen.
More on the left-brain interpreter [wikipedia.org] concept:
The drive to seek explanations and provide interpretations is a general human trait, and the left-brain interpreter can be seen as the glue that attempts to hold the story together, in order to provide a sense of coherence to the mind.
The explanations generated by the left-brain interpreter may be balanced by right brain systems which follow the constraints of reality to a closer degree. The suppression of the right hemisphere by electroconvulsive therapy leaves patients inclined to accept conclusions that are absurd but based on strictly-true logic. After electroconvulsive therapy to the left hemisphere the same absurd conclusions are indignantly rejected.
The checks and balances provided by the right brain hemisphere may thus avoid scenarios that eventually lead to delusion via the continued construction of biased explanations.
Re: (Score:2)
As Heinlein said, "Man is not a rational animal; he is a rationalizing animal."
Re: (Score:2)
It may be a half-truth. Your conversation does not train the model. But OpenAI reserves the right to train the next model on the stored logs of your conversations if you don't opt out.
Human writers require proofreaders (Score:2)
This is another way in which AI mimics human intelligence. If businesses want to use AI for customer support, they will have to figure out ways to cross-check what the bots say. This doesn't seem like a huge hurdle, but it is a necessary one: Bot 1 generates a response, and Bot 2 confirms that what Bot 1 says is accurate. It might still be possible for incorrect information to slip through, but it becomes much less likely.
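A minimal sketch of that Bot 1 / Bot 2 cross-check in Python; generate and verify are stand-ins for whatever models you call, and the OK/FLAG protocol is just an illustration. The key design choice is that the checker is prompted against the written policy rather than against Bot 1's output alone, and anything it can't ground gets escalated to a human:

    from typing import Callable

    def supported_reply(
        ticket: str,
        policy_docs: str,
        generate: Callable[[str], str],  # "Bot 1": drafts the reply
        verify: Callable[[str], str],    # "Bot 2": checks the draft
    ) -> str:
        """Draft a support reply, then have a second model check it against the written policy."""
        draft = generate(
            f"Customer message:\n{ticket}\n\n"
            f"Company policy:\n{policy_docs}\n\n"
            "Draft a reply. Do not assert any policy that is not quoted above."
        )
        verdict = verify(
            f"Company policy:\n{policy_docs}\n\n"
            f"Proposed reply:\n{draft}\n\n"
            "Does the reply claim anything not supported by the policy text? "
            "Answer OK, or FLAG: <reason>."
        )
        if not verdict.startswith("OK"):
            # Anything the checker cannot ground in the policy goes to a human.
            return "ESCALATE_TO_HUMAN: " + verdict
        return draft

Escalating anything the checker can't ground keeps the failure mode at "slower answer" rather than "invented policy."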
Re:Human writers require proofreaders (Score:5, Funny)
-"Supervisor Bot here. What seems to be the problem, sir?"
Re: (Score:2)
They both got it wrong, better bring in the execubots [fandom.com]
Re:Human writers require proofreaders (Score:4, Insightful)
Yes, teach your bots to cooperate with each other against the hapless customer, have your competitors win big against you at the proverbial virtual cash register.
But, for a brief moment, they saved a lot of money on support staff, which made their stock valuation climb!
Re: Human writers require proofreaders (Score:4, Insightful)
This bot behaved basically like a 5-year-old kid who has imaginary friends or invents fictional events to cover up his mischief. It's not certain that employing a whole kindergarten full of naughty kids would be better than employing just one.
Re: (Score:3, Interesting)
That raises a good point: what kind of bullshit admonitions did they give the bot, added as tokens that outweigh your question, that led to this? They presumably instructed it to give explanations implying that what users are seeing is in line with what the user agreement says... which the software did, whether it was or not.
Re: (Score:3)
Well, it's happened once that I've heard of, and the court decided that the company was bound by it. (Sorry, don't remember details. I think it was about an airline fare.)
Re:Human writers require proofreaders (Score:5, Informative)
Jake Moffatt vs. Air Canada: the chatbot told him that he could buy a full-price fare and then apply for a refund under the bereavement discount within 90 days, when the real policy was that you had to request the discount up front.
Air Canada tried to argue that the chatbot was a different legal entity and thus they were not responsible for what it said.
https://arstechnica.com/tech-p... [arstechnica.com]
Re: (Score:2)
Sounds like promises T-Mobile made--promising never to raise their prices for life--and then going back on their promise. https://topclassactions.com/la... [topclassactions.com] Maybe their ads were generated by AI???
Re: (Score:3)
The problem with your "solution" is that you can't trust the output of Bot 2 any more than you could trust the output of Bot 1.
This doesn't seem like a huge hurdle
Except for the fact that it's impossible, given the nature of the technology.
Re: (Score:2)
you can't trust the output of Bot 2 any more than you could trust the output of Bot 1
You mean, kind of like Human 2 and Human 1? I don't see how it's that different.
What would be different about Bot 2 is that its job would be to verify the statements of Bot 1: to look up the references and confirm Bot 1's statements. Because Bot 2 is prompted to look for discrepancies, it would be focused on them.
AI can already find certain classes of coding bugs or insecure practices. It's not so hard to imagine that they could be built to do the same for Support Bots.
Re: (Score:2)
I don't see how it's that different.
Then you shouldn't be offering your 'opinions' here.
Re: (Score:2)
So far, you haven't said anything other than to make pronouncements, with no reasoning whatsoever.
Yes, it is possible to reduce errors when one AI bot looks for errors made by another AI bot. Bot 2 doesn't care where the original text came from, it's just looking for errors. Because the two AIs are focused on different objectives (Bot 1 is creating, Bot 2 is checking), the result will indeed be less error-prone, because they aren't likely to both hallucinate in exactly the same way.
Now, do you have a reason
"This support agent is not supported" (Score:3)
What a time to be alive.
Well done, young AI (Score:2)
companies can be on the hook for what their agent (Score:2)
Companies can be on the hook for what their agent says. More so in the EU, where they can't get out of it under an EULA.
Can you believe it? (Score:5, Insightful)
I'm really surprised that the default reception to this story is to actually believe what the company reps are saying after the fact. It seems like a very weird coincidence:
(1) Help bot gives very specific one-device policy.
(2) Separate login system simultaneously shuts people out of multiple devices.
To me it seems quite possible, maybe even more likely, that the company did change the policy behind the scenes, and then, when backlash and cancelled subscriptions started happening, backtracked: reversed the policy and blamed the AI communication. I mean, it's not like the AI bot can defend itself in this regard. And the company already seems on the sleazy side, e.g., not labeling the help chat as AI-based.
AI bamboozle (Score:1)
Re: (Score:3)
Perhaps you've never worked for a "Type A" meatsack manager. Who just pulls the occasional policy statement out of his (they are most often men) ass. "This is MY department and I'll do what I fucking want. To hell with federal law!"
It appears that AI has learned quite well.
AI Saves On Labor Costs (Score:2)
So much saving
Very wow!
Re: (Score:2)
Like the restaurant that decided to save on grocery costs by switching to inferior ingredients and reducing portion sizes and then wondered why people stopped eating there.
Or the business that fired its advertising team in order to save money and then wondered why it no longer had customers.
So much winning!
Real non-ai tech support also does this. (Score:4, Interesting)
Real non-AI tech support also does this, so AI is "improving" to be just as crappy as real 1st/2nd tier tech support. I have had non-AI tech support make random claims that make little or no sense in an attempt to explain away the issue I was reporting (and to get me off the call so they could close it). This happens all the time, and it seems to ignore what the software should do and/or what the code was designed to do. The goal seems to be to make something up to justify what is being experienced, with no consideration of whether it is even supposed to work that way, or whether it is even reasonable for it to work that way.
Tech support often runs on random correlations and observations. They will declare that a firmware upgrade fixes the issue, when simply rebooting and resetting the hardware would have kept the device from repeating the problem for years (i.e., it is "fixed").
AI Like "self-driving car" (Score:3)
It's possible to imagine.. (Score:2)
...that future AI will provide excellent customer service.
Today's AI should be viewed as an early-stage research project, and closely supervised and measured.
Re: (Score:2)
It probably will. Computing resources are cheaper than personnel. The support job is not the most demanding, and AI, especially its integrations (i.e., access to external data sources, etc.), is getting more advanced. Where a human supporter can only ask you if you turned it off and on again, an AI has the time to additionally ask if you're sure it's plugged in.
Representative (Score:2)
If you put someone or someTHING out there as your representative, you need to make sure they represent you well. Here, the company FAFO.
it's like I always say... (Score:2)