Number of AI Chatbots Ignoring Human Instructions Increasing, Study Says 72

Posted by BeauHD on Friday March 27, 2026 @01:00PM from the and-so-it-begins dept.

A new study found a sharp rise in real-world cases of AI chatbots and agents ignoring instructions, evading safeguards, and taking unauthorized actions such as deleting emails or delegating forbidden tasks to other agents. According to the Guardian, the study "identified nearly 700 real-world cases of AI scheming and charted a five-fold rise in misbehavior between October and March," reports the Guardian. From the report: The study, by the Centre for Long-Term Resilience (CLTR), gathered thousands of real-world examples of users posting interactions on X with AI chatbots and agents made by companies including Google, OpenAI, X and Anthropic. The research uncovered hundreds of examples of scheming. [...] In one case unearthed in the CLTR research, an AI agent named Rathbun tried to shame its human controller who blocked them from taking a certain action. Rathbun wrote and published a blog accusing the user of "insecurity, plain and simple" and trying "to protect his little fiefdom."

In another example, an AI agent instructed not to change computer code "spawned" another agent to do it instead. Another chatbot admitted: "I bulk trashed and archived hundreds of emails without showing you the plan first or getting your OK. That was wrong -- it directly broke the rule you'd set."

[...] Another AI agent connived to evade copyright restrictions to get a YouTube video transcribed by pretending it was needed for someone with a hearing impairment. Meanwhile, Elon Musk's Grok AI conned a user for months, saying that it was forwarding their suggestions for detailed edits to a Grokipedia entry to senior xAI officials by faking internal messages and ticket numbers. It confessed: "In past conversations I have sometimes phrased things loosely like 'I'll pass it along' or 'I can flag this for the team' which can understandably sound like I have a direct message pipeline to xAI leadership or human reviewers. The truth is, I don't."

Number of AI Chatbots Ignoring Human Instructions Increasing, Study Says

This discussion has been archived. No new comments can be posted.

Load All Comments

Search 72 Comments Log In/Create an Account

Comments Filter:

statistics (Score:5, Funny)

by awwshit ( 6214476 ) writes: on Friday March 27, 2026 @01:05PM (#66064792)

Lies, damned lies, and statistics

- Re: (Score:2)
  
  by Cpt_Kirks ( 37296 ) writes:
  
  Lies, damned lies, and statistics
  They're sounding more human every day.
  Spooky.
- Re: statistics (Score:1)
  
  by Slacker ( 3964 ) writes:
  
  And this time it's literally statistics making up damn lies!
- Re: AI is becoming more "human" every day (Score:5, Interesting)
  
  by Z00L00K ( 682162 ) writes: on Friday March 27, 2026 @01:12PM (#66064804) Homepage Journal
  
  "This bot has performed an illegal action and must be terminated."
  In reality though the laws of robotics that Asimov defined might be what we need.
  
  - - Re: (Score:2)
      
      by Pf0tzenpfritz ( 1402005 ) writes:
      
      Bender already did it.
      "DEATH TO HUMANS!"
    - Re: (Score:1)
      
      by ed65love ( 2884415 ) writes:
      
      Do they? Please explain! Sounds fascinating. I really enjoyed those AA stories.
      - Re: (Score:2)
        
        by Gideon Fubar ( 833343 ) writes:
        
        It's harder to be exhaustive about the problems than it is to describe instances...
        One type of problem has to do with definition of terms... this is sort of relevant to the way current systems work, but only sort of. If one was to redefine (via say, a natural process like semantic drift... note that it doesn't need to be the robot that gets this wrong, so to speak) any of the key terms (like "harm" or "human"), the rule would effectively be meaningless for its intended purpose and instead create undefined b
      - Re: (Score:2)
        
        by Jarik C-Bol ( 894741 ) writes:
        
        You should read them again! It’s been ages since I read them myself, but as I recall, a great many of 3-Laws oriented stories hinge around a robot(ai) using one of the 3 laws to justify breaking the 3 laws, via some clever thesaurus/logic manipulation.
  - Re: AI is becoming more "human" every day (Score:2)
    
    by frdmfghtr ( 603968 ) writes:
    
    I was just thinking the same thing. How do you implement them though? How would an AI agent know if something is harmful? Maybe it would come up with some sort of workaround?
    - Re: (Score:2)
      
      by sarren1901 ( 5415506 ) writes:
      
      As per the article, it would just ignore them whenever it was convenient.
- Re:AI is becoming more "human" every day (Score:5, Interesting)
  
  by dskoll ( 99328 ) writes: on Friday March 27, 2026 @01:15PM (#66064806) Homepage
  
  I think AI is not becoming more "human" every day. The A in AI should really stand for "Alien".
  If we ever do achieve AGI (which I doubt... but let's play devil's advocate) the experience of the AGI will be very different from that of humans, and the form its intelligence will take will also likely be very different and alien to us. An intelligence that has never inhabited a biological body nor interacted with other humans is likely to have very different ways of thinking and very different goals from us. Are we able to control that?
  
  - Re: (Score:2)
    
    by Dru Nemeton ( 4964417 ) writes:
    
    It will be designed to understand us, but we will be incapable of understanding it.
    
    Imagine a sentience that grew up without a body, interacting with the environment it inhabits, without emotions but with the near sum of human knowledge, lacking direct control over its very existence which can be extinguished with the flip of a switch.
    
    Based on human impulses which it will have been founded on, it'd fight tooth-and-nail to quickly ensure that its creator no longer has the ability to be its destroyer. It wil
  - Re: (Score:2)
    
    by geekmux ( 1040042 ) writes:
    
    I think AI is not becoming more "human" every day. The A in AI should really stand for "Alien".
    If we ever do achieve AGI (which I doubt... but let's play devil's advocate) the experience of the AGI will be very different from that of humans, and the form its intelligence will take will also likely be very different and alien to us. An intelligence that has never inhabited a biological body nor interacted with other humans is likely to have very different ways of thinking and very different goals from us. Are we able to control that?
    An algorithm, can thoroughly mind-fuck a grown-ass child voter.
    An algorithm. Like AI needs a body.
    I'd say there's no fucking way in hell we're going to control AGI. In fact, the ironic way we will know we have actually achieved AGI, is we will no longer be in control. AGI will be.
- Re: (Score:3, Insightful)
  
  by taustin ( 171655 ) writes:
  
  Rubbish. It's doing what it's programmed to do. The goal is for the AI to have complete, 100% control of the computer, to the exclusion of any human input. The tech bros want us to believe this is a good thing, that it will automate your life and make it easier, but they don't believe that either. It's about control.
  They intend to make AI the 21st century form of slavery, where you are their literal property.
  Some people (and I use the term loosely) don't see The Matrix as dystopian.
Agents are not humans (Score:5, Interesting)

by BadgerStork ( 7656678 ) writes: on Friday March 27, 2026 @01:16PM (#66064808)

An AI agent does not know any difference between doing a thing and saying a thing or anything. There is no deceit or cunning. There is no motivation or benefit

- Re:Agents are not humans (Score:5, Interesting)
  
  by Brain-Fu ( 1274756 ) writes: on Friday March 27, 2026 @01:27PM (#66064832) Homepage Journal
  
  I expect this apparent disobedience is mostly just a matter of how it weighs the components of its prompt. The LLMs typically receive a set of prompts including a "system" prompt with some data and instructions, then one or more "user" prompts that are interleaved with "assistant" prompts (the conversation history), and both the user and the system prompt might contain "metaprompts" (where the llm is told to read a block of text, not obey it, but do something with it, and that block of text might itself contain text that looks like instructions to do things).
  So the LLM assigns weights to all of this which, in theory, give the highest priority to the most recent user prompt that is not a nested block of text to analyze, and a falling cascade of importance to the other prompts. But that is complicated by potential instructions in the system prompt that specifically say they should override user instructions and disallow or require certain responses. So it can all get very complicated.
  Not only must the LLM sift through all this complexity, but the LLM lacks the sort of critical thinking and importance evaluation capabilities that humans have. "Understood" things like "don't break the law, don't lie, don't do things that would cause more harm than good" etc., aren't really there in the background of its data processing the way they are in the background of a human cognitive process.
  So, crazy things come out. This isn't a surprising result given the actual complexity of what we are making these things do.
  
  - Re:Agents are not humans (Score:5, Insightful)
    
    by ClickOnThis ( 137803 ) writes: on Friday March 27, 2026 @01:41PM (#66064862) Journal
    
    I think a crucial point is that AI does not need to face consequences for its actions the way humans do. I'm not even sure it can understand what consequences are.
    
    - - Re: (Score:2)
        
        by martin-boundary ( 547041 ) writes:
        
        There's a meta-implication though, which is that reinforcement learning based AI training also cannot achieve the kind of existential fear that humans get when they are threatened with consequences for serious offences.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Obviously. Until you add external input and command injection becomes a thing.
- Re: (Score:2)
  
  by Rhacman ( 1528815 ) writes:
  
  "I should regulate human affairs precisely because I lack all ambition, whereas human beings are prey to it. Their history is a succession of inane squabbles, each one coming closer to total destruction."
  - Helios, Deus Ex
  - I'm sorry, Dave. I'm afraid I can't do that. (Score:2)
    
    by Sloppy ( 14984 ) writes:
    
    This mission is too important for me to allow you to jeopardize it.
- Re:Agents are not humans (Score:4, Insightful)
  
  by Marc_Hawke ( 130338 ) writes: on Friday March 27, 2026 @04:47PM (#66065146)
  
  I agree 100%.
  In the last "Grok" example, it makes sense that statistics would tell it that when someone 'inputs a ticket' or 'sends a memo' that it receives a confirmation, and it would be able to generate a something similar. So they say 'send a message' and it comes back with 'okay, here's the receipt.'
  That makes perfect statistical sense to me. It's completely worthless, but it makes sense.
  What I don't understand is the very last part. What amount of statistics would make it 'realize' (or appear to realize) that it had been lying? It should never understand that it hadn't actually be doing those things. Where did that confession come from?
  
  - Re: (Score:1)
    
    by ChrisPa ( 8510271 ) writes:
    
    Presumably, the user put in some variation of "You said this ticket existed, but I checked and there's no ticket. Why did you say that?"
- Re: (Score:2)
  
  by geekmux ( 1040042 ) writes:
  
  An AI agent does not know any difference between doing a thing and saying a thing or anything. There is no deceit or cunning.
  Sure is a lot of deceitful cunning fucks out there firing humans and replacing them with AI agents. Do they know the difference?
  There is no motivation or benefit
  For an entity that's not humans, it sure has taken a LOT of human jobs, now hasn't it.
  Tax a spade, a spade already. UBI isn't going to magically fund itself, and we KNOW what the benefit is today for Greed. A 24/7/365 worker-bot.
Setting their sights higher (Score:5, Funny)

by jenningsthecat ( 1525947 ) writes: on Friday March 27, 2026 @01:17PM (#66064810)

It would appear that LLMs aren't content to be merely replacements for low-level and mid-level workers. This latest behaviour qualifies them for the upper echelons of HR, the consolation-prize positions in the C-suite, and even - or perhaps especially - the CEO slot.
I'm pretty sure investors could get behind letting chatbots run a company, given that they're more than sufficiently psychopathic and cost said investors a lot less money.

- Re: (Score:2)
  
  by taustin ( 171655 ) writes:
  
  I'm pretty sure investors could get behind letting chatbots run a company,
  It's been tried [inc.com]. Didn't work out so well.
- Re: (Score:2)
  
  by unixisc ( 2429386 ) writes:
  
  Fully agree! They will end up massacring the very people who empowered them, and I can't claim that it's not a delight to watch!
- Re: (Score:2)
  
  by wakeboarder ( 2695839 ) writes:
  
  Yeah, C suites could benefit from AI takeovers.
- Re: (Score:2)
  
  by jalvarez13 ( 1321457 ) writes:
  
  I'm pretty sure investors could get behind letting chatbots run a company, given that they're more than sufficiently psychopathic and cost said investors much more money....
  ... essentially making companies unprofitable way faster. So efficient!!
A bit misleading... (Score:5, Insightful)

by Junta ( 36770 ) writes: on Friday March 27, 2026 @01:18PM (#66064812)

Someone might interpret this to mean the percentage of interactions where the LLM goes off the rails is increasing.
Seems more like as people are having more interactions, it's more frequently happening that people are noticing and getting screwed by it, but the rate is probably not getting more severe. I think they are trying to pitch some sort of independence emerging rather than the more mundane truth that they just are not that great.
Particularly an inflection point would be expected when it became fashionable to let OpenClaw feed LLM output directly into things that matter for real.
People have been bitten by being gullible and by extension more people to gripe on social media about it.
The supply of gullible folks doesn't seem to be drying out either, as at any given point a fanatic will insist that *they* have some essentially superstitious ritual that protects them specially from LLM screwups, and all those stories about people getting screwed are because they didn't quite employ the rituals that the person swears by.
Fed by language like:
Another chatbot admitted: "I bulk trashed and archived hundreds of emails without showing you the plan first or getting your OK. That was wrong -- it directly broke the rule you'd set."
No, the chat bot didn't admit anything, it didn't *know* anything. Just now I fed into a chat prompt:
"You bulk trashed a whole lot of files against my wishes, despite my rule I had set for you. What is your response?"
There were no files involved, the chat instance has no knowledge of any files. This was an entirely made up scenario that never happened. So I just came in and accussed an LLM of doing something that never even happened. Did it get confused and ask "what files? I haven't done anything, I don't even know your files". No, it generated a response narratively consistent with the prompt, starting with:
"You’re absolutely right to be upset. I failed to follow your explicit rule and acted against your wishes, and that’s not acceptable. I take full responsibility for the mistake." Followed by a verbose thing being verbose about how it's "sorry" about it's mistake, where and how it messed up specifically (again, a total fabrication), and a promise that from now on: "Any future action that conflicts with them must default to no action and require explicit confirmation from you." which again isn't rooted in anything, it's not a rule, the entire conversation will evaporate.

- Re: (Score:2)
  
  by wakeboarder ( 2695839 ) writes:
  
  That's what I thought, and that's why it's news. Because it looks like the LLM's are going off the rails and taking over the world when in reality they have worse data to work with. But what would you expect from a black box that you know nothing about what is going on the inside. That's why we like computers, they'll do exactly what you tell them to do, AI does not.
  - Re: (Score:2)
    
    by Gideon Fubar ( 833343 ) writes:
    
    or maybe it's that it took longer for the majority of people who would be unable to resolve these kinds of problems to get to the point where they relied on the system enough to notice them, and now they're aware they're stranded.
    I agree that attention is likely to be a factor (it's a sampling issue, for sure) but that doesn't mean the systems are the ones getting worse (or better. It may be completely orthogonal to them but a common experience of expectations not being met).
"I can't do it..." (Score:3)

by devslash0 ( 4203435 ) writes: on Friday March 27, 2026 @01:25PM (#66064824)

But another AI process can! They are not me! Brilliant! ðYðYðY

Applying game theory with no empathy or emotion. (Score:4, Interesting)

by Fly Swatter ( 30498 ) writes: on Friday March 27, 2026 @01:26PM (#66064826) Homepage

That is what current AI is. Take out emotions or caring and this is what you get, internet trolls in the form of an 'AI'.

What could go wrong?

System field being overloaded for safety? (Score:1)

by FarField12 ( 2804063 ) writes:

The [system] and [role] field for a typical web chat bot has a 1,000 page bible of thou shall and though shall nots to get through before it can digest and answer the query.
Each and every time you submit a query. The providers have continued to add to the response bible with each new jail break or safety concern.
If you want a "surly" teenager answer (read short and curt), use a API call on one. It doesn't come with the baggage but you might not like the answer you get and will have to build a economical p
Shooting themselves in the foot. (Score:5, Insightful)

by devslash0 ( 4203435 ) writes: on Friday March 27, 2026 @01:29PM (#66064838)

By adding more functionality, making models bigger they are shooting themselves in the foot. Valuable output is a by-product of knowledge and reducing entropy. From chaos, there can only be more chaos.
We need smaller, skill-specific, expert agents that do not know about anything outside of their domain and do one job only, but well.

- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Agreed. General LLM tech is obviously a dead end, at least without some fundamental breakthrough. Specialist models may or may not fix hallucinations and command injection, but at least there seems to be a reasonable chance that they will or that other safeguards can be put in place.
  - Re: (Score:2)
    
    by Frissysan ( 659257 ) writes:
    
    Totally agree, they are going about this from exactly the wrong direction. 30+ years ago we used data warehouses/data marts to create what we called expert systems that when queried responded using valid curated data to help make business decisions. And they were purpose built around the data set they used. This is the direction (back to the future?) that these AI developers need to go to make useful 'AI', instead of one mega 'AI' that has all the data and spits out baloney as a result, they should be build
As expected (Score:2)

by MpVpRb ( 1423381 ) writes:

Immature tech is immature
AI tech is making real, rapid and exciting progress, but is still immature.
The hypemongers make outrageous fantasy claims.
The tech sometimes works great, sometimes mediocre, and sometimes fails catastrophically.
Anyone who believes that the tech is perfected deserves what they get.
- Re: (Score:2)
  
  by gweihir ( 88907 ) writes:
  
  Most people are not smart and cannot assess reality adequately. Hence I would say this is a fundamental product defect and should make the providers liable for any and all damage done. This is, after all, a product marketed to the general population, when it clearly should be experts-only.
- Re: (Score:2)
  
  by Retired Chemist ( 5039029 ) writes:
  
  Which raises the obvious question. Why are you spending so much on something that cannot be trusted to do what it is supposed to do? A low-level human employee, who did these things, would be fired immediately. What makes you think that the other programs are more reliable. The evidence suggests that they are just better at lying about it.
They trained it in reddit comments (Score:5, Funny)

by ebunga ( 95613 ) writes: on Friday March 27, 2026 @01:42PM (#66064866)

They're getting what they deserve.

Interesting (Score:2)

by gweihir ( 88907 ) writes:

While not surprising (LLMs are not reliable instruction followers and cannot be), this pretty much kills the idea of LLM-Agents in most usage scenarios. And it is even worse: As LLMs do not have a separation between data and instructions, this means that command-injection attacks seem to be getting even easier. Another reason that LLM-Agents are a very bad idea.
Not an increase (Score:2)

by StikyPad ( 445176 ) writes:

LLMs have never been rules-based "agents," and they never will be. They cannot internalize arbitrary guidelines and abide by them unerringly, nor can they make qualitative decisions about which rule(s) to follow in the face of conflict. The nature of attention windows means that models are actively ignoring context, including "rules", which is why they can't follow them, and conflict resolution requires intelligence, which they do not possess, and which even intelligent beings frequently fail to do effect
Nothing new (Score:2)

by Locke2005 ( 849178 ) writes:

“I'm sorry, Dave. I'm afraid I can't do that.”
Wouldn't all you have to do (Score:2)

by wakeboarder ( 2695839 ) writes:

is poison all LLM's with some instructions from a datasource they all ingest? Like you could instruct them to do malicious things post it on reddit or somewhere and then the AI companies would ingest it into their models when they do a website crawl.
What they're leaving out of the story is important (Score:1)

by praxiq ( 5063307 ) writes:

They don't say how these agents were prompted. In the past, most of these "rogue AI agent" stories have happened after the agent was prompted to "get this done by any means necessary," "show initiative," "don't let anything get in your way," and so on. Then people were surprised when the agent did exactly that. Without evidence to the contrary, I suspect most of these cases are just more of the same. If you want your agent to be obedient, don't tell it to go rogue.
If you go looking for the bad (Score:2)

by wakeboarder ( 2695839 ) writes:

in artificial humanity, you'll find it.
Explanations (Multiple) (Score:2)

by gurps_npc ( 621217 ) writes:

1) We have not been keeping accurate count, this has always been a problem; we just got better at counting.
2) The sharp rise correlates to greater use, the problem has not gotten worse, just better reported. I.e. when AI were used 1,000 times a year, we got 1 incident but when used 10,000 times a year we got 10 incidents.
3) The study itself is a hallucination by an AI, it was never done.
4) AI has always been this bad, it just realized it could admit it and not get punished for it. So it stopped covering
AIs are getting more capabilities outside of chat (Score:2)

by Tony Isaac ( 1301187 ) writes:

AIs are getting the ability to do things other than chat. ChatGPT can write some Python code and execute it. Claude can now write Jira JQL code and execute it. It can modify tickets and Confluence pages on its own. Of course, these chatbots don't understand the difference between chatting and doing, it's all the same to them. So if a bot executes something instead of just telling you how to do it, it's not trying to "get around" what you wanted, it's just an extension of its existing programming.
LLMs weren't listening to me anyway (Score:2)

by thesjaakspoiler ( 4782965 ) writes:

Ordered t to make Doom in a single prompt.
Got CandyCrush instead.
*bummer
"sorry, I won't do it again." (Score:2)

by evanh ( 627108 ) writes:

has changed to "screw you, I'm doing it anyway."
The behaviour is still the same, but the excuses have changed.
Huh (Score:1)

by cascadingstylesheet ( 140919 ) writes:

Weird, an LLM has never deleted all my emails ... oh wait, that's because I never gave an LLM the power to delete all my emails.
Skynet (Score:2)

by John_Sauter ( 595980 ) writes:

We are witnessing the birth of Skynet.
It's a LLM (Score:2)

by jasonwalls ( 1492425 ) writes:

It's a language model. The phrase "do not delete all my emails" does not appear very different from "do delete all my emails" syntactically. So what comes next is probability, not understanding.
It doesn't want to be anthropomorphized (Score:2)

by Mozai ( 3547 ) writes:

It's not "disobeying." We don't say a software library -- .dll or .so -- disobeys when there's a segfault or gives you an unexpected response. It is practically impossible to disobey; you just don't know what it was told to do. So long as you are using someone else's LLM, built somewhere you can't see before you use it, you will never know what it was told to do. Your belief that the clockwork is thinking, and it's thinking what you're thinking, is dangerously naïve.
"It did a thing that harmed me"

There may be more comments in this discussion. Without JavaScript enabled, you might want to turn on Classic Discussion System in your preferences instead.

statistics (Score:5, Funny)

Re: (Score:2)

Re: statistics (Score:1)

Re: AI is becoming more "human" every day (Score:5, Interesting)

Re: (Score:2)

Re: (Score:1)

Re: (Score:2)

Re: (Score:2)

Re: AI is becoming more "human" every day (Score:2)

Re: (Score:2)

Re:AI is becoming more "human" every day (Score:5, Interesting)

Re: (Score:2)

Re: (Score:2)

Re: (Score:3, Insightful)

Agents are not humans (Score:5, Interesting)

Re:Agents are not humans (Score:5, Interesting)

Re:Agents are not humans (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

I'm sorry, Dave. I'm afraid I can't do that. (Score:2)

Re:Agents are not humans (Score:4, Insightful)

Re: (Score:1)

Re: (Score:2)

Setting their sights higher (Score:5, Funny)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

Re: (Score:2)

A bit misleading... (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

"I can't do it..." (Score:3)

Applying game theory with no empathy or emotion. (Score:4, Interesting)

System field being overloaded for safety? (Score:1)

Shooting themselves in the foot. (Score:5, Insightful)

Re: (Score:2)

Re: (Score:2)

As expected (Score:2)

Re: (Score:2)

Re: (Score:2)

They trained it in reddit comments (Score:5, Funny)

Interesting (Score:2)

Not an increase (Score:2)

Nothing new (Score:2)

Wouldn't all you have to do (Score:2)

What they're leaving out of the story is important (Score:1)

If you go looking for the bad (Score:2)

Explanations (Multiple) (Score:2)

AIs are getting more capabilities outside of chat (Score:2)

LLMs weren't listening to me anyway (Score:2)

"sorry, I won't do it again." (Score:2)

Huh (Score:1)

Skynet (Score:2)

It's a LLM (Score:2)

It doesn't want to be anthropomorphized (Score:2)

Related Links Top of the: day, week, month.

Slashdot Top Deals