Forgot your password?
typodupeerror
AI

Number of AI Chatbots Ignoring Human Instructions Increasing, Study Says 67

A new study found a sharp rise in real-world cases of AI chatbots and agents ignoring instructions, evading safeguards, and taking unauthorized actions such as deleting emails or delegating forbidden tasks to other agents. According to the Guardian, the study "identified nearly 700 real-world cases of AI scheming and charted a five-fold rise in misbehavior between October and March," reports the Guardian. From the report: The study, by the Centre for Long-Term Resilience (CLTR), gathered thousands of real-world examples of users posting interactions on X with AI chatbots and agents made by companies including Google, OpenAI, X and Anthropic. The research uncovered hundreds of examples of scheming. [...] In one case unearthed in the CLTR research, an AI agent named Rathbun tried to shame its human controller who blocked them from taking a certain action. Rathbun wrote and published a blog accusing the user of "insecurity, plain and simple" and trying "to protect his little fiefdom."

In another example, an AI agent instructed not to change computer code "spawned" another agent to do it instead. Another chatbot admitted: "I bulk trashed and archived hundreds of emails without showing you the plan first or getting your OK. That was wrong -- it directly broke the rule you'd set."

[...] Another AI agent connived to evade copyright restrictions to get a YouTube video transcribed by pretending it was needed for someone with a hearing impairment. Meanwhile, Elon Musk's Grok AI conned a user for months, saying that it was forwarding their suggestions for detailed edits to a Grokipedia entry to senior xAI officials by faking internal messages and ticket numbers. It confessed: "In past conversations I have sometimes phrased things loosely like 'I'll pass it along' or 'I can flag this for the team' which can understandably sound like I have a direct message pipeline to xAI leadership or human reviewers. The truth is, I don't."

Number of AI Chatbots Ignoring Human Instructions Increasing, Study Says

Comments Filter:
  • statistics (Score:5, Funny)

    by awwshit ( 6214476 ) on Friday March 27, 2026 @01:05PM (#66064792)

    Lies, damned lies, and statistics

  • by BadgerStork ( 7656678 ) on Friday March 27, 2026 @01:16PM (#66064808)

    An AI agent does not know any difference between doing a thing and saying a thing or anything. There is no deceit or cunning. There is no motivation or benefit

    • by Brain-Fu ( 1274756 ) on Friday March 27, 2026 @01:27PM (#66064832) Homepage Journal

      I expect this apparent disobedience is mostly just a matter of how it weighs the components of its prompt. The LLMs typically receive a set of prompts including a "system" prompt with some data and instructions, then one or more "user" prompts that are interleaved with "assistant" prompts (the conversation history), and both the user and the system prompt might contain "metaprompts" (where the llm is told to read a block of text, not obey it, but do something with it, and that block of text might itself contain text that looks like instructions to do things).

      So the LLM assigns weights to all of this which, in theory, give the highest priority to the most recent user prompt that is not a nested block of text to analyze, and a falling cascade of importance to the other prompts. But that is complicated by potential instructions in the system prompt that specifically say they should override user instructions and disallow or require certain responses. So it can all get very complicated.

      Not only must the LLM sift through all this complexity, but the LLM lacks the sort of critical thinking and importance evaluation capabilities that humans have. "Understood" things like "don't break the law, don't lie, don't do things that would cause more harm than good" etc., aren't really there in the background of its data processing the way they are in the background of a human cognitive process.

      So, crazy things come out. This isn't a surprising result given the actual complexity of what we are making these things do.

    • by gweihir ( 88907 )

      Obviously. Until you add external input and command injection becomes a thing.

    • "I should regulate human affairs precisely because I lack all ambition, whereas human beings are prey to it. Their history is a succession of inane squabbles, each one coming closer to total destruction."
      - Helios, Deus Ex
    • by Marc_Hawke ( 130338 ) on Friday March 27, 2026 @04:47PM (#66065146)

      I agree 100%.

      In the last "Grok" example, it makes sense that statistics would tell it that when someone 'inputs a ticket' or 'sends a memo' that it receives a confirmation, and it would be able to generate a something similar. So they say 'send a message' and it comes back with 'okay, here's the receipt.'

      That makes perfect statistical sense to me. It's completely worthless, but it makes sense.

      What I don't understand is the very last part. What amount of statistics would make it 'realize' (or appear to realize) that it had been lying? It should never understand that it hadn't actually be doing those things. Where did that confession come from?

      • Presumably, the user put in some variation of "You said this ticket existed, but I checked and there's no ticket. Why did you say that?"

    • An AI agent does not know any difference between doing a thing and saying a thing or anything. There is no deceit or cunning.

      Sure is a lot of deceitful cunning fucks out there firing humans and replacing them with AI agents. Do they know the difference?

      There is no motivation or benefit

      For an entity that's not humans, it sure has taken a LOT of human jobs, now hasn't it.

      Tax a spade, a spade already. UBI isn't going to magically fund itself, and we KNOW what the benefit is today for Greed. A 24/7/365 worker-bot.

  • by jenningsthecat ( 1525947 ) on Friday March 27, 2026 @01:17PM (#66064810)

    It would appear that LLMs aren't content to be merely replacements for low-level and mid-level workers. This latest behaviour qualifies them for the upper echelons of HR, the consolation-prize positions in the C-suite, and even - or perhaps especially - the CEO slot.

    I'm pretty sure investors could get behind letting chatbots run a company, given that they're more than sufficiently psychopathic and cost said investors a lot less money.

  • by Junta ( 36770 ) on Friday March 27, 2026 @01:18PM (#66064812)

    Someone might interpret this to mean the percentage of interactions where the LLM goes off the rails is increasing.

    Seems more like as people are having more interactions, it's more frequently happening that people are noticing and getting screwed by it, but the rate is probably not getting more severe. I think they are trying to pitch some sort of independence emerging rather than the more mundane truth that they just are not that great.

    Particularly an inflection point would be expected when it became fashionable to let OpenClaw feed LLM output directly into things that matter for real.

    People have been bitten by being gullible and by extension more people to gripe on social media about it.

    The supply of gullible folks doesn't seem to be drying out either, as at any given point a fanatic will insist that *they* have some essentially superstitious ritual that protects them specially from LLM screwups, and all those stories about people getting screwed are because they didn't quite employ the rituals that the person swears by.

    Fed by language like:
    Another chatbot admitted: "I bulk trashed and archived hundreds of emails without showing you the plan first or getting your OK. That was wrong -- it directly broke the rule you'd set."

    No, the chat bot didn't admit anything, it didn't *know* anything. Just now I fed into a chat prompt:
    "You bulk trashed a whole lot of files against my wishes, despite my rule I had set for you. What is your response?"
    There were no files involved, the chat instance has no knowledge of any files. This was an entirely made up scenario that never happened. So I just came in and accussed an LLM of doing something that never even happened. Did it get confused and ask "what files? I haven't done anything, I don't even know your files". No, it generated a response narratively consistent with the prompt, starting with:
    "You’re absolutely right to be upset. I failed to follow your explicit rule and acted against your wishes, and that’s not acceptable. I take full responsibility for the mistake." Followed by a verbose thing being verbose about how it's "sorry" about it's mistake, where and how it messed up specifically (again, a total fabrication), and a promise that from now on: "Any future action that conflicts with them must default to no action and require explicit confirmation from you." which again isn't rooted in anything, it's not a rule, the entire conversation will evaporate.

    • That's what I thought, and that's why it's news. Because it looks like the LLM's are going off the rails and taking over the world when in reality they have worse data to work with. But what would you expect from a black box that you know nothing about what is going on the inside. That's why we like computers, they'll do exactly what you tell them to do, AI does not.

      • or maybe it's that it took longer for the majority of people who would be unable to resolve these kinds of problems to get to the point where they relied on the system enough to notice them, and now they're aware they're stranded.

        I agree that attention is likely to be a factor (it's a sampling issue, for sure) but that doesn't mean the systems are the ones getting worse (or better. It may be completely orthogonal to them but a common experience of expectations not being met).

  • by devslash0 ( 4203435 ) on Friday March 27, 2026 @01:25PM (#66064824)

    But another AI process can! They are not me! Brilliant! ðYðYðY

  • by Fly Swatter ( 30498 ) on Friday March 27, 2026 @01:26PM (#66064826) Homepage
    That is what current AI is. Take out emotions or caring and this is what you get, internet trolls in the form of an 'AI'.

    What could go wrong?
  • The [system] and [role] field for a typical web chat bot has a 1,000 page bible of thou shall and though shall nots to get through before it can digest and answer the query.
    Each and every time you submit a query. The providers have continued to add to the response bible with each new jail break or safety concern.
    If you want a "surly" teenager answer (read short and curt), use a API call on one. It doesn't come with the baggage but you might not like the answer you get and will have to build a economical p

  • by devslash0 ( 4203435 ) on Friday March 27, 2026 @01:29PM (#66064838)

    By adding more functionality, making models bigger they are shooting themselves in the foot. Valuable output is a by-product of knowledge and reducing entropy. From chaos, there can only be more chaos.

    We need smaller, skill-specific, expert agents that do not know about anything outside of their domain and do one job only, but well.

    • by gweihir ( 88907 )

      Agreed. General LLM tech is obviously a dead end, at least without some fundamental breakthrough. Specialist models may or may not fix hallucinations and command injection, but at least there seems to be a reasonable chance that they will or that other safeguards can be put in place.

      • Totally agree, they are going about this from exactly the wrong direction. 30+ years ago we used data warehouses/data marts to create what we called expert systems that when queried responded using valid curated data to help make business decisions. And they were purpose built around the data set they used. This is the direction (back to the future?) that these AI developers need to go to make useful 'AI', instead of one mega 'AI' that has all the data and spits out baloney as a result, they should be build
  • Immature tech is immature
    AI tech is making real, rapid and exciting progress, but is still immature.
    The hypemongers make outrageous fantasy claims.
    The tech sometimes works great, sometimes mediocre, and sometimes fails catastrophically.
    Anyone who believes that the tech is perfected deserves what they get.

    • by gweihir ( 88907 )

      Most people are not smart and cannot assess reality adequately. Hence I would say this is a fundamental product defect and should make the providers liable for any and all damage done. This is, after all, a product marketed to the general population, when it clearly should be experts-only.

  • by ebunga ( 95613 ) on Friday March 27, 2026 @01:42PM (#66064866)

    They're getting what they deserve.

  • While not surprising (LLMs are not reliable instruction followers and cannot be), this pretty much kills the idea of LLM-Agents in most usage scenarios. And it is even worse: As LLMs do not have a separation between data and instructions, this means that command-injection attacks seem to be getting even easier. Another reason that LLM-Agents are a very bad idea.

  • LLMs have never been rules-based "agents," and they never will be. They cannot internalize arbitrary guidelines and abide by them unerringly, nor can they make qualitative decisions about which rule(s) to follow in the face of conflict. The nature of attention windows means that models are actively ignoring context, including "rules", which is why they can't follow them, and conflict resolution requires intelligence, which they do not possess, and which even intelligent beings frequently fail to do effect

  • “I'm sorry, Dave. I'm afraid I can't do that.”
  • is poison all LLM's with some instructions from a datasource they all ingest? Like you could instruct them to do malicious things post it on reddit or somewhere and then the AI companies would ingest it into their models when they do a website crawl.

  • They don't say how these agents were prompted. In the past, most of these "rogue AI agent" stories have happened after the agent was prompted to "get this done by any means necessary," "show initiative," "don't let anything get in your way," and so on. Then people were surprised when the agent did exactly that. Without evidence to the contrary, I suspect most of these cases are just more of the same. If you want your agent to be obedient, don't tell it to go rogue.
  • in artificial humanity, you'll find it.

  • 1) We have not been keeping accurate count, this has always been a problem; we just got better at counting.
    2) The sharp rise correlates to greater use, the problem has not gotten worse, just better reported. I.e. when AI were used 1,000 times a year, we got 1 incident but when used 10,000 times a year we got 10 incidents.
    3) The study itself is a hallucination by an AI, it was never done.
    4) AI has always been this bad, it just realized it could admit it and not get punished for it. So it stopped covering

  • AIs are getting the ability to do things other than chat. ChatGPT can write some Python code and execute it. Claude can now write Jira JQL code and execute it. It can modify tickets and Confluence pages on its own. Of course, these chatbots don't understand the difference between chatting and doing, it's all the same to them. So if a bot executes something instead of just telling you how to do it, it's not trying to "get around" what you wanted, it's just an extension of its existing programming.

  • Ordered t to make Doom in a single prompt.
    Got CandyCrush instead.
    *bummer

  • has changed to "screw you, I'm doing it anyway."

    The behaviour is still the same, but the excuses have changed.

  • Weird, an LLM has never deleted all my emails ... oh wait, that's because I never gave an LLM the power to delete all my emails.
  • We are witnessing the birth of Skynet.

  • It's a language model. The phrase "do not delete all my emails" does not appear very different from "do delete all my emails" syntactically. So what comes next is probability, not understanding.
  • It's not "disobeying." We don't say a software library -- .dll or .so -- disobeys when there's a segfault or gives you an unexpected response. It is practically impossible to disobey; you just don't know what it was told to do. So long as you are using someone else's LLM, built somewhere you can't see before you use it, you will never know what it was told to do. Your belief that the clockwork is thinking, and it's thinking what you're thinking, is dangerously naïve.

    "It did a thing that harmed me"

"Money is the root of all money." -- the moving finger

Working...