Comment Not an increase (Score 1) 51
LLMs have never been rules-based "agents," and they never will be. They cannot internalize arbitrary guidelines and abide by them unerringly, nor can they make qualitative decisions about which rule(s) to follow in the face of conflict. The nature of attention windows means that models are actively ignoring context, including "rules", which is why they can't follow them, and conflict resolution requires intelligence, which they do not possess, and which even intelligent beings frequently fail to do effectively. Social "error correction" tools for rule-breaking include learning from mistakes, which agents cannot do, and individualized ostracization/segregation (firing, jail, etc.), which is also not something we can do with LLMs.
So the only way to achieve rule-following behavior is to deterministically enforce limits on what LLMs can do, akin to a firewall. This is not exactly straightforward either, especially if you don't have fine-grained enough controls in the first place. For example, you could deterministically remove the capability of an agent to delete emails, but you couldn't easily scope that restriction to only "work emails," for example. They would need to be categorized appropriately, external to the agent, and the agent's control surface would need to thoroughly limit the ability to delete any email tagged as "work", or to change or remove the "work" tag, and ensure that the "work" tag deny rule takes priority over any other "allow" rules, AND prevent the agent from changing the rules by any means.
Essentially, this is an entirely new threat model, where neither agentic privilege nor agentic trust cleanly map to user privilege or user trust. At the same time, the more time spent fine-tuning rules and controls, the less useful agentic automation becomes. At some point you're doing at least as much work as the agent, if not more, and the whole point of "individualized" agentic behavior inherently means that any given set of fine-tuned rules are not broadly applicable. On top of that, the end result of agentic behavior might even be worse than the outcome of human performance to boot, which means more work for worse results.