That was almost exactly my reaction to the story, but I think you should have gone for funny with it. I'm also not sure you should have called it an "agent", however. So I have supporting anecdotes to share from my AI-supported "programming" experiences...
My early experiments were mostly with ChatGPT and DeepSeek. My website was getting sick and the PERL/CGI was no longer allowed to run, so one of the many upgrade paths I explored involved moving functions, mostly statistical stuff, from PERL to JavaScript. This is when I started encountering the lost marbles problems. The first few iterations would work surprisingly well, but then it would start losing its marbles and various features would disappear, seemingly at random. Not sure I could figure out when this was going on since time seems so distorted these years, but I feel like it was around two years back.
More recently the server died completely. (Tripod's parent company Lycos was supposedly quite valuable a long time ago, though not nearly as valuable as the AI companies are supposed to be these days--but that's a fresh bubble waiting to burst.) So I wound up using the quasi-website aspect of GitHub to host my quasi-website. (Not quite trivial to modify the old JavaScript utilities for the new URLs.) I also decided to take another swing at the bigger problems, this time using Claude. Color my surprised or even amazed? Much more productive this time around. My "work" pattern this time involves short sessions, basically discussing features and data structures, followed by a minute or two of file generation by Claude, a couple of minutes of file installation, and then some testing. Pretty quickly matched and went beyond the existing PERL code, including apparently fixing a regex problem that had eluded me for a long time.
At that point I started worrying about Claude losing its marbles. The AI is quite willing to discuss the problem in terms of tokens, but it refused to give any hard limits and apparently has no way to assess if a "session" is close to reaching any of them. However it definitely described behaviors that sounded like losing marbles and was unable to suggest any good ways to detect such problems. And I think that is probably what happened in this story. Someone was updating code using an AI and at some point it passed its limits and started losing features. Who knows what else has gone missing?
Claude does have some meta-features for managing tokens, including compression, but it was not too helpful about assessing the risks. Instead it suggested starting a fresh session and prepared an interesting "transition" document that is supposed to describe the current state of the new system... But the threats of lost marbles remain and the threats sound quite similar to what seems to have happened in Outlook in this story... I feel like Claude's threats are only implicit because it won't clarify what they are or how to detect them...
(Just about finished with Microsoft Secrets about their software development processes a long time ago. Testing problems were prevalent and never really solved...)