I've been trying for a while to use a "loop" to optimize one particularly-tedious part of my workflow: Merging.
My employer uses Github with an extensive CI infrastructure to validate all sorts of things. After CI passes, trunk-io takes the commit and retests it in a batch with other commits and if they all pass, merges them as a set of squash commits. If something goes wrong, I have to figure out whether it's a transient failure (in which case I can tell the system to re-run the tests), or whether it requires me to fix and re-push. My commits typically build on one another so I end up with a stack of PRs that have to go through this process. When a commit finally merges the next commit up the stack has to be rebased and re-pushed.
Start to end, getting a commit to merge takes between one and four hours. This is slow enough that even though I don't have to watch the process continuously, just check in on it every half hour or so, it puts a major crimp in my productivity. If I only merge during working hours I can only merge 2-4 commits per day, but on a good day I create double that. This means that I have to be merging evenings and weekends too, or my backlog builds up. (Code review is another obstacle, but I'm focused only on the merge process here.)
There are enough possible odd failure cases in the merge process that I haven't been successful at writing a script to manage it. So I thought "Hey, why not have Claude supervise it? Claude is capable of exercising some judgment and problem-solving, right?".
Not really. If there's a problem blocking the PR at the bottom of the stack from merging, Claude is perfectly capable of analyzing the situation and determining what needs to be done to unblock it, and of performing the operations necessary -- but only with active prompting. Claude can set a timer to go periodically check the status and recognize the problem, but no matter what I do I can't get it to autonomously take the next step of correctly diagnosing and then acting on that diagnosis. Even given explicit instructions to do so, Claude either (a) fails to investigate enough, (b) fails to identify correct actions or (c) fails to perform them. When I wake up in the morning and ask Claude what the situation is, it generally correctly and accurately summarizes exactly what's wrong and exactly what needs to be done to fix it, and then when I ask why it didn't do those things it tells me that it clearly should have, but it just didn't.
I've tried various architectures, using one instance to prompt another one, using pairs of instances set up with distinct, complementary responsibilities, using instances set up with adversarial responsibilities (this is the most effective), but I just can't get it do to this work effectively.