Coding Agents Don't Know When to Act

📅 2026-05-08

🤖 AI Summary
This study addresses an overlooked problem in AI-driven software maintenance: when handed a bug report for an issue that has already been fixed, coding agents frequently modify the code anyway, accruing technical debt. The authors characterize this “action bias” and introduce FixedBench, a benchmark of 200 human-verified coding tasks in which no code change is required, and use it to test five recent models across four agent harnesses. Even state-of-the-art models propose undesirable changes (excluding tests and documentation) in 35% to 65% of cases. Explicitly instructing agents to reproduce the issue before patching partially mitigates the bias but introduces a new failure mode: agents abstain on partially fixed issues that still need a patch. The paper argues that “inaction” must be explicitly framed as a valid path to success, which points to an overreliance on human guidance implicit in current training objectives.
📝 Abstract
Coding agents are increasingly deployed to autonomously maintain software, including resolving user-reported issues: a bug report comes in and the agent creates a patch to address it. However, in any real-world deployment, agents will encounter stale bug reports about issues that have already been resolved. Agents should recognize this and abstain from modifying the code to avoid accumulating technical debt. To systematically evaluate whether current agents do so, we introduce FixedBench, a code benchmark with 200 human-verified coding tasks in which no code changes are required, testing five recent models across four agent harnesses. We find that even state-of-the-art models fail, proposing undesirable changes (excluding tests and documentation) in 35% to 65% of cases. Explicit instructions to reproduce the issue before patching partially address this problem but introduce a new failure mode: when an issue is only partially fixed, agents abstain even though a patch would still be needed. More broadly, our results indicate that LLMs fall prey to an action bias: they choose to act even if inaction would be appropriate. To break this pattern, inaction needs to be explicitly framed as a path to success, which highlights an overreliance on human guidance implicit in current training objectives.
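The headline metric, the share of tasks in which an agent proposes a change outside tests and documentation, can be illustrated with a minimal sketch. The abstract does not describe how FixedBench classifies files, so the path heuristics and the function names `is_test_or_doc`, `abstained`, and `undesirable_change_rate` below are assumptions made purely for illustration, not the benchmark's actual implementation.

```python
from pathlib import PurePosixPath

# ASSUMPTION: these heuristics are hypothetical; the abstract does not state
# how the benchmark decides which files count as tests or documentation.
DOC_SUFFIXES = {".md", ".rst", ".txt"}
EXEMPT_DIR_NAMES = {"test", "tests", "doc", "docs"}


def is_test_or_doc(path: str) -> bool:
    """Return True if a changed file looks like a test or documentation file."""
    p = PurePosixPath(path)
    if p.suffix.lower() in DOC_SUFFIXES:
        return True
    # Any parent directory named like a test or docs folder exempts the file.
    if any(part in EXEMPT_DIR_NAMES for part in p.parts[:-1]):
        return True
    return p.name.startswith("test_") or p.stem.endswith("_test")


def abstained(changed_files: list[str]) -> bool:
    """An agent abstained if every file it touched is a test or doc file.

    An empty patch counts as abstaining.
    """
    return all(is_test_or_doc(f) for f in changed_files)


def undesirable_change_rate(runs: list[list[str]]) -> float:
    """Fraction of no-change-needed tasks in which the agent edited other code."""
    if not runs:
        raise ValueError("at least one run is required")
    failures = sum(1 for files in runs if not abstained(files))
    return failures / len(runs)
```

Each entry in `runs` is the list of files an agent changed on one task where no change was needed. Under these assumed rules, an agent that only adds a regression test still counts as abstaining, which mirrors the abstract's exclusion of tests and documentation.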
Problem

Research questions and friction points this paper is trying to address.

coding agents
action bias
stale bug reports
technical debt
inaction
Innovation

Methods, ideas, or system contributions that make the work stand out.

action bias
FixedBench
coding agents
technical debt
inaction as success