When Agents go Astray: Course-Correcting SWE Agents with PRMs

📅 2025-09-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
LLM agents frequently exhibit inefficient reasoning trajectories in complex software engineering (SWE) tasks due to redundant exploration, cyclic execution, and premature or delayed termination. Existing approaches predominantly rely on post-hoc error diagnosis and lack real-time intervention capabilities. To address this, we propose a lightweight Process Reward Model (PRM) grounded in a fine-grained error taxonomy, which dynamically detects and corrects trajectory-level errors during inference, enabling online intervention without modifying the policy network. Evaluated on the SWE-bench Verified benchmark, our method improves the task success rate from 40.0% to 50.6% with closed-source PRMs, significantly reduces average trajectory length, and adds as little as $0.20 in inference cost. The approach offers high interpretability, robustness across diverse error types, and favorable scalability to larger models and tasks.

📝 Abstract
Large Language Model (LLM) agents are increasingly deployed for complex, multi-step software engineering (SWE) tasks. However, their trajectories often contain costly inefficiencies, such as redundant exploration, looping, and failure to terminate once a solution is reached. Prior work has largely treated these errors in a post-hoc manner, diagnosing failures only after execution. In this paper, we introduce SWE-PRM, an inference-time Process Reward Model (PRM) that intervenes during execution to detect and course-correct trajectory-level errors. Our PRM design leverages a taxonomy of common inefficiencies and delivers lightweight, interpretable feedback without modifying the underlying policy. On SWE-bench Verified, closed-source PRMs improve resolution from 40.0% to 50.6% (+10.6 p.p.), with the largest gains on medium and hard tasks. Among feedback strategies, taxonomy-guided PRMs outperform unguided or explicit action-prescriptive variants, increasing success rate while reducing trajectory length. These benefits come at an acceptable added inference cost of as low as $0.2, making PRMs a practical and scalable mechanism for improving SWE agents' reliability and efficiency.
Problem

Research questions and friction points this paper is trying to address.

Detecting and correcting inefficiencies in LLM agent trajectories
Addressing redundant exploration and non-termination once a solution is reached
Improving software engineering task success rates with PRMs
Innovation

Methods, ideas, or system contributions that make the work stand out.

Inference-time Process Reward Model intervention
Lightweight interpretable feedback without policy modification
Taxonomy-guided error detection and course-correction
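The core mechanism described above, a PRM that watches the trajectory during inference and injects taxonomy-guided feedback without touching the policy, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the taxonomy category names, the rule-based `classify_trajectory` stand-in (a real PRM would be an LLM scoring the trajectory), and the `run_with_prm` loop are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# Illustrative error taxonomy, loosely following the inefficiency types
# the paper names; the actual taxonomy and labels are the paper's own.
ERROR_TAXONOMY = {
    "looping": "The agent repeats the same action consecutively.",
    "redundant_exploration": "The agent revisits an action already tried.",
}

@dataclass
class Step:
    action: str
    observation: str

def classify_trajectory(trajectory: list[Step]) -> Optional[str]:
    """Toy PRM stand-in: flag trajectory-level inefficiencies with simple
    pattern checks instead of a learned reward model."""
    actions = [s.action for s in trajectory]
    if len(actions) >= 2 and actions[-1] == actions[-2]:
        return "looping"
    if actions and actions.count(actions[-1]) > 2:
        return "redundant_exploration"
    return None

def run_with_prm(
    policy: Callable[[list[Step], Optional[str]], Optional[Step]],
    prm: Callable[[list[Step]], Optional[str]],
    max_steps: int = 10,
) -> list[Step]:
    """Inference-time loop: after each policy step, the PRM may inject
    corrective feedback into the next call; the policy itself is unchanged."""
    trajectory: list[Step] = []
    feedback: Optional[str] = None
    for _ in range(max_steps):
        step = policy(trajectory, feedback)
        if step is None:  # policy decided to terminate
            break
        trajectory.append(step)
        error = prm(trajectory)
        feedback = (
            f"Detected {error}: {ERROR_TAXONOMY[error]} Course-correct."
            if error else None
        )
    return trajectory
```

The key design point mirrored here is that intervention is purely additive: the PRM only shapes the next prompt via `feedback`, so the underlying policy network needs no fine-tuning.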