AI Summary
This paper addresses real-world software engineering challenges, such as bug fixing and performance optimization, that demand long-horizon reasoning, iterative exploration, and feedback-driven decision-making by LLM-based agents. Methodologically, it synthesizes 126 recent studies, integrating ReAct and Plan-and-Execute architectures, multi-step tool invocation, interactive code-environment execution, and benchmarks including SWE-bench. Its primary contribution is the first domain-specific, three-dimensional taxonomy of agent capabilities for software engineering, spanning benchmarks, techniques, and empirical evaluation, which reveals a paradigm shift toward reinforcement learning-driven agent design. The study maps the field's evolutionary trajectory, identifies critical bottlenecks (including sparse environmental feedback and low planning interpretability), and proposes six concrete future directions: scalable training, simulation-augmented learning, human-in-the-loop collaboration, modular agent design, benchmark standardization, and causal reasoning integration.
Abstract
Software issue resolution aims to address real-world issues in software repositories (e.g., bug fixing and efficiency optimization) based on natural language descriptions provided by users, representing a key aspect of software maintenance. With the rapid development of large language models (LLMs) in reasoning and generative capabilities, LLM-based approaches have made significant progress in automated software issue resolution. However, real-world software issue resolution is inherently complex and requires long-horizon reasoning, iterative exploration, and feedback-driven decision making, which demand agentic capabilities beyond conventional single-step approaches. Recently, LLM-based agentic systems have become mainstream for software issue resolution. Advancements in agentic software issue resolution not only greatly enhance software maintenance efficiency and quality but also provide a realistic environment for validating agentic systems' reasoning, planning, and execution capabilities, bridging artificial intelligence and software engineering.
This work presents a systematic survey of 126 recent studies at the forefront of LLM-based agentic software issue resolution research. It outlines the general workflow of the task and establishes a taxonomy across three dimensions: benchmarks, techniques, and empirical studies. Furthermore, it highlights how the emergence of agentic reinforcement learning has brought a paradigm shift in the design and training of agentic systems for software engineering. Finally, it summarizes key challenges and outlines promising directions for future research.