🤖 AI Summary
Existing LLM-based automated program repair (APR) methods rely on trial-and-error local search, suffering from premature convergence to suboptimal patches and low efficiency. This paper introduces Monte Carlo Tree Search (MCTS) into APR for the first time, proposing an iterative tree-search framework that integrates LLMs' patch-generation capability with MCTS's global evaluation and path-optimization mechanism. The approach operates over a compact candidate patch budget (16–32 patches), drastically reducing computational overhead. Evaluated on 835 Defects4J bugs, the method, when integrated with GPT-3.5, repairs 201 defects, outperforming all prior state-of-the-art APR techniques. It repairs 27–37 more bugs than the baseline methods, while reducing runtime and API-call costs to under 20% and 50% of the baselines', respectively; gains are especially pronounced on complex, multi-hunk defects.
📝 Abstract
Automated Program Repair (APR) attempts to fix software bugs without human intervention and plays a crucial role in software development and maintenance. Recently, with the advances in Large Language Models (LLMs), a rapidly increasing number of APR techniques have been proposed with remarkable performance. However, existing LLM-based APR techniques typically adopt trial-and-error strategies, which suffer from two major drawbacks: (1) inherently limited patch effectiveness due to local exploration, and (2) low search efficiency due to redundant exploration. In this paper, we propose APRMCTS, which uses iterative tree search to improve LLM-based APR. APRMCTS incorporates Monte Carlo Tree Search (MCTS) into patch searching by performing a global evaluation of the explored patches and selecting the most promising one for subsequent refinement and generation. APRMCTS mitigates the problem of falling into local optima and thereby improves the efficiency of patch searching. Our experiments on 835 bugs from Defects4J demonstrate that, when integrated with GPT-3.5, APRMCTS can fix a total of 201 bugs, outperforming all state-of-the-art baselines. Moreover, APRMCTS helps GPT-4o-mini, GPT-3.5, Yi-Coder-9B, and Qwen2.5-Coder-7B fix 30, 27, 37, and 28 more bugs, respectively. More importantly, APRMCTS achieves this advantage with small patch budgets (16 and 32), notably fewer than the 500 and 10,000 patches adopted in previous studies. In terms of cost, compared to existing state-of-the-art LLM-based APR methods, APRMCTS incurs less than 20% of the time cost and less than 50% of the monetary cost. Our extensive study demonstrates that APRMCTS exhibits good effectiveness and efficiency, with particular advantages in addressing complex bugs.
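The select–refine–evaluate–backpropagate loop described above can be sketched as a standard UCT-based MCTS over candidate patches. This is a hedged illustration, not the paper's implementation: `mock_llm_refine` and `mock_score` are hypothetical stand-ins for the LLM refinement call and the test-suite-based patch validation, patches are represented as integers for simplicity, and the exploration constant `c=1.4` is an assumed default.

```python
import math
import random

class Node:
    """One explored candidate patch in the search tree."""
    def __init__(self, patch, parent=None):
        self.patch = patch        # candidate patch (mocked as an int here)
        self.parent = parent
        self.children = []
        self.visits = 0
        self.value = 0.0          # cumulative validation reward

    def uct(self, c=1.4):
        # UCT score: exploitation term + exploration bonus.
        if self.visits == 0:
            return float("inf")
        return (self.value / self.visits
                + c * math.sqrt(math.log(self.parent.visits) / self.visits))

def mock_llm_refine(patch, rng):
    # Hypothetical stand-in for an LLM call that refines a patch.
    return patch + rng.choice([-1, 0, 1, 2])

def mock_score(patch):
    # Hypothetical stand-in for patch validation (e.g. test pass rate), in [0, 1].
    return max(0.0, min(1.0, patch / 10))

def mcts_repair(budget=32, seed=0):
    """Run MCTS for `budget` iterations (cf. the paper's 16/32 patch budgets)."""
    rng = random.Random(seed)
    root = Node(patch=3)          # initial LLM-generated patch (mocked)
    for _ in range(budget):
        # 1. Selection: descend via UCT to the most promising leaf.
        node = root
        while node.children:
            node = max(node.children, key=Node.uct)
        # 2. Expansion: ask the "LLM" to refine the selected patch.
        child = Node(mock_llm_refine(node.patch, rng), parent=node)
        node.children.append(child)
        # 3. Evaluation: globally score the new candidate patch.
        reward = mock_score(child.patch)
        # 4. Backpropagation: update statistics up to the root.
        while child:
            child.visits += 1
            child.value += reward
            child = child.parent
    # Return the best-scoring patch found anywhere in the tree.
    best, stack = root, [root]
    while stack:
        n = stack.pop()
        if mock_score(n.patch) > mock_score(best.patch):
            best = n
        stack.extend(n.children)
    return best.patch
```

Because backpropagation updates every ancestor, promising refinement paths accumulate visits and get explored further, while the exploration bonus keeps the search from committing prematurely to one local optimum, which is the failure mode of trial-and-error APR that the abstract describes.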