🤖 AI Summary
To address the high computational complexity and low efficiency of System 2 reasoning in large language model (LLM)-based code debugging, this paper proposes Scaffold Reasoning, a scaffolding framework inspired by dual-process theory. It comprises three synergistic streams: Scaffold (structural guidance), Analytic (fine-grained error attribution), and Integration (multi-stream logical fusion). By mapping cognitive psychology's dual-system theory onto a computationally tractable reasoning architecture, the approach balances reasoning depth and efficiency; methodologically, it employs stepwise reference generation, precise error localization, and multi-path ensemble inference. Evaluated on the DebugBench benchmark, Scaffold Reasoning achieves an 88.91% pass rate with an average solving time of 5.36 seconds per problem, significantly outperforming state-of-the-art methods. These results empirically validate the effectiveness and practicality of cognition-aligned design for LLM-based code debugging.
📝 Abstract
Recent LLMs have demonstrated sophisticated problem-solving capabilities on various benchmarks through advanced reasoning algorithms. However, the key research question of identifying reasoning steps that balance complexity and computational efficiency remains unsolved. Recent work has increasingly drawn on psychological theories to explore strategies for optimizing cognitive pathways, regarding the LLM's final outputs and intermediate steps as System 1 and System 2, respectively. Yet an in-depth exploration of System 2 reasoning is still lacking. We therefore propose a novel, psychologically grounded Scaffold Reasoning framework for code debugging, which encompasses a Scaffold Stream, an Analytic Stream, and an Integration Stream: reference code constructed in the Scaffold Stream is fused, via the Integration Stream, with the buggy-code analysis produced by the Analytic Stream. Our framework achieves an 88.91% pass rate and an average inference time of 5.36 seconds per problem on DebugBench, outperforming other reasoning approaches across various LLMs in both reasoning accuracy and efficiency. Further analyses elucidate the advantages and limitations of different cognitive pathways across problem difficulties and bug types. Our findings also corroborate the alignment of the proposed Scaffold Reasoning framework with human cognitive processes.
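The three-stream pipeline described in the abstract can be sketched as follows. This is a minimal toy illustration, not the paper's implementation: the function names are hypothetical, the Scaffold Stream's LLM generation is replaced by a canned reference, and the Analytic Stream's error attribution is approximated by a naive line-by-line diff.

```python
def scaffold_stream(problem: str) -> str:
    """Scaffold Stream: construct step-wise reference code for the problem.
    A canned reference stands in for the actual LLM generation here."""
    return "def add(a, b):\n    return a + b"


def analytic_stream(buggy_code: str, reference: str) -> list[str]:
    """Analytic Stream: analyze the buggy code against the reference and
    report suspected error locations (a toy line diff in this sketch)."""
    findings = []
    pairs = zip(buggy_code.splitlines(), reference.splitlines())
    for lineno, (bug_line, ref_line) in enumerate(pairs, start=1):
        if bug_line != ref_line:
            findings.append(f"line {lineno}: expected {ref_line!r}, found {bug_line!r}")
    return findings


def integration_stream(buggy_code: str, reference: str, findings: list[str]) -> str:
    """Integration Stream: fuse the reference scaffold with the analysis.
    This toy version adopts the reference whenever discrepancies were found."""
    return reference if findings else buggy_code


def debug(problem: str, buggy_code: str) -> str:
    """Run the three streams in sequence and return the repaired code."""
    reference = scaffold_stream(problem)
    findings = analytic_stream(buggy_code, reference)
    return integration_stream(buggy_code, reference, findings)
```

For example, `debug("add two integers", "def add(a, b):\n    return a - b")` detects the mismatched return line and yields the reference implementation; already-correct input passes through unchanged.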