🤖 AI Summary
This work addresses a limitation of existing large language model (LLM)-based automated program repair approaches: they rely on end-to-end test feedback and therefore struggle to pinpoint where a program's internal logic deviates from its intended behavior. The authors propose SpecTune, a framework that inserts checkpoints along execution paths, generates localized postconditions at those checkpoints, and evaluates them against dynamic execution results to obtain fine-grained debugging signals. Because LLM-generated postconditions may be unreliable, SpecTune adds two complementary signals: a specification validation signal (α), which estimates the consistency of the postconditions using partially passing test cases, and a discriminative signal (β), which detects violations of validated postconditions during execution. Experimental results show that SpecTune improves on the baseline methods in both fault localization and repair effectiveness.
📝 Abstract
Automated Program Repair (APR) has recently benefited from large language models (LLMs). However, most LLM-based APR approaches still rely primarily on coarse end-to-end signals from test-suite outcomes to guide repair, providing limited insight into where a program's internal logic deviates from its intended behavior. In contrast, human debugging often relies on intermediate reasoning about program states through localized correctness conditions or assertions. Inspired by this observation, we propose SpecTune, a specification-guided debugging framework that incorporates intermediate behavioral reasoning into APR. SpecTune decomposes the repair task into suspicious regions connected by execution checkpoints and derives localized postconditions representing expected program behaviors at those points. By executing the buggy program and evaluating these postconditions, SpecTune produces micro-level debugging signals that indicate mismatches between observed and intended behaviors, enabling more precise fault localization and targeted patch generation. To address the potential unreliability of LLM-generated postconditions, we introduce two complementary signals: a specification validation signal α, which estimates the consistency of generated postconditions using partially passing test cases, and a discriminative signal β, which detects violations of validated postconditions during execution. With these signals, SpecTune safely leverages automatically generated specifications for APR. Experimental results show that SpecTune improves both fault localization and APR effectiveness over the baselines.
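The checkpoint-and-postcondition idea can be illustrated with a small Python sketch. This is not the authors' implementation: the toy program, the checkpoint names, the hand-written postconditions (standing in for LLM-generated ones), and the concrete definitions of `alpha` (fraction of passing tests on which a postcondition holds) and `beta` (whether a postcondition is violated on some failing test) are all assumptions made for illustration.

```python
from typing import Any, Callable, Dict, List

State = Dict[str, Any]
Postcondition = Callable[[State], bool]


def buggy_mean_abs(xs: List[int], trace: Dict[str, State]) -> float:
    """Toy buggy program: intended to return the mean of absolute values."""
    total = sum(xs)  # bug: should be sum(abs(x) for x in xs)
    trace["after_sum"] = {"xs": xs, "total": total}  # checkpoint 1
    result = total / len(xs)
    trace["after_div"] = {"xs": xs, "total": total, "result": result}  # checkpoint 2
    return result


# Hypothetical localized postconditions, one per checkpoint.
POSTCONDITIONS: Dict[str, Postcondition] = {
    "after_sum": lambda s: s["total"] >= 0,
    "after_div": lambda s: s["result"] == s["total"] / len(s["xs"]),
}


def run(xs: List[int]) -> Dict[str, State]:
    """Execute the buggy program and return the recorded checkpoint states."""
    trace: Dict[str, State] = {}
    buggy_mean_abs(xs, trace)
    return trace


def alpha(checkpoint: str, passing_inputs: List[List[int]]) -> float:
    """Assumed validation signal: share of passing tests where the postcondition holds."""
    post = POSTCONDITIONS[checkpoint]
    held = sum(post(run(xs)[checkpoint]) for xs in passing_inputs)
    return held / len(passing_inputs)


def beta(checkpoint: str, failing_inputs: List[List[int]]) -> bool:
    """Assumed discriminative signal: postcondition violated on some failing test."""
    post = POSTCONDITIONS[checkpoint]
    return any(not post(run(xs)[checkpoint]) for xs in failing_inputs)


def suspicious_checkpoints(
    passing: List[List[int]], failing: List[List[int]], threshold: float = 1.0
) -> List[str]:
    """Checkpoints whose postcondition is validated (alpha) and then violated (beta)."""
    return [
        cp for cp in POSTCONDITIONS
        if alpha(cp, passing) >= threshold and beta(cp, failing)
    ]


passing_tests = [[1, 2, 3], [4, 4]]  # inputs on which the buggy program happens to be right
failing_tests = [[-1, -2, -3]]       # input on which the end-to-end test fails
print(suspicious_checkpoints(passing_tests, failing_tests))
```

In this sketch only the `after_sum` checkpoint is flagged, which narrows the fault to the summation step rather than the division. An end-to-end test failure alone would not make that distinction.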