🤖 AI Summary
In automated program repair (APR), developers must still validate each patch by hand, and plausible but overfitted patches, which pass the test cases without actually fixing the bug, are easy to misjudge and costly to verify. To address this, we propose iFix, an interactive framework for understanding and comparing patches through their runtime behavior. iFix uses static analysis to identify the variables related to the buggy statement, records their runtime values while each patch executes, and aligns those values across patch candidates so that users can compare and contrast how the patches behave. The approach generalizes to different APR tools. In a within-subjects user study with 28 participants, iFix reduced task completion time by 36% and improved confidence by 50% relative to manual inspection, and by 33% and 20% relative to a state-of-the-art interactive patch filtering technique. In quantitative experiments, iFix improved the ranking of correct patches by at least 39% compared with other patch ranking methods.
📝 Abstract
Automated Program Repair (APR) holds the promise of alleviating the burden of debugging and fixing software bugs. Even so, developers still need to manually inspect each patch to confirm its correctness, which is tedious and time-consuming. This challenge is exacerbated by plausible patches, which happen to pass the test cases but may not correctly fix the bug. To address this challenge, we propose an interactive approach called iFix that facilitates patch understanding and comparison based on runtime differences between patches. iFix performs static analysis to identify runtime variables related to the buggy statement and captures their runtime values during execution for each patch. These values are then aligned across different patch candidates, allowing users to compare and contrast their runtime behavior. To evaluate iFix, we conducted a within-subjects user study with 28 participants. Compared with manual inspection and a state-of-the-art interactive patch filtering technique, iFix reduced participants' task completion time by 36% and 33%, respectively, while also improving their confidence by 50% and 20%. In addition, quantitative experiments demonstrate that iFix improves the ranking of correct patches by at least 39% compared with other patch ranking methods and is generalizable to different APR tools.
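
The alignment step described in the abstract can be illustrated with a small sketch. This is not iFix's implementation: the trace format, the patch names, and the helper functions `align_traces` and `differing_rows` are all hypothetical, and the sketch assumes that values of the same variable can be matched across patches by their order of occurrence.

```python
from collections import defaultdict


def align_traces(traces):
    """Align recorded values across patches by (variable, occurrence index).

    traces: dict mapping a patch name to a list of (variable, value) pairs
    in execution order (a hypothetical trace format, not iFix's own).
    Returns a dict mapping (variable, k) to {patch: value}, where k is the
    number of times that variable was already seen in that patch's trace.
    """
    table = defaultdict(dict)
    for patch, events in traces.items():
        seen = defaultdict(int)
        for var, value in events:
            table[(var, seen[var])][patch] = value
            seen[var] += 1
    return dict(table)


def differing_rows(table, patches):
    """Keep only the aligned rows on which the patches disagree.

    A patch that never recorded a value for a row contributes None.
    Values are assumed to be hashable.
    """
    rows = {}
    for key, values in table.items():
        row = tuple(values.get(p) for p in patches)
        if len(set(row)) > 1:
            rows[key] = row
    return rows


# Toy traces for two hypothetical patch candidates.
traces = {
    "patch_a": [("idx", 0), ("total", 5), ("idx", 1), ("total", 9)],
    "patch_b": [("idx", 0), ("total", 5), ("idx", 1), ("total", 7)],
}
patches = sorted(traces)
diff = differing_rows(align_traces(traces), patches)
print(diff)
```

In this toy input the two candidates agree on every recorded value except the second observation of `total`, so only that row is kept. Filtering down to the rows that differ is one simple way to point a reviewer at where two patches diverge at runtime.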