HAFix: History-Augmented Large Language Models for Bug Fixing

📅 2025-01-15

🤖 AI Summary

Existing large language model (LLM) approaches to bug fixing typically rely on the current snapshot of a project and overlook the historical data available in software repositories. HAFix (History-Augmented LLMs on Bug Fixing) addresses this by mining the historical context of each bug, operationalizing that context as seven heuristics, and aggregating the results of those heuristics (HAFix-Agg). The paper also studies how prompt style affects LLM performance when historical context is supplied, finding the Instruction style prompt to be the most effective template, and analyzes the trade-off among bug-fixing performance, cost, and time efficiency. Evaluated with Code Llama on 51 single-line bugs from 11 open-source projects, the FLN-all heuristic improves performance by 10% over a non-historical baseline inspired by GitHub Copilot, and HAFix-Agg fixes 45% more bugs than that baseline, the best result overall.

📝 Abstract

Recent studies have explored the performance of Large Language Models (LLMs) on various Software Engineering (SE) tasks, such as code generation and bug fixing. However, these approaches typically rely on the context data from the current snapshot of the project, overlooking the potential of rich historical data from real-world software repositories. Additionally, the impact of prompt styles on LLM performance within a historical context remains underexplored. To address these gaps, we propose HAFix, which stands for History-Augmented LLMs on Bug Fixing, a novel approach that leverages individual historical heuristics associated with bugs and aggregates the results of these heuristics (HAFix-Agg) to enhance LLMs' bug-fixing capabilities. To empirically evaluate HAFix, we employ Code Llama on a dataset of 51 single-line bugs, sourced from 11 open-source projects, by mining the historical context data of bugs and operationalizing this context in the form of seven heuristics. Our evaluation demonstrates that historical heuristics significantly enhance bug-fixing performance. For example, the FLN-all heuristic achieves a 10% improvement in performance compared to a non-historical baseline inspired by GitHub Copilot. Furthermore, HAFix-Agg fixes 45% more bugs than the baseline, outperforming FLN-all and demonstrating the best performance overall. Moreover, within the context of historical heuristics, we identify the Instruction style prompt as the most effective template for LLMs in bug fixing. Finally, we provide a pragmatic trade-off analysis of bug-fixing performance, cost, and time efficiency, offering valuable insights for the practical deployment of our approach in real-world scenarios.
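
The abstract does not spell out how HAFix-Agg combines the per-heuristic results. As a minimal sketch, assuming aggregation means a bug counts as fixed when at least one heuristic fixes it (a set union), the idea and the "fixes N% more bugs than the baseline" arithmetic could look like the following. The function names, the bug identifiers, and every heuristic name other than FLN-all are hypothetical, and the toy data are not results from the paper.

```python
from typing import Dict, Set


def aggregate_fixed_bugs(fixed_by_heuristic: Dict[str, Set[str]]) -> Set[str]:
    """Return the bugs fixed by at least one heuristic.

    Assumption: aggregation is a plain set union over heuristics.
    """
    fixed: Set[str] = set()
    for bug_ids in fixed_by_heuristic.values():
        fixed |= bug_ids
    return fixed


def relative_improvement(n_fixed: int, n_baseline: int) -> float:
    """Relative gain in fixed bugs over a baseline, e.g. 0.5 means 50% more."""
    if n_baseline <= 0:
        raise ValueError("baseline must have fixed at least one bug")
    return (n_fixed - n_baseline) / n_baseline


# Hypothetical toy data, for illustration only.
baseline_fixed = {"bug-1", "bug-2"}
fixed_by_heuristic = {
    "FLN-all": {"bug-1", "bug-3"},
    "hypothetical_heuristic_b": {"bug-2"},
}

aggregated = aggregate_fixed_bugs(fixed_by_heuristic)
gain = relative_improvement(len(aggregated), len(baseline_fixed))
print(sorted(aggregated), f"{gain:.0%} more bugs than baseline")
```

Under this assumption the aggregate can never fix fewer bugs than the best single heuristic, which is consistent with the abstract's statement that HAFix-Agg outperforms FLN-all.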

Problem

Research questions and friction points this paper is trying to address.

Large Language Models

Software Bug Repair

Historical Data Utilization

Innovation

Methods, ideas, or system contributions that make the work stand out.

HAFix

Historical Data Integration

Enhanced Bug Repair Strategies
