🤖 AI Summary
Traditional software fault analysis relies on manual expert effort for fault identification, filtering, and root-cause investigation, which makes it inefficient and hard to scale to large empirical studies. Method: This paper presents an initial exploration of applying large language models (LLMs) to empirical software fault studies, decomposing the process into three phases: research objective definition, data preparation, and fault analysis. The evaluation covers 3,829 open-source software faults drawn from a high-quality empirical study. Contribution/Results: LLM-based fault analysis takes about two hours on average, compared with the weeks of manual effort typically required, which improves research scalability and iteration speed. The study provides initial evidence that LLMs are feasible for real-world software fault analysis and outlines a research plan covering the open challenges on the way to fully automated, end-to-end fault analysis.
📝 Abstract
Understanding software faults is essential for empirical research in software development and maintenance. However, traditional fault analysis, while valuable, typically involves multiple expert-driven steps such as collecting potential faults, filtering, and manual investigation. These processes are both labor-intensive and time-consuming, creating bottlenecks that hinder large-scale fault studies in complex yet critical software systems and slow the pace of iterative empirical research.
In this paper, we decompose the process of an empirical software fault study into three key phases: (1) research objective definition, (2) data preparation, and (3) fault analysis. We then conduct an initial exploratory study of applying Large Language Models (LLMs) to fault analysis of open-source software. Specifically, we evaluate on 3,829 software faults drawn from a high-quality empirical study. Our results show that LLMs can substantially improve efficiency in fault analysis, with an average processing time of about two hours, compared to the weeks of manual effort typically required. We conclude by outlining a detailed research plan that highlights both the potential of LLMs for advancing empirical fault studies and the open challenges that must be addressed to achieve fully automated, end-to-end software fault analysis.