Finding the Needle in the Crash Stack: Industrial-Scale Crash Root Cause Localization with AutoCrashFL

📅 2025-10-26

🤖 AI Summary

To address the scalability challenge of crash root-cause localization in large industrial software, where conventional approaches rely on expensive dynamic coverage instrumentation, this paper proposes AutoCrashFL, an LLM-based fault localization agent. AutoCrashFL localizes root causes using only the crash dump and access to the corresponding source code repository, without requiring coverage collection. Evaluated on real-world crashes of SAP HANA, an industrial system of more than 35 million lines of code, AutoCrashFL localizes 30% of crashes at the top rank (Top-1), compared with 17% for the baseline. The authors also report that it is relatively more effective on complex bugs and that it can indicate confidence in its results.

📝 Abstract

Fault Localization (FL) aims to identify root causes of program failures. FL typically targets failures observed from test executions, and as such, often involves dynamic analyses to improve accuracy, such as coverage profiling or mutation testing. However, for large industrial software, measuring coverage for every execution is prohibitively expensive, making the use of such techniques difficult. To address these issues and apply FL in an industrial setting, this paper proposes AutoCrashFL, an LLM agent for the localization of crashes that only requires the crashdump from the Program Under Test (PUT) and access to the repository of the corresponding source code. We evaluate AutoCrashFL against real-world crashes of SAP HANA, an industrial software project consisting of more than 35 million lines of code. Experiments reveal that AutoCrashFL is more effective in localization, as it localized 30% of crashes at the top rank, compared to 17% for the baseline. Through thorough analysis, we find that AutoCrashFL has attractive practical properties: it is relatively more effective for complex bugs, and it can indicate confidence in its results. Overall, these results show the practicality of LLM agent deployment on an industrial scale.
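
The headline comparison (30% versus 17% of crashes localized at the top rank) is a Top-1 accuracy figure. The sketch below shows how such a Top-k metric is commonly computed; it is not taken from the paper. The function name `top_k_accuracy`, the crash IDs, and the candidate location strings are all hypothetical, and it assumes each crash has a ranked list of candidate locations plus a set of known root-cause locations.

```python
from typing import Dict, List, Set


def top_k_accuracy(
    rankings: Dict[str, List[str]],
    ground_truth: Dict[str, Set[str]],
    k: int = 1,
) -> float:
    """Fraction of crashes whose first k ranked candidates include a true root cause."""
    if not rankings:
        return 0.0
    hits = sum(
        1
        for crash_id, ranked in rankings.items()
        # A crash counts as a hit if any of its top-k candidates is a known root cause.
        if set(ranked[:k]) & ground_truth.get(crash_id, set())
    )
    return hits / len(rankings)


# Hypothetical data: ranked candidates produced by some localization tool.
rankings = {
    "crash-1": ["alloc.cpp:resize", "index.cpp:lookup"],
    "crash-2": ["parser.cpp:parse", "plan.cpp:optimize"],
    "crash-3": ["txn.cpp:commit"],
    "crash-4": [],  # the tool returned no candidates
}

# Hypothetical data: root-cause locations confirmed by the eventual fix.
ground_truth = {
    "crash-1": {"alloc.cpp:resize"},
    "crash-2": {"plan.cpp:optimize"},
    "crash-3": {"log.cpp:flush"},
    "crash-4": {"io.cpp:read"},
}

top1 = top_k_accuracy(rankings, ground_truth, k=1)
top2 = top_k_accuracy(rankings, ground_truth, k=2)
```

With this toy data only `crash-1` is a Top-1 hit, while `crash-2` becomes a hit once the second-ranked candidate is also considered. Crashes with no candidates still count in the denominator, which keeps the metric comparable across tools that sometimes return nothing.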

Problem

Research questions and friction points this paper is trying to address.

Localizing crash root causes in large industrial software systems

Reducing expensive dynamic analysis costs for fault localization

Automating crash diagnosis using only crash dumps and source code

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses an LLM agent for crash root cause localization

Requires only the crash dump and access to the source code repository

Localizes 30% of SAP HANA crashes at the top rank, versus 17% for the baseline

🔎 Similar Papers

Leveraging Stack Traces for Spectrum-based Fault Localization in the Absence of Failing Tests