Crash Report Enhancement with Large Language Models: An Empirical Study

📅 2025-09-16

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Crash reports often lack precise fault locations, root-cause explanations, and actionable repair suggestions, which slows debugging. This empirical study examines whether large language models (LLMs) can add that missing detail, comparing two strategies: Direct-LLM, a single-shot approach that uses stack-trace context, and Agentic-LLM, an iterative approach that explores the repository for additional evidence. Outputs are assessed through manual evaluation, LLM-as-a-judge scoring, and CodeBLEU similarity to developer patches. On 492 real-world crash reports, LLM-enhanced reports raise Top-1 fault-localization accuracy from 10.6% for the original reports to 40.2%–43.1%, and suggested fixes reach CodeBLEU scores of about 56–57%. Agentic-LLM yields stronger root-cause explanations and more actionable repair guidance, and a 16-participant user study finds that enhanced reports make crashes easier to understand and resolve, with the largest improvement in repair guidance.

📝 Abstract

Crash reports are central to software maintenance, yet many lack the diagnostic detail developers need to debug efficiently. We examine whether large language models can enhance crash reports by adding fault locations, root-cause explanations, and repair suggestions. We study two enhancement strategies: Direct-LLM, a single-shot approach that uses stack-trace context, and Agentic-LLM, an iterative approach that explores the repository for additional evidence. On a dataset of 492 real-world crash reports, LLM-enhanced reports improve Top-1 problem-localization accuracy from 10.6% (original reports) to 40.2%–43.1%, and produce suggested fixes that closely resemble developer patches (CodeBLEU around 56–57%). Both our manual evaluations and LLM-as-a-judge assessment show that Agentic-LLM delivers stronger root-cause explanations and more actionable repair guidance. A user study with 16 participants further confirms that enhanced reports make crashes easier to understand and resolve, with the largest improvement in repair guidance. These results indicate that supplying LLMs with stack traces and repository code yields enhanced crash reports that are substantially more useful for debugging.
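
To make the Top-1 problem-localization metric concrete, the following is a minimal sketch of how a Top-k accuracy could be computed. It assumes each report has one ground-truth location and a ranked list of predicted locations; the function name, the data layout, and the sample values are hypothetical and are not taken from the paper.

```python
from typing import Dict, List


def top_k_accuracy(
    predictions: Dict[str, List[str]],
    ground_truth: Dict[str, str],
    k: int = 1,
) -> float:
    """Fraction of reports whose true location appears in the first k predictions.

    `predictions` maps a report id to a ranked list of candidate locations;
    `ground_truth` maps a report id to the location the developer actually fixed.
    Both layouts are assumptions made for this illustration.
    """
    if not ground_truth:
        raise ValueError("ground_truth must not be empty")
    hits = 0
    for report_id, true_location in ground_truth.items():
        ranked = predictions.get(report_id, [])  # a missing report counts as a miss
        if true_location in ranked[:k]:
            hits += 1
    return hits / len(ground_truth)


# Hypothetical toy data: three reports, one localized correctly at rank 1.
truth = {"r1": "Parser.java", "r2": "Cache.java", "r3": "Net.java"}
preds = {
    "r1": ["Parser.java", "Lexer.java"],
    "r2": ["Store.java", "Cache.java"],
    "r3": ["Util.java"],
}
top1 = top_k_accuracy(preds, truth, k=1)
top2 = top_k_accuracy(preds, truth, k=2)
```

Under this definition, a report only counts toward Top-1 when the first-ranked prediction matches the developer's fix location, which is why Top-1 is the strictest of the Top-k variants.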

Problem

Research questions and friction points this paper is trying to address.

Enhancing crash reports with diagnostic details for debugging

Improving problem-localization accuracy and repair suggestions using LLMs

Evaluating LLM strategies for actionable crash report enhancements

Innovation

Methods, ideas, or system contributions that make the work stand out.

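LLM-enhanced crash reports that add fault locations, root-cause explanations, and repair suggestions, using stack traces and repository code as context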

Agentic-LLM iteratively explores repository evidence

Direct-LLM uses single-shot stack-trace context
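
The contrast between the two strategies listed above can be sketched as follows. This is an illustrative outline only, not the authors' implementation: the `LLM` callable type, the prompt wording, the `READ <path>` request convention, and the step limit are all assumptions introduced for the example.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a model call: takes a prompt, returns text.
LLM = Callable[[str], str]


def direct_enhance(llm: LLM, stack_trace: str, frame_snippets: List[str]) -> str:
    """Single-shot: one prompt built from the stack trace and its code context."""
    prompt = (
        "Identify the fault location, root cause, and a fix.\n"
        f"Stack trace:\n{stack_trace}\n"
        "Code around the frames:\n" + "\n".join(frame_snippets)
    )
    return llm(prompt)


def agentic_enhance(
    llm: LLM, stack_trace: str, repo: Dict[str, str], max_steps: int = 5
) -> str:
    """Iterative: the model may request repository files before answering.

    Assumed convention: a reply of the form 'READ <path>' asks for a file;
    any other reply is treated as the final enhanced report.
    """
    evidence: List[str] = []
    base = (
        "Reply 'READ <path>' to inspect a file, or give the final diagnosis.\n"
        f"Stack trace:\n{stack_trace}\n"
    )
    for _ in range(max_steps):
        reply = llm(base + "Evidence so far:\n" + "\n".join(evidence))
        if not reply.startswith("READ "):
            return reply  # the model produced its diagnosis
        path = reply[len("READ "):].strip()
        evidence.append(f"{path}:\n{repo.get(path, '<not found>')}")
    # Step budget exhausted: ask for an answer using whatever was gathered.
    return llm(base + "Give the final diagnosis now.\nEvidence:\n" + "\n".join(evidence))
```

The design difference is that the single-shot path fixes its context before the model runs, while the iterative path lets the model decide which additional repository evidence to gather, at the cost of more model calls per report.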
