Resolving Crash Bugs via Large Language Models: An Empirical Study

📅 2023-12-16

🏛️ arXiv.org

📈 Citations: 8

✨ Influential: 0

career value

181K/year

🤖 AI Summary

This work addresses *environment-related crashes*—a class of software failures (e.g., dependency conflicts, misconfigurations) hitherto lacking systematic empirical study in real-world systems—and presents the first end-to-end empirical evaluation of large language models’ (LLMs) capability to diagnose and repair such failures. We propose **IntDiagSolver**, an interactive diagnosis framework that integrates environment factor decomposition, multi-turn active querying, and self-planning prompt engineering, enabling LLMs to autonomously design diagnostic steps and iteratively narrow down root causes. Evaluated on ChatGPT, Claude, and CodeLlama, IntDiagSolver achieves a +32.7% average improvement in root-cause localization accuracy for environment-related crashes over baseline methods. Moreover, it consistently enhances overall repair success rates for both environment-related and code-related crashes. These results demonstrate the feasibility and effectiveness of interactive LLM-based diagnosis for industrial-grade defect resolution.

📝 Abstract

Crash bugs cause unexpected program behaviors or even termination, requiring high-priority resolution. However, manually resolving crash bugs is challenging and labor-intensive, and researchers have proposed various techniques for their automated localization and repair. ChatGPT, a recent large language model (LLM), has garnered significant attention due to its exceptional performance across various domains. This work performs the first investigation into ChatGPT's capability in resolve real-world crash bugs, focusing on its effectiveness in both localizing and repairing code-related and environment-related crash bugs. Specifically, we initially assess ChatGPT's fundamental ability to resolve crash bugs with basic prompts in a single iteration. We observe that ChatGPT performs better at resolving code-related crash bugs compared to environment-related ones, and its primary challenge in resolution lies in inaccurate localization. Additionally, we explore ChatGPT's potential with various advanced prompts. Furthermore, by stimulating ChatGPT's self-planning, it methodically investigates each potential crash-causing environmental factor through proactive inquiry, ultimately identifying the root cause of the crash. Based on our findings, we propose IntDiagSolver, an interaction methodology designed to facilitate precise crash bug resolution through continuous interaction with LLMs. Evaluating IntDiagSolver on multiple LLMs reveals consistent enhancement in the accuracy of crash bug resolution, including ChatGPT, Claude, and CodeLlama.

Problem

Research questions and friction points this paper is trying to address.

Assessing LLMs' capability to resolve environment-related software crash bugs

Investigating prompt strategies for improving environment-related crash bug resolution

Proposing an interactive methodology for precise crash bug localization and repair

Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive methodology IntDiagSolver for crash resolution

Active inquiry prompting leveraging LLM self-planning capabilities

Multi-round engagement strategy with diverse prompt templates

🔎 Similar Papers

A Systematic Literature Review on Large Language Models for Automated Program Repair