Fault Localisation and Repair for DL Systems: An Empirical Study with LLMs

📅 2025-06-03

📈 Citations: 0

✨ Influential: 0


🤖 AI Summary

Problem: Existing fault localisation (FL) and repair techniques for deep learning (DL) systems rely on predefined rules and single-metric evaluation criteria, which limits their practical effectiveness and generalisability. Method: The paper evaluates state-of-the-art DL FL and repair tools on a carefully designed benchmark and introduces a novel approach that applies large language models (LLMs) such as GPT-4 to both localisation and repair, supported by domain-specific prompt engineering. It also argues that candidate patches should be assessed against multiple ground-truth patches rather than a single reference fix. Contribution/Results: GPT-4 improves on the second-best tool by 44% in fault localisation and 82% in repair, indicating that LLMs are a promising direction for debugging DL systems.
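The summary does not spell out how the LLM is prompted. The following is therefore a minimal Python sketch under stated assumptions. It numbers each line of a DL training script, wraps the script in a fault-localisation prompt together with the observed training symptom, and parses a reply written in a fixed format. The template wording, the `LINE | FAULT | FIX` reply format, and the names `build_fl_prompt`, `parse_fl_reply` and `SuspectedFault` are all hypothetical. The step that sends the prompt to a model such as GPT-4 is left out because the source names no client API.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical prompt: the wording and the LINE/FAULT/FIX reply format are
# illustrative assumptions, not the prompts used in the paper.
PROMPT_TEMPLATE = """You are debugging a deep learning training script.
Symptom observed during training: {symptom}

Script (line-numbered):
{numbered_code}

Reply with one line per suspected fault, most likely first, in the form:
LINE: <line number> | FAULT: <short description> | FIX: <replacement code>
"""

# Matches one reply line in the assumed LINE/FAULT/FIX format.
REPLY_PATTERN = re.compile(r"LINE:\s*(\d+)\s*\|\s*FAULT:\s*(.*?)\s*\|\s*FIX:\s*(.*)")


@dataclass
class SuspectedFault:
    line: int   # 1-based line number in the submitted script
    fault: str  # model's short description of the fault
    fix: str    # model's proposed replacement for that line


def build_fl_prompt(source: str, symptom: str) -> str:
    """Number every line of the script so the model can name exact locations."""
    numbered = "\n".join(
        f"{i}: {text}" for i, text in enumerate(source.splitlines(), start=1)
    )
    return PROMPT_TEMPLATE.format(symptom=symptom, numbered_code=numbered)


def parse_fl_reply(reply: str) -> List[SuspectedFault]:
    """Keep reply lines that follow the format, in order; ignore everything else."""
    suspects = []
    for row in reply.splitlines():
        match = REPLY_PATTERN.search(row)
        if match:
            suspects.append(
                SuspectedFault(int(match.group(1)), match.group(2), match.group(3).strip())
            )
    return suspects
```

For example, `parse_fl_reply("LINE: 1 | FAULT: wrong output activation | FIX: model.add(Dense(10, activation='softmax'))")` returns a single `SuspectedFault` that points at line 1. Because the prompt numbers every line, the model's answer can be checked directly against a known fault location in a benchmark.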


📝 Abstract

Numerous Fault Localisation (FL) and repair techniques have been proposed to address faults in Deep Learning (DL) models. However, their effectiveness in practical applications remains uncertain because they rely on pre-defined rules. This paper presents a comprehensive evaluation of state-of-the-art FL and repair techniques, examining their advantages and limitations. Moreover, we introduce a novel approach that uses Large Language Models (LLMs) to localise and repair DL faults. Our evaluation, conducted on a carefully designed benchmark, reveals the strengths and weaknesses of current FL and repair techniques. We emphasise the need for more accurate techniques and for more rigorous assessment methods that employ multiple ground-truth patches. Notably, LLMs perform strongly in both FL and repair tasks. For instance, GPT-4 achieves improvements of 44% in FL and 82% in repair over the second-best tool, demonstrating the potential of LLMs in this domain. Our study sheds light on the current state of FL and repair techniques and suggests that LLMs could be a promising avenue for future advancements.
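A single reference fix can understate repair performance when a fault can be fixed in more than one valid way. The sketch below is a hypothetical illustration of that point, not the paper's evaluation procedure. It models a patch as an unordered set of `(component, attribute, value)` edits and counts a candidate as correct if it equals any one of several accepted patches. The exact-match criterion, the patch representation, and every name in the code are assumptions.

```python
from typing import FrozenSet, Iterable, List, Tuple

# Hypothetical patch model: a fix is an unordered set of
# (component, attribute, new value) edits, e.g. ("dense_2", "activation", "softmax").
Edit = Tuple[str, str, str]
Patch = FrozenSet[Edit]


def make_patch(*edits: Edit) -> Patch:
    """Build a patch; edit order does not matter."""
    return frozenset(edits)


def is_correct(candidate: Patch, ground_truths: Iterable[Patch]) -> bool:
    """Assumed criterion: accept the candidate if it exactly matches any accepted fix."""
    return any(candidate == truth for truth in ground_truths)


def repair_rate(candidates: List[Patch], truths_per_fault: List[List[Patch]]) -> float:
    """Fraction of faults whose candidate matches at least one of that fault's accepted fixes."""
    hits = sum(is_correct(c, truths) for c, truths in zip(candidates, truths_per_fault))
    return hits / len(candidates)


# Two different but plausible fixes for the same (hypothetical) training fault.
lower_lr = make_patch(("optimizer", "learning_rate", "0.001"))
switch_opt = make_patch(("optimizer", "type", "adam"))

print(is_correct(switch_opt, [lower_lr]))              # False: only one reference fix
print(is_correct(switch_opt, [lower_lr, switch_opt]))  # True: matches an alternative fix
```

Scored against `[lower_lr]` alone, the `switch_opt` candidate would count as a failed repair even though it may fix the fault. Evaluating against multiple ground-truth patches is meant to avoid this kind of undercounting.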

Problem

Research questions and friction points this paper is trying to address.

Evaluating the effectiveness of existing fault localisation and repair techniques for DL models

Proposing an LLM-based approach to DL fault localisation and repair

Identifying the strengths and weaknesses of current FL and repair techniques on a carefully designed benchmark

Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses Large Language Models for both fault localisation and repair

Comprehensively evaluates state-of-the-art FL and repair techniques

Shows GPT-4 improving on the second-best tool by 44% in FL and 82% in repair
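The 44% and 82% figures for GPT-4 are not stated to be either relative or absolute gains. If they are relative improvements in some task score $S$ (for instance, the number of benchmark faults correctly localised or repaired, which is an assumed metric), they work out as follows:

```latex
\Delta_{\mathrm{rel}} = \frac{S_{\text{GPT-4}} - S_{\text{2nd}}}{S_{\text{2nd}}}
\quad\Longrightarrow\quad
S_{\text{GPT-4}} = (1 + \Delta_{\mathrm{rel}})\, S_{\text{2nd}},
\qquad
S^{\mathrm{FL}}_{\text{GPT-4}} = 1.44\, S^{\mathrm{FL}}_{\text{2nd}},
\qquad
S^{\mathrm{repair}}_{\text{GPT-4}} = 1.82\, S^{\mathrm{repair}}_{\text{2nd}}
```

Under the alternative reading as percentage-point gains, the difference $S_{\text{GPT-4}} - S_{\text{2nd}}$ itself would be 44 or 82 points. The two readings cannot be told apart without the underlying scores.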
