🤖 AI Summary
Current automated program repair (APR) methods over-rely on static analysis while neglecting dynamic runtime behavior, limiting their ability to guide large language models (LLMs) toward accurate fixes.
Method: This paper presents the first systematic investigation into how program execution traces enhance LLM-based repair. We propose a trace-injection prompting strategy that structurally incorporates dynamic execution information into LLM inputs while keeping computational complexity manageable.
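As an illustration of the prompting strategy described above, the sketch below assembles a repair prompt that injects a pre-collected execution trace alongside the buggy code and its failing test. The function name, section headers, and the truncation limit are illustrative assumptions, not the paper's actual implementation; truncation reflects the finding that very complex traces reduce effectiveness.

```python
def build_trace_prompt(buggy_code: str, failing_test: str,
                       trace_lines: list[str],
                       max_trace_lines: int = 50) -> str:
    """Assemble an APR prompt that injects an execution trace.

    The trace is truncated to keep prompt complexity manageable
    (hypothetical limit; the paper's exact format is not shown here).
    """
    trace_block = "\n".join(trace_lines[:max_trace_lines])
    return (
        "Fix the bug in the following function.\n\n"
        f"### Buggy code\n{buggy_code}\n\n"
        f"### Failing test\n{failing_test}\n\n"
        f"### Execution trace (truncated)\n{trace_block}\n\n"
        "### Fixed code\n"
    )

prompt = build_trace_prompt(
    "def add(a, b):\n    return a - b",
    "assert add(1, 2) == 3",
    ["line 2: a=1, b=2", "return -1"],
)
```

The resulting string would then be sent to the LLM as-is; a trace-free baseline simply omits the trace section.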
Contribution/Results: An extensive evaluation across six dataset–model combinations, including controlled ablation studies and probing analyses, maps the efficacy boundary of execution traces: naive trace injection yields significant accuracy gains in only two of the six configurations, whereas LLM-optimized trace prompts outperform trace-free baselines more consistently, and trace-based prompting also beats fine-tuning a smaller LLM on a small-scale dataset. The core contribution is establishing execution traces as a novel, effective signal for LLM-based program understanding, enabling a scalable, dynamically aware APR paradigm.
📝 Abstract
Large Language Models (LLMs) show promising performance on various programming tasks, including Automatic Program Repair (APR). However, most approaches to LLM-based APR are limited to static analysis of programs and disregard their runtime behavior. Inspired by knowledge-augmented NLP, in this work we aim to remedy this potential blind spot by augmenting standard APR prompts with program execution traces. We evaluate our approach using the GPT family of models on three popular APR datasets. Our findings suggest that simply incorporating execution traces into the prompt provides only a limited performance improvement over trace-free baselines, in just 2 out of 6 tested dataset/model configurations. We further find that the effectiveness of execution traces for APR diminishes as their complexity increases. We explore several strategies for leveraging traces in prompts and demonstrate that LLM-optimized prompts outperform trace-free prompts more consistently. Additionally, we show trace-based prompting to be superior to fine-tuning a smaller LLM on a small-scale dataset, and we conduct probing studies that reinforce the notion that execution traces can complement the reasoning abilities of LLMs.
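For concreteness, line-level execution traces of the kind the abstract describes can be collected with Python's standard `sys.settrace` hook. The sketch below is a minimal, assumed trace format (line number plus local variables); the paper's actual trace representation may differ.

```python
import sys

def collect_trace(func, *args):
    """Run func(*args) and record a line-level trace of its execution.

    Each entry records the line number and the local variables at that
    point (an illustrative format, not the paper's exact one).
    """
    trace = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Only record 'line' events inside the target function's frame.
        if event == "line" and frame.f_code is code:
            trace.append(f"line {frame.f_lineno}: {dict(frame.f_locals)}")
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always remove the hook
    return result, trace

def buggy_add(a, b):
    return a - b  # bug: should be a + b

result, trace = collect_trace(buggy_add, 1, 2)
# result is the (wrong) return value; trace records each executed line
```

Traces collected this way grow with the number of executed lines, which connects to the observation that trace complexity limits their usefulness in prompts.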