AgentSZZ: Teaching the LLM Agent to Play Detective with Bug-Inducing Commits

📅 2026-04-02
🤖 AI Summary
This work addresses two limitations of traditional SZZ algorithms: their reliance on git blame, which cannot trace defect-introducing commits in ghost-commit and cross-file scenarios, and their lack of developer-like iterative reasoning. The authors propose AgentSZZ, the first approach to integrate large language model (LLM) agents into the SZZ task. AgentSZZ combines a ReAct reasoning framework, a task-specific toolset, and a structured context compression mechanism to enable adaptive causal tracing. The method supports multi-tool coordination and iterative exploration, preserving critical evidence while reducing redundant context. On three widely used datasets, AgentSZZ improves F1 score by up to 27.2% over prior LLM-based approaches, with recall gains of up to 300% and 60% in cross-file and ghost commit scenarios, respectively, while compressing tool outputs cuts token consumption by over 30%.
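To make the git blame limitation concrete, here is a minimal sketch of how a blame-based SZZ step picks its starting points: it collects the old-file lines that the fix commit deletes, then blames those lines. The function names `deleted_lines` and `is_add_only` are hypothetical, the parser assumes git-format unified diffs, and treating an add-only fix as a "ghost" case is an assumption about the term, not the paper's stated definition.

```python
import re
from collections import defaultdict

# Matches a unified-diff hunk header such as "@@ -10,2 +10,3 @@".
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,\d+)? \+(\d+)(?:,\d+)? @@")


def deleted_lines(diff_text):
    """Map each old-file path to the old line numbers the fix deletes.

    These are the only lines a blame-based tracer can start from.
    Assumes git-format output (each file starts with "diff --git").
    """
    deleted = defaultdict(list)
    path, old_no, in_hunk = None, 0, False
    for line in diff_text.splitlines():
        if line.startswith("diff --git "):
            path, in_hunk = None, False
        elif not in_hunk and line.startswith("--- "):
            target = line[4:]
            path = None if target == "/dev/null" else target.removeprefix("a/")
        else:
            match = HUNK_RE.match(line)
            if match:
                old_no, in_hunk = int(match.group(1)), True
            elif in_hunk and line.startswith("-"):
                if path is not None:
                    deleted[path].append(old_no)
                old_no += 1
            elif in_hunk and line.startswith(" "):
                old_no += 1  # context line: present in the old file
    return dict(deleted)


def is_add_only(diff_text):
    """True when the fix deletes nothing, so blame has no line to trace."""
    return not deleted_lines(diff_text)


# Illustrative diffs (invented for this sketch, not taken from the paper).
add_only_fix = "\n".join([
    "diff --git a/f.c b/f.c",
    "--- a/f.c",
    "+++ b/f.c",
    "@@ -10,2 +10,3 @@",
    " int x = get();",
    "+if (!x) return;",
    " use(x);",
])
modifying_fix = "\n".join([
    "diff --git a/g.c b/g.c",
    "--- a/g.c",
    "+++ b/g.c",
    "@@ -5,3 +5,3 @@",
    " a();",
    "-b(NULL);",
    "+b(p);",
    " c();",
])

print(is_add_only(add_only_fix))     # True
print(deleted_lines(modifying_fix))  # {'g.c': [6]}
```

The second diff also hints at the cross-file problem: blaming line 6 of `g.c` can only return commits that touched `g.c`, so a bug introduced by a change in a different file stays out of reach.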
📝 Abstract
The SZZ algorithm is the dominant technique for identifying bug-inducing commits and underpins many software engineering tasks, such as defect prediction and vulnerability analysis. Despite numerous variants, including recent LLM-based approaches, performance remains limited on developer-annotated datasets (e.g., recall of 0.552 on the Linux kernel). A key limitation is the reliance on git blame, which traces line-level changes within the same file and therefore fails in common scenarios such as ghost and cross-file cases, leaving nearly one-quarter of bug-inducing commits inherently untraceable. Moreover, current approaches follow fixed pipelines that restrict iterative reasoning and exploration, unlike developers who investigate bugs through an interactive, multi-tool process. To address these challenges, we propose AgentSZZ, an agent-based framework that leverages LLM-driven agents to explore repositories and identify bug-inducing commits. Unlike prior methods, AgentSZZ integrates task-specific tools, domain knowledge, and a ReAct-style loop to enable adaptive and causal tracing of bugs. A structured compression module further improves efficiency by reducing redundant context while preserving key evidence. Extensive experiments on three widely used datasets show that AgentSZZ consistently outperforms state-of-the-art SZZ algorithms across all settings, achieving F1-score gains of up to 27.2% over prior LLM-based approaches. The improvements are especially pronounced in challenging scenarios such as cross-file and ghost commits, with recall gains of up to 300% and 60%, respectively. Ablation studies show that task-specific tools and domain knowledge are critical, while compressing tool outputs reduces token consumption by over 30% with negligible impact on performance. The replication package is available.
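The ReAct-style loop and the compression step described in the abstract can be illustrated with a small sketch. This is not AgentSZZ's implementation: `react_loop`, `compress`, the `grep` and `log` tools, and the scripted policy standing in for the LLM are all hypothetical, and the keyword filter is only one simple way to drop redundant tool output while keeping lines that look like evidence.

```python
def compress(output, keywords, max_lines=5):
    """Keep unique lines that mention a keyword, capped at max_lines.

    A stand-in for structured compression: it shrinks a tool observation
    before it is appended to the agent's context.
    """
    lines = output.splitlines()
    seen, kept = set(), []
    for line in lines:
        if line in seen or not any(k in line for k in keywords):
            continue
        seen.add(line)
        kept.append(line)
    kept = kept[:max_lines]
    dropped = len(lines) - len(kept)
    if dropped > 0:
        kept.append(f"[{dropped} lines omitted]")
    return "\n".join(kept)


def react_loop(policy, tools, keywords, max_steps=8):
    """Alternate thought -> action -> observation until the policy finishes.

    `policy` maps the history to (thought, (tool_name, argument)); in a
    real agent this would be an LLM call. The action ("finish", answer)
    ends the loop.
    """
    history = []
    for _ in range(max_steps):
        thought, (name, arg) = policy(history)
        if name == "finish":
            return arg, history
        observation = compress(tools[name](arg), keywords)
        history.append((thought, name, arg, observation))
    return None, history  # step budget exhausted without an answer


# Scripted policy and fake tools, invented for this sketch.
def scripted_policy(history):
    if not history:
        return "The fix touches parse(); find its callers.", ("grep", "parse(")
    if len(history) == 1:
        return "The caller is in another file; read its log.", ("log", "caller.c")
    return "abc123 added the unchecked call.", ("finish", "abc123")


tools = {
    "grep": lambda query: "caller.c:12: parse(buf)\nREADME: docs\nREADME: docs",
    "log": lambda path: "abc123 add parse(buf) call\ndef456 fix typo in comment",
}

answer, history = react_loop(scripted_policy, tools, keywords=["parse"])
print(answer)         # abc123
print(history[0][3])  # caller.c:12: parse(buf) followed by [2 lines omitted]
```

Unlike a fixed pipeline, the next tool call depends on what earlier observations revealed, which is what lets the loop step from the fixed file into a different file's history.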
Problem

Research questions and friction points this paper is trying to address.

bug-inducing commits
SZZ algorithm
cross-file
ghost commits
defect prediction
Innovation

Methods, ideas, or system contributions that make the work stand out.

Agent-based SZZ
LLM agent
bug-inducing commit identification
cross-file tracing
structured context compression