AI Summary
This work addresses the limited effectiveness of existing SZZ approaches, which rely on git blame and identify only about 60% of bug-inducing commits (BICs) because they restrict the search to commits that directly modified the fixed lines. To overcome this, we reformulate BIC identification as a search problem over a temporal knowledge graph that captures both chronological and structural relationships among commits, a novel integration into software evolution analysis. We further introduce a large language model (LLM) agent that performs causal reasoning and guides candidate exploration within this graph. This approach moves beyond the narrow search space of traditional SZZ methods and establishes a new paradigm of graph-based causal inference. Evaluated on three datasets, our method achieves F1-scores ranging from 0.48 to 0.74, an improvement of up to 27% over state-of-the-art techniques, demonstrating the efficacy of knowledge graph expansion and agent-augmented reasoning.
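To make the blame restriction concrete, here is a toy Python sketch. It assumes a simplified model in which each commit records which line numbers of a file it last wrote; the `Commit` class and `blame` helper are hypothetical stand-ins, not real `git blame`. It shows two ways a blame-only search can miss a BIC: a later cosmetic edit hides the commit that actually introduced the bug, and a fix that only adds lines leaves nothing to blame.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    sha: str
    timestamp: int
    # Toy model (hypothetical): file path -> line numbers this commit wrote.
    writes: dict = field(default_factory=dict)


def blame(history, path, lines):
    """Simplified stand-in for `git blame`: for each line, return the most
    recent commit that wrote it."""
    blamed = set()
    for line in lines:
        writers = [c for c in history if line in c.writes.get(path, set())]
        if writers:
            blamed.add(max(writers, key=lambda c: c.timestamp).sha)
    return blamed


history = [
    Commit("A", 1, {"calc.py": {10}}),  # introduces the faulty expression on line 10
    Commit("B", 2, {"calc.py": {10}}),  # only reformats line 10 (whitespace)
]

# Case 1: the fix modifies line 10, but blame points at the cosmetic commit B.
print(blame(history, "calc.py", [10]))  # {'B'}
# Case 2: the fix only adds lines, so there are no fixed lines to blame.
print(blame(history, "calc.py", []))  # set()
```

In both cases the true BIC `A` never enters the candidate set, so no ranking over blame results can recover it.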
Abstract
Identifying Bug-Inducing Commits (BICs) is fundamental for understanding software defects and enabling downstream tasks such as defect prediction and automated program repair. Yet existing SZZ-based approaches are limited by their reliance on git blame, which restricts the search space to commits that directly modified the fixed lines. Our preliminary study on 2,102 validated bug-fixing commits reveals that this limitation is significant: over 40% of cases cannot be solved by blame alone, as 28% of BICs require traversing commit history beyond blame results and 14% are blameless. We present AgenticSZZ, the first approach to apply Temporal Knowledge Graphs (TKGs) to software evolution analysis. AgenticSZZ reframes BIC identification from a ranking problem over blame commits into a graph search problem, where temporal ordering is fundamental to causal reasoning about bug introduction. The approach operates in two phases: (1) constructing a TKG that encodes commits with temporal and structural relationships, expanding the search space by traversing file history backward from two reference points (the blame commits and the bug-fixing commit, or BFC); and (2) leveraging a large language model (LLM) agent to navigate the graph using specialized tools for candidate exploration and causal analysis. Evaluation on three datasets shows that AgenticSZZ achieves F1-scores of 0.48 to 0.74, with statistically significant improvements of up to 27% over the state of the art. Our ablation study confirms that both components are essential, reflecting a classic exploration-exploitation trade-off: the TKG expands the search space while the agent provides intelligent selection. By transforming BIC identification into a graph search problem, we open a new research direction for temporal and causal reasoning in software evolution analysis.
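The two-phase design can be illustrated with a small Python sketch. It is not the AgenticSZZ implementation: it assumes commits arrive as `(sha, timestamp, files)` tuples, links each commit to the previous commit that touched the same file (the temporal edges), walks those edges backward from the BFC and the blame commits, and replaces the LLM agent with a caller-supplied `score` function. All names (`build_tkg`, `expand_candidates`, `select_bic`, `max_depth`) and the toy history are hypothetical.

```python
from collections import defaultdict


def build_tkg(commits):
    """Build a toy temporal graph from (sha, timestamp, files) tuples.

    Returns commit timestamps and, for every commit, its temporal
    predecessors: (older_sha, path) pairs linking it to the previous
    commit that touched the same file.
    """
    timestamps = {sha: ts for sha, ts, _ in commits}
    file_history = defaultdict(list)
    for sha, _ts, files in sorted(commits, key=lambda c: c[1]):
        for path in files:
            file_history[path].append(sha)  # chronological per-file history
    predecessors = defaultdict(set)
    for path, shas in file_history.items():
        for older, newer in zip(shas, shas[1:]):
            predecessors[newer].add((older, path))
    return timestamps, predecessors


def expand_candidates(predecessors, bfc, blame_commits, bfc_files, max_depth):
    """Walk file history backward from the BFC and the blame commits,
    following only edges on files the BFC modified, for up to max_depth hops."""
    seen = {bfc, *blame_commits}
    frontier = [bfc, *blame_commits]
    for _ in range(max_depth):
        next_frontier = []
        for sha in frontier:
            for older, path in predecessors.get(sha, ()):
                if path in bfc_files and older not in seen:
                    seen.add(older)
                    next_frontier.append(older)
        frontier = next_frontier
    seen.discard(bfc)  # the fix itself is never a bug-inducing candidate
    return seen


def select_bic(candidates, timestamps, score):
    """Stand-in for the LLM agent: return the highest-scoring candidate,
    preferring the older commit on ties."""
    return max(candidates, key=lambda sha: (score(sha), -timestamps[sha]))


# Toy history (hypothetical): blame on the fixed lines returns only "c4".
commits = [
    ("c1", 1, {"parser.py"}),
    ("c2", 2, {"parser.py", "util.py"}),
    ("c3", 3, {"util.py"}),
    ("c4", 4, {"parser.py"}),
    ("fix", 5, {"parser.py"}),
]
timestamps, predecessors = build_tkg(commits)
candidates = expand_candidates(predecessors, "fix", {"c4"}, {"parser.py"}, max_depth=3)
print(sorted(candidates))  # ['c1', 'c2', 'c4']

# Hypothetical scores standing in for the agent's causal analysis.
scores = {"c1": 0.1, "c2": 0.9, "c4": 0.4}
print(select_bic(candidates, timestamps, scores.get))  # c2
```

Blame alone yields only `c4`, while the backward traversal also surfaces `c2` and `c1`; `c3` is excluded because it never touched the fixed file. In this sketch `max_depth` is an assumed knob that caps exploration: a larger value widens the candidate set, which the selector must then narrow down.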