🤖 AI Summary
Accurately pinpointing the commits that introduce software defects has seen only modest accuracy gains over two decades of research. This work proposes a simple workflow in which large language model (LLM) agents analyze a fix commit's diff and message, derive short, grep-friendly search patterns from them, and use those patterns to find the bug-introducing commit within a set of candidates. The work shows that LLM agents can distill such high-precision code change patterns and that doing so substantially improves localization accuracy. On the most popular Linux kernel dataset, the method achieves an F1-score of 0.81, up from the previous state-of-the-art score of 0.64. This 17-point gain exceeds the 10-point improvement between the original 2005 method (0.54) and that previous state of the art.
📝 Abstract
Śliwerski, Zimmermann, and Zeller (SZZ) just won the 2026 ACM SIGSOFT Impact Award for asking:
When do changes induce fixes?
Their paper from 2005 served as the foundation for a wide array of approaches aimed at identifying bug-introducing changes (or commits) from fix commits in software repositories. But even after two decades of progress, the best-performing approach from 2025 improves F1-score by only 10 percentage points over the original method on the most popular Linux kernel dataset.
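For readers unfamiliar with the baseline, the core SZZ idea can be sketched in a few lines: the lines a fix deletes or modifies are traced back (as `git blame` would do on the pre-fix file) to the commits that last touched them, and those commits become the bug-introducing candidates. The sketch below is a minimal illustration under stated assumptions: the `BlameLine` type, the `szz_candidates` helper, and the in-memory blame data are hypothetical stand-ins for real repository access, and skipping blank lines is one common refinement rather than the full set of heuristics used by SZZ variants.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BlameLine:
    """One line of the pre-fix file plus the commit that last touched it.

    Hypothetical stand-in for what `git blame` reports on the parent of the fix.
    """
    text: str
    commit: str


def szz_candidates(
    blame: dict[str, list[BlameLine]],
    deleted: dict[str, list[int]],
) -> set[str]:
    """Return commits that last touched the lines the fix deletes or modifies.

    `blame` maps a file path to its blamed lines in the pre-fix version;
    `deleted` maps a file path to 1-based line numbers removed by the fix.
    """
    candidates: set[str] = set()
    for path, line_numbers in deleted.items():
        lines = blame[path]
        for n in line_numbers:
            line = lines[n - 1]
            # Skip blank lines: blaming them rarely points at the real bug.
            if line.text.strip():
                candidates.add(line.commit)
    return candidates


if __name__ == "__main__":
    blame = {
        "foo.c": [
            BlameLine("int x;", "a1"),
            BlameLine("", "b2"),
            BlameLine("x = y / z;", "c3"),
        ]
    }
    # The hypothetical fix deletes lines 2 and 3 of foo.c.
    print(szz_candidates(blame, {"foo.c": [2, 3]}))
```

Because every deleted line contributes a candidate, this approach tends to over-report commits that merely reformatted or moved code, which is one reason precision has been hard to raise.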
In this paper, we uncover how and why LLM-based agents can substantially advance the state of the art in identifying bug-introducing commits from fix commits. We propose a simple agentic workflow based on searching a set of candidate commits and find that it raises the F1-score from 0.64 to 0.81 on the most popular Linux kernel dataset, a larger jump than the one between the original 2005 method (0.54) and the previous SOTA (0.64). We also uncover why agents are so successful: they derive short greppable patterns from the fix commit's diff and message and use them to search large candidate sets effectively and find the bug-introducing commits. Finally, we discuss how these insights might enable further progress in bug detection, root cause understanding, and repair.
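To make the search step concrete, the sketch below mimics its shape: derive short literal patterns from a fix diff, keep the candidate commits whose added lines contain any of them (a fixed-string match, like `grep -F`), and score the result with F1 against labeled ground truth. This is only an illustration under loud assumptions: `derive_patterns` is a hypothetical, purely mechanical stand-in for the LLM agent (which writes its own patterns from both the diff and the commit message), and the helper names, the `min_len` threshold, and the toy diffs are invented for this example. In a real repository, a comparable search could be run with git's pickaxe option, `git log -S'<pattern>'`.

```python
def derive_patterns(fix_diff: str, min_len: int = 8) -> list[str]:
    """Hypothetical stand-in for the agent: use each line the fix removes,
    trimmed of whitespace, as a literal pattern if it is long enough."""
    patterns = []
    for line in fix_diff.splitlines():
        if line.startswith("-") and not line.startswith("---"):
            body = line[1:].strip()
            if len(body) >= min_len:
                patterns.append(body)
    return patterns


def added_lines(diff: str) -> list[str]:
    """Lines a commit adds, without the leading '+' marker."""
    return [
        line[1:]
        for line in diff.splitlines()
        if line.startswith("+") and not line.startswith("+++")
    ]


def grep_candidates(patterns: list[str], candidates: dict[str, str]) -> list[str]:
    """Return ids of candidate commits whose added lines contain any pattern."""
    hits = []
    for commit_id, diff in candidates.items():
        added = added_lines(diff)
        if any(p in line for p in patterns for line in added):
            hits.append(commit_id)
    return hits


def f1(predicted: set[str], truth: set[str]) -> float:
    """Harmonic mean of precision and recall over predicted commit ids."""
    tp = len(predicted & truth)
    if tp == 0:
        return 0.0
    precision = tp / len(predicted)
    recall = tp / len(truth)
    return 2 * precision * recall / (precision + recall)


if __name__ == "__main__":
    fix_diff = (
        "--- a/drivers/foo.c\n"
        "+++ b/drivers/foo.c\n"
        "-\tbuf = kmalloc(len, GFP_KERNEL);\n"
        "+\tbuf = kzalloc(len, GFP_KERNEL);\n"
    )
    candidates = {
        "c1": "+++ b/drivers/foo.c\n+\tbuf = kmalloc(len, GFP_KERNEL);\n",
        "c2": "+++ b/drivers/bar.c\n+\tret = 0;\n",
    }
    hits = grep_candidates(derive_patterns(fix_diff), candidates)
    print(hits, f1(set(hits), {"c1"}))
```

Matching only against lines a candidate *adds* reflects the intuition that a bug-introducing commit usually adds the code a later fix removes or rewrites; a short, specific pattern keeps false positives low when the candidate set is large.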