Back to the Basics: Rethinking Issue-Commit Linking with LLM-Assisted Retrieval

📅 2025-07-12
🤖 AI Summary
Existing issue-commit linking methods degrade sharply in real-world repositories, where the number of plausible but unrelated commits grows with commit volume. This work introduces the Realistic Distribution Setting (RDS) and uses it to build a benchmark dataset spanning 20 open-source projects, exposing the performance decay of state-of-the-art methods under practical conditions in what is presented as the first comprehensive analysis of this kind. Under RDS, conventional information retrieval techniques outperform existing deep learning models. Motivated by this finding, the authors propose EasyLink, a lightweight framework that uses a vector database for fast initial retrieval and a large language model (LLM) to re-rank the retrieved commits, bridging the semantic gap between issues and commits. EasyLink achieves an average Precision@1 of 75.91%, over four times that of the prior state of the art, and is positioned as an approach to issue-commit linking that balances high accuracy with low computational overhead.

📝 Abstract
Issue-commit linking, which connects issues with commits that fix them, is crucial for software maintenance. Existing approaches have shown promise in automatically recovering these links. Evaluations of these techniques assess their ability to identify genuine links from plausible but false links. However, these evaluations overlook the fact that, in reality, when a repository has more commits, the presence of more plausible yet unrelated commits may interfere with the tool in differentiating the correct fix commits. To address this, we propose the Realistic Distribution Setting (RDS) and use it to construct a more realistic evaluation dataset that includes 20 open-source projects. By evaluating tools on this dataset, we observe that the performance of the state-of-the-art deep learning-based approach drops by more than half, while the traditional Information Retrieval method, VSM, outperforms it. Inspired by these observations, we propose EasyLink, which utilizes a vector database as a modern Information Retrieval technique. To address the long-standing problem of the semantic gap between issues and commits, EasyLink leverages a large language model to rerank the commits retrieved from the database. Under our evaluation, EasyLink achieves an average Precision@1 of 75.91%, improving over the state-of-the-art by over four times. Additionally, this paper provides practical guidelines for advancing research in issue-commit link recovery.
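
The retrieve-then-rerank idea described in the abstract can be illustrated with a minimal sketch. This is not EasyLink's implementation: the bag-of-words cosine similarity stands in for the vector database, and `scorer` is a hypothetical callable standing in for the LLM reranker. All function names and the sample data are assumptions made for illustration.

```python
import math
import re
from collections import Counter
from typing import Callable, Dict, List, Tuple


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(issue: str, commits: Dict[str, str], k: int = 3) -> List[Tuple[str, float]]:
    """Stage 1 (stand-in for a vector database): top-k commits by similarity."""
    query = Counter(tokenize(issue))
    scored = [(cid, cosine(query, Counter(tokenize(msg)))) for cid, msg in commits.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


def link_issue(
    issue: str,
    commits: Dict[str, str],
    scorer: Callable[[str, str], float],
    k: int = 3,
) -> List[str]:
    """Stage 2: reorder the shortlist with `scorer` (hypothetical LLM stand-in)."""
    shortlist = [cid for cid, _ in retrieve(issue, commits, k)]
    return sorted(shortlist, key=lambda cid: scorer(issue, commits[cid]), reverse=True)


if __name__ == "__main__":
    # Illustrative data only; not taken from the paper's dataset.
    commits = {
        "c1": "fix null pointer crash in parser",
        "c2": "update readme",
        "c3": "refactor parser tests",
    }
    issue = "Parser crash with null pointer on empty input"
    # A trivial scorer that reuses lexical overlap; a real system would call an LLM here.
    overlap = lambda i, c: float(len(set(tokenize(i)) & set(tokenize(c))))
    print(link_issue(issue, commits, overlap, k=2))
```

The split into a cheap first stage and a more expensive second stage means the reranker only sees `k` candidates per issue, which is what keeps the cost of the second stage bounded as the repository grows.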
Problem

Research questions and friction points this paper is trying to address.

Improving accuracy in linking issues to fix commits
Addressing semantic gap between issues and commits
Evaluating tools with realistic commit distributions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses vector database for modern IR
Leverages LLM for semantic reranking
Introduces Realistic Distribution Setting evaluation
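
For reference, the Precision@1 metric reported in the abstract can be sketched as the fraction of issues whose top-ranked commit is a true fix commit. This is a minimal sketch under assumptions: the function name and data layout are hypothetical, and the paper's exact averaging (for example, per project before averaging across projects) may differ.

```python
from typing import Dict, List, Set


def precision_at_1(rankings: Dict[str, List[str]], truth: Dict[str, Set[str]]) -> float:
    """Fraction of issues whose first-ranked commit is in the ground-truth set.

    rankings: issue id -> commit ids ordered from most to least likely.
    truth:    issue id -> set of commit ids that actually fix the issue.
    Issues with an empty ranking count as misses (an assumption of this sketch).
    """
    if not rankings:
        return 0.0
    hits = sum(
        1
        for issue, ranked in rankings.items()
        if ranked and ranked[0] in truth.get(issue, set())
    )
    return hits / len(rankings)


if __name__ == "__main__":
    # Illustrative data only.
    rankings = {"i1": ["c1", "c2"], "i2": ["c9", "c3"], "i3": [], "i4": ["c4"]}
    truth = {"i1": {"c1"}, "i2": {"c3"}, "i4": {"c4"}}
    print(precision_at_1(rankings, truth))
```

Because only the first position counts, adding more plausible but unrelated commits to the candidate pool can only lower or preserve this score, which is why the metric is sensitive to the commit distribution used in evaluation.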
Huihui Huang
Hunan University
Organic optoelectronics, Thermoelectric power, 2D materials and related devices
Ratnadira Widyasari
Singapore Management University
Computer science
Ting Zhang
School of Computing and Information Systems, Singapore Management University, Singapore
Ivana Clairine Irsan
Singapore Management University
Artificial Intelligence, Machine Learning
Jieke Shi
PhD Candidate & Research Engineer, Singapore Management University
Software Engineering, AI Software Testing
Han Wei Ang
GovTech, Singapore
Frank Liauw
Lead Cybersecurity Engineer, Government Technology Agency Singapore
Eng Lieh Ouh
School of Computing and Information Systems, Singapore Management University, Singapore
Lwin Khin Shar
School of Computing and Information Systems, Singapore Management University, Singapore
Hong Jin Kang
University of Sydney
Software Engineering, Specification Mining, Active Learning
David Lo
School of Computing and Information Systems, Singapore Management University, Singapore