LinkAnchor: An Autonomous LLM-Based Agent for Issue-to-Commit Link Recovery

📅 2025-08-17

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This paper addresses the recovery of links between issue reports and code commits in software development. It proposes LinkAnchor, the first autonomous agent based on large language models (LLMs) for this task. To overcome LLM context-length limits and the inefficiency of exhaustively scoring issue-commit pairs, LinkAnchor uses a lazy-access architecture that retrieves only the most relevant context on demand and pinpoints the target commit directly. The method combines context-aware retrieval, dynamic data loading, and a candidate-commit filtering mechanism, so that links can be traced across long project histories. Across the case-study projects, LinkAnchor outperforms state-of-the-art approaches by 60%–262% in Hit@1. The complete toolchain, including source code, evaluation scripts, and the replication package, is open-sourced and natively supports GitHub and Jira.

📝 Abstract

Issue-to-commit link recovery plays an important role in software traceability and improves project management. However, it remains a challenging task. A study on GitHub shows that only 42.2% of the issues are correctly linked to their commits. This highlights the potential for further development and research in this area. Existing studies have employed various AI/ML-based approaches, and with the recent development of large language models, researchers have leveraged LLMs to tackle this problem. These approaches suffer from two main issues. First, LLMs are constrained by limited context windows and cannot ingest all of the available data sources, such as long commit histories, extensive issue comments, and large code repositories. Second, most methods operate on individual issue-commit pairs; that is, given a single issue-commit pair, they determine whether the commit resolves the issue. This quickly becomes impractical in real-world repositories containing tens of thousands of commits. To address these limitations, we present LinkAnchor, the first autonomous LLM-based agent designed for issue-to-commit link recovery. The lazy-access architecture of LinkAnchor enables the underlying LLM to access the rich context of software, spanning commits, issue comments, and code files, without exceeding the token limit by dynamically retrieving only the most relevant contextual data. Additionally, LinkAnchor is able to automatically pinpoint the target commit rather than exhaustively scoring every possible candidate. Our evaluations show that LinkAnchor outperforms state-of-the-art issue-to-commit link recovery approaches by 60-262% in Hit@1 score across all our case study projects. We also publicly release LinkAnchor as a ready-to-use tool, along with our replication package. LinkAnchor is designed and tested for GitHub and Jira, and is easily extendable to other platforms.
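The Hit@1 score reported above can be read as the fraction of issues for which the top-ranked predicted commit is the true linked commit. The following is a minimal sketch of that computation, not code from the replication package; the function name `hit_at_k`, the dictionary layout, and the issue IDs and commit hashes are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List


def hit_at_k(ranked: Dict[str, List[str]], truth: Dict[str, str], k: int = 1) -> float:
    """Fraction of issues whose true commit appears in the top-k predictions."""
    if not truth:
        return 0.0
    hits = sum(
        1
        for issue, commit in truth.items()
        if commit in ranked.get(issue, [])[:k]
    )
    return hits / len(truth)


# Hypothetical predictions: issue ID -> commits ranked from most to least likely.
ranked = {
    "ISSUE-1": ["a1b2c3", "d4e5f6"],  # correct commit ranked first
    "ISSUE-2": ["0f9e8d", "7c6b5a"],  # correct commit ranked second
    "ISSUE-3": ["112233"],            # correct commit ranked first
    "ISSUE-4": [],                    # no prediction made
}
# Hypothetical ground truth: issue ID -> the commit that resolved it.
truth = {
    "ISSUE-1": "a1b2c3",
    "ISSUE-2": "7c6b5a",
    "ISSUE-3": "112233",
    "ISSUE-4": "445566",
}

hit1 = hit_at_k(ranked, truth, k=1)  # 2 of 4 issues
hit2 = hit_at_k(ranked, truth, k=2)  # 3 of 4 issues
```

Hit@1 is the strictest variant of this metric: only the single top-ranked commit counts, which matches a setting where the agent returns one target commit per issue.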

Problem

Research questions and friction points this paper is trying to address.

Recovering issue-to-commit links in software projects

Overcoming LLM context limits for large repositories

Automating commit identification without exhaustive scoring

Innovation

Methods, ideas, or system contributions that make the work stand out.

Autonomous LLM agent for link recovery

Lazy-access architecture avoids token limits

Dynamic retrieval of relevant contextual data
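The lazy-access idea listed above can be illustrated with a small sketch: data sources are registered as loaders, nothing is read until it is requested, and every fetch is charged against a fixed budget. This is an illustration under stated assumptions, not LinkAnchor's implementation; the `LazyContext` class, the loader functions, the character budget (a stand-in for a token limit), and the sample commits are all hypothetical.

```python
from typing import Callable, Dict, List


class LazyContext:
    """Registers data-source loaders and calls one only when it is requested."""

    def __init__(self, loaders: Dict[str, Callable[[str], str]], budget_chars: int):
        self.loaders = loaders
        self.budget_chars = budget_chars  # crude stand-in for an LLM token limit
        self.used_chars = 0
        self.calls: List[str] = []        # which sources were actually loaded

    def fetch(self, source: str, query: str) -> str:
        remaining = self.budget_chars - self.used_chars
        if remaining <= 0:
            return ""  # budget exhausted: do not load anything further
        text = self.loaders[source](query)[:remaining]  # truncate to fit the budget
        self.used_chars += len(text)
        self.calls.append(source)
        return text


# Hypothetical in-memory commit history.
COMMITS = {
    "a1b2c3": "Fix null pointer in parser",
    "d4e5f6": "Update README",
}


def search_commits(query: str) -> str:
    """Return only the commits whose message mentions the query."""
    return "\n".join(
        f"{sha} {msg}" for sha, msg in COMMITS.items() if query.lower() in msg.lower()
    )


def load_issue_comments(issue_id: str) -> str:
    """Placeholder loader; a real one would call the issue tracker."""
    return f"(comments for {issue_id} would be fetched here)"


ctx = LazyContext(
    {"commits": search_commits, "comments": load_issue_comments},
    budget_chars=200,
)
result = ctx.fetch("commits", "parser")  # only the commit loader runs
```

In an agent setting, the model would decide which `fetch` call to issue next based on what it has seen so far, so the comment loader in this sketch is never invoked unless it is asked for.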

🔎 Similar Papers

An Empirical Evaluation of Pre-trained Large Language Models for Repairing Declarative Formal Specifications