🤖 AI Summary
In open-source projects, traceability links between release notes and development artifacts (e.g., pull requests, commits, issues) are frequently missing or broken because contributors collaborate remotely and asynchronously, which exacerbates technical debt and undermines maintainability. To address this, the authors propose an automated traceability linking method that combines a large language model (e.g., Gemini 1.5 Pro) with a time proximity feature. They also curate a benchmark dataset for release-note traceability comprising 3,500 filtered and validated link instances. In the evaluation, the approach achieves a Precision@1 of 0.73 on the pull-request tracing task. In a survey of 33 open-source practitioners, 84% rated the approach as at least somewhat important for traceability maintenance (16% very important, 68% somewhat important). The work addresses a gap in both the research and practice of automated traceability linking in open-source software development.
📝 Abstract
Maintaining traceability links between software release notes and corresponding development artifacts, e.g., pull requests (PRs), commits, and issues, is essential for managing technical debt and ensuring maintainability. However, in open-source environments where contributors work remotely and asynchronously, establishing and maintaining these links is often error-prone, time-consuming, and frequently overlooked. Our empirical study of GitHub repositories revealed that 47% of release artifacts lacked traceability links and 12% contained broken links. To address this gap, we first analyzed release notes to identify their What, Why, and How information and assessed how these align with PRs, commits, and issues. We curated a benchmark dataset consisting of 3,500 filtered and validated traceability link instances. We then implemented LLM-based approaches to automatically establish traceability links for three pairs: release note contents & PRs, release note contents & commits, and release note contents & issues. When combined with the time proximity feature, the LLM-based approach, e.g., Gemini 1.5 Pro, achieved a high Precision@1 value of 0.73 for PR traceability recovery. To evaluate the usability and adoption potential of this approach, we conducted an online survey involving 33 open-source practitioners: 16% of respondents rated it as very important and 68% as somewhat important for traceability maintenance.
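To make the Precision@1 metric and the role of time proximity concrete, the following is a minimal sketch and not the authors' implementation. The abstract does not say how time proximity is computed or combined with the LLM output, so the exponential decay, the `scale_days` and `weight` parameters, and all function names here are illustrative assumptions. The `semantic_score` values stand in for whatever relevance score an LLM-based matcher would produce.

```python
from datetime import datetime
from math import exp


def time_proximity(release_time, artifact_time, scale_days=30.0):
    """Assumed feature: 1.0 when timestamps coincide, decaying with the gap in days."""
    gap_days = abs((release_time - artifact_time).total_seconds()) / 86400.0
    return exp(-gap_days / scale_days)


def rank_candidates(release_time, candidates, weight=0.5):
    """Rank candidate artifacts for one release-note entry.

    candidates: list of (artifact_id, semantic_score, artifact_time) tuples,
    where semantic_score is a placeholder for an LLM-derived relevance score.
    The linear blend controlled by `weight` is an assumption.
    """
    scored = [
        (artifact_id,
         (1 - weight) * semantic_score
         + weight * time_proximity(release_time, artifact_time))
        for artifact_id, semantic_score, artifact_time in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return [artifact_id for artifact_id, _ in scored]


def precision_at_1(rankings, gold):
    """Fraction of release-note entries whose top-ranked artifact is a true link.

    rankings: dict mapping entry id -> ranked list of artifact ids.
    gold: dict mapping entry id -> set of correct artifact ids.
    """
    hits = sum(
        1 for entry, ranked in rankings.items()
        if ranked and ranked[0] in gold[entry]
    )
    return hits / len(rankings)


# Hypothetical data: PR-2 has the higher semantic score but was merged
# months before the release, so the blended score favors PR-1.
release = datetime(2024, 1, 31)
candidates = [
    ("PR-1", 0.60, datetime(2024, 1, 30)),
    ("PR-2", 0.65, datetime(2023, 6, 1)),
]
ranking = rank_candidates(release, candidates)
score = precision_at_1({"note-1": ranking}, {"note-1": {"PR-1"}})
```

In this toy case the semantically stronger but much older PR is demoted, which is the kind of correction a time proximity feature is meant to provide. A Precision@1 of 0.73 would mean that for 73% of release-note entries, the top-ranked PR is a correct link.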