Enhancing Requirements Traceability Link Recovery: A Novel Approach with T-SimCSE

📅 2026-03-12
🤖 AI Summary
This work addresses two limitations of existing automated requirements traceability link recovery approaches: the recovered links are not accurate enough, and many approaches depend on large labeled training sets, which are scarce in practice. The paper proposes T-SimCSE, a method built on the pre-trained language model SimCSE, which needs no labeled data, to compute semantic similarity between requirements and target artifacts. It then applies a new specificity metric to re-rank the candidate artifacts and creates trace links between each requirement and its top-K artifacts. Evaluated on ten public datasets, T-SimCSE achieves higher recall and mean average precision (MAP) than the approaches it is compared with.
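As a rough illustration of the two reported metrics, the sketch below computes recall at K and MAP using their textbook definitions. The function names and the toy data are assumptions made for this example; the paper may compute the metrics with different conventions (for instance, a different cut-off or tie handling).

```python
from typing import Dict, List, Set


def recall_at_k(ranked: List[str], relevant: Set[str], k: int) -> float:
    """Fraction of the true links that appear among the top-k ranked artifacts."""
    if not relevant:
        return 0.0
    hits = sum(1 for artifact in ranked[:k] if artifact in relevant)
    return hits / len(relevant)


def average_precision(ranked: List[str], relevant: Set[str]) -> float:
    """Mean of the precision values taken at the rank of each true link."""
    if not relevant:
        return 0.0
    hits = 0
    total = 0.0
    for rank, artifact in enumerate(ranked, start=1):
        if artifact in relevant:
            hits += 1
            total += hits / rank  # precision at this rank
    return total / len(relevant)


def mean_average_precision(
    rankings: Dict[str, List[str]], truth: Dict[str, Set[str]]
) -> float:
    """Average of average_precision over all requirements (queries)."""
    if not rankings:
        return 0.0
    scores = [
        average_precision(ranked, truth.get(req, set()))
        for req, ranked in rankings.items()
    ]
    return sum(scores) / len(scores)


# Toy data (hypothetical): two requirements, each with a ranked artifact list.
rankings = {"R1": ["a", "b", "c", "d"], "R2": ["x", "y"]}
truth = {"R1": {"a", "c"}, "R2": {"y"}}
print(recall_at_k(rankings["R1"], truth["R1"], k=2))
print(mean_average_precision(rankings, truth))
```

Recall rewards finding the true links at all, while MAP also rewards placing them near the top of the ranking, which is why the two are usually reported together for top-K link recovery.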

📝 Abstract
Requirements traceability plays an important role in ensuring software quality and responding to changes in requirements. Requirements trace links (such as the links between requirements and other software artifacts) underpin the modeling and implementation of requirements traceability. With the rapid development of artificial intelligence, techniques based on pre-trained language models (PLMs) are increasingly applied to the automatic recovery of requirements trace links. However, the links recovered by these approaches are not accurate enough, and many approaches require a large labeled dataset for training, while very few labeled datasets are currently available. To address these limitations, this paper proposes a novel requirements traceability link recovery approach called T-SimCSE, which is based on the PLM SimCSE. SimCSE does not require labeled data, is broadly applicable, and performs well. T-SimCSE first uses the SimCSE model to calculate the similarity between requirements and target artifacts, and then employs a new metric (i.e., specificity) to reorder those target artifacts. Finally, trace links are created between each requirement and its top-K target artifacts. We evaluated T-SimCSE on ten public datasets by comparing it with other approaches. The results show that T-SimCSE achieves superior performance in terms of recall and Mean Average Precision (MAP).
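The three steps in the abstract (similarity, re-ranking, top-K selection) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the embeddings are assumed to have been produced beforehand by a sentence encoder such as SimCSE, cosine similarity is assumed as the similarity measure, and because the abstract does not define the specificity metric, re-ranking is left as a caller-supplied hook. All names (`cosine`, `top_k_links`, `rerank`) are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two embedding vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def top_k_links(
    requirement_vec: Sequence[float],
    artifact_vecs: Dict[str, Sequence[float]],
    k: int,
    rerank: Optional[Callable[[str, float], float]] = None,
) -> List[Tuple[str, float]]:
    """Score every artifact against one requirement and keep the top k.

    `rerank` is a placeholder for the paper's specificity-based reordering,
    which the abstract does not specify; it maps (artifact name, similarity)
    to an adjusted score. With rerank=None, plain similarity is used.
    """
    scored = [(name, cosine(requirement_vec, vec)) for name, vec in artifact_vecs.items()]
    if rerank is not None:
        scored = [(name, rerank(name, score)) for name, score in scored]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy embeddings (hypothetical); real ones would come from the encoder.
requirement = [1.0, 0.0]
artifacts = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [1.0, 1.0]}
print(top_k_links(requirement, artifacts, k=2))
```

Because the pipeline needs only embeddings and a ranking rule, no labeled trace links are involved at any step, which matches the unsupervised setting the abstract describes.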
Problem

Research questions and friction points this paper is trying to address.

requirements traceability
trace link recovery
pre-trained language models
labeled dataset scarcity
software artifacts
Innovation

Methods, ideas, or system contributions that make the work stand out.

T-SimCSE
requirements traceability
SimCSE
unsupervised learning
specificity metric
Ye Wang
School of Computer Science and Technology, Zhejiang Gongshang University, Zhejiang, Hangzhou, 310018, China; Zhejiang Key Laboratory of Big Data and Future E-Commerce Technology, Zhejiang, China
Wenqing Wang
Postdoctoral researcher, Iowa State University
Dynamic Modeling, Hierarchical Control, Model Predictive Control, Stochastic Control
Kun Hu
School of Computer Science and Technology, Zhejiang Gongshang University, Zhejiang, Hangzhou, 310018, China; Zhejiang Key Laboratory of Big Data and Future E-Commerce Technology, Zhejiang, China
Qiao Huang
Zhejiang Gongshang University
Software Engineering, Mining Software Repositories
Liping Zhao
Department of Computer Science, University of Manchester
Requirements Engineering, Natural Language Processing, Software Engineering