Natural Language-Programming Language Software Traceability Link Recovery Needs More than Textual Similarity

📅 2025-09-06
🤖 AI Summary
Text-similarity-based traceability link recovery (TLR) performs poorly when one artifact is written in natural language (NL) and the other in a programming language (PL), because of the semantic gap between the two. This paper proposes integrating multiple domain-specific auxiliary strategies into two models: a Heterogeneous Graph Transformer (HGT), where the strategies are encoded as edge types, and a prompt-based large language model (Gemini 2.5 Pro), where they are supplied as additional input information. On the requirements-to-code task, both multi-strategy models outperform their counterparts without strategy integration. Compared with the current state-of-the-art method HGNNLink, the multi-strategy HGT and Gemini 2.5 Pro improve the average F1-score by 3.68% and 8.84%, respectively, across 12 open-source projects. These results support the claim that NL-PL TLR needs more than textual similarity.

📝 Abstract
In the field of software traceability link recovery (TLR), textual similarity has long been regarded as the core criterion. However, in tasks involving natural language and programming language (NL-PL) artifacts, relying solely on textual similarity is limited by their semantic gap. To this end, we conducted a large-scale empirical evaluation across various types of TLR tasks, revealing the limitations of textual similarity in NL-PL scenarios. To address these limitations, we propose an approach that incorporates multiple domain-specific auxiliary strategies, identified through empirical analysis, into two models: the Heterogeneous Graph Transformer (HGT) via edge types and the prompt-based Gemini 2.5 Pro via additional input information. We then evaluated our approach using the widely studied requirements-to-code TLR task, a representative case of NL-PL TLR. Experimental results show that both the multi-strategy HGT and Gemini 2.5 Pro models outperformed their original counterparts without strategy integration. Furthermore, compared to the current state-of-the-art method HGNNLink, the multi-strategy HGT and Gemini 2.5 Pro models achieved average F1-score improvements of 3.68% and 8.84%, respectively, across twelve open-source projects, demonstrating the effectiveness of multi-strategy integration in enhancing overall model performance for the requirements-code TLR task.
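The semantic gap described in the abstract can be illustrated with a minimal sketch of a purely textual baseline. This is not the paper's evaluation setup: the requirement, the two code snippets, and the helper names `tokenize` and `cosine_similarity` are all invented here for illustration, and the similarity measure is a plain term-frequency cosine rather than any method the authors evaluated.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Split text into lowercase word tokens, breaking camelCase identifiers."""
    # Insert a space at lower-to-upper boundaries: "getUserName" -> "get User Name".
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    return [t.lower() for t in re.findall(r"[A-Za-z]+", spaced)]


def cosine_similarity(a: str, b: str) -> float:
    """Cosine similarity between the term-frequency vectors of two texts."""
    va, vb = Counter(tokenize(a)), Counter(tokenize(b))
    dot = sum(va[t] * vb[t] for t in va.keys() & vb.keys())
    norm_a = math.sqrt(sum(c * c for c in va.values()))
    norm_b = math.sqrt(sum(c * c for c in vb.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


# Hypothetical artifacts, invented for illustration only.
requirement = "The system shall let a customer reset a forgotten password by email."
related_code = (
    "class CredentialRecoveryService { void sendToken(Account acct) "
    "{ mailer.dispatch(acct); } }"
)
unrelated_code = (
    "class EmailFormatter { String format(String email) { return email.trim(); } }"
)

# The code that implements the requirement shares no vocabulary with it,
# while the unrelated class happens to reuse the word "email".
print(cosine_similarity(requirement, related_code))
print(cosine_similarity(requirement, unrelated_code))
```

In this toy case the implementing class scores lower than the unrelated one, which is the kind of failure that motivates adding information beyond text overlap.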
Problem

Research questions and friction points this paper is trying to address.

Addressing semantic gap in NL-PL traceability link recovery
Overcoming limitations of textual similarity in software artifacts
Enhancing requirements-to-code traceability with multi-strategy integration
Innovation

Methods, ideas, or system contributions that make the work stand out.

Heterogeneous Graph Transformer with edge types
Prompt-based Gemini 2.5 Pro with additional inputs
Multi-strategy integration for enhanced traceability recovery
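The two integration routes listed above can be sketched roughly as follows. The abstract does not name the individual strategies, so the relation names `textual_similarity` and `calls`, the `TraceGraph` class, the `build_prompt` helper, and all artifact names are assumptions made for this example. No HGT is trained and no Gemini API is called; the sketch only shows how auxiliary relations might be stored as typed edges and then rendered as additional prompt input.

```python
from dataclasses import dataclass, field


@dataclass
class TraceGraph:
    """Toy heterogeneous graph: typed nodes plus edges grouped by relation type."""
    node_types: dict[str, str] = field(default_factory=dict)
    edges: dict[str, list[tuple[str, str]]] = field(default_factory=dict)

    def add_node(self, node_id: str, node_type: str) -> None:
        self.node_types[node_id] = node_type

    def add_edge(self, relation: str, src: str, dst: str) -> None:
        self.edges.setdefault(relation, []).append((src, dst))

    def neighbors(self, node_id: str) -> list[tuple[str, str]]:
        """Return (relation, other_node) pairs for every edge touching node_id."""
        found = []
        for relation, pairs in self.edges.items():
            for src, dst in pairs:
                if src == node_id:
                    found.append((relation, dst))
                elif dst == node_id:
                    found.append((relation, src))
        return found


def build_prompt(requirement: str, code_id: str, code_text: str, graph: TraceGraph) -> str:
    """Render a yes/no trace-link question with graph neighbors as extra context."""
    hints = [f"- {relation}: {other}" for relation, other in graph.neighbors(code_id)]
    context = "\n".join(hints) if hints else "- (none)"
    return (
        "Decide whether the code implements the requirement. Answer yes or no.\n"
        f"Requirement: {requirement}\n"
        f"Code ({code_id}): {code_text}\n"
        f"Related artifacts:\n{context}"
    )


# Hypothetical artifacts and relations, invented for illustration only.
graph = TraceGraph()
graph.add_node("REQ-1", "requirement")
graph.add_node("CredentialRecoveryService", "code")
graph.add_node("Mailer", "code")
graph.add_edge("textual_similarity", "REQ-1", "Mailer")
graph.add_edge("calls", "CredentialRecoveryService", "Mailer")

prompt = build_prompt(
    "Let a customer reset a forgotten password by email.",
    "CredentialRecoveryService",
    "void sendToken(Account acct) { mailer.dispatch(acct); }",
    graph,
)
print(prompt)
```

Keeping each relation under its own key mirrors the idea of distinct edge types in a heterogeneous graph, and the same neighbor lookup feeds the prompt, so both models can draw on one source of auxiliary information.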
Zhiyuan Zou
School of Computer Science and Artificial Intelligence, Wuhan Textile University, China
Bangchao Wang
School of Computer Science and Artificial Intelligence, Wuhan Textile University, China
Peng Liang
School of Computer Science, Wuhan University
Software Engineering, Software Architecture, Empirical Software Engineering
Tingting Bi
The University of Melbourne & The University of Western Australia
Software Architecture, SE4AI, Empirical Software Engineering, Software Supply Chain
Huan Jin
School of Computer Science and Artificial Intelligence, Wuhan Textile University, China