🤖 AI Summary
To address the limited performance of text-similarity-based traceability link recovery (TLR) caused by the semantic gap between natural language (NL) and programming language (PL) artifacts, this paper proposes a cross-modal association method that integrates multiple domain-specific auxiliary strategies. The authors design a synergistic framework unifying edge-type modeling, context enhancement, and prompt engineering, each tailored to augment a heterogeneous graph transformer (HGT) and a large language model (LLM), specifically Gemini 2.5 Pro. Experimental evaluation across 12 open-source projects shows that both augmented models outperform their unmodified counterparts, and that the multi-strategy HGT and Gemini 2.5 Pro achieve average F1-score improvements of 3.68% and 8.84%, respectively, over the current state-of-the-art HGNNLink. These results empirically validate the effectiveness of strategic synergy in bridging the NL-PL semantic gap for TLR.
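To make the edge-type-modeling idea concrete, the sketch below builds a small heterogeneous graph of requirement and code artifacts with typed edges and runs one HGT layer over it with PyTorch Geometric's `HGTConv`. This is a minimal sketch under assumed conventions, not the paper's implementation: the node types, the edge types (`similar_to`, `calls`, `traced_by`), the feature dimensions, and the choice of library are all illustrative.

```python
# Minimal sketch (not the paper's implementation) of edge-type modeling in a
# Heterogeneous Graph Transformer, using PyTorch Geometric's HGTConv.
# All node/edge type names and dimensions are illustrative assumptions.
import torch
from torch_geometric.data import HeteroData
from torch_geometric.nn import HGTConv

data = HeteroData()
data['requirement'].x = torch.randn(4, 64)  # 4 requirement nodes, 64-dim text embeddings
data['code'].x = torch.randn(6, 64)         # 6 code-artifact nodes

# Typed edges let the HGT learn relation-specific attention, e.g. separating
# textual-similarity links from structural call dependencies.
data['requirement', 'similar_to', 'code'].edge_index = torch.tensor([[0, 1, 2],
                                                                     [0, 2, 5]])
data['code', 'calls', 'code'].edge_index = torch.tensor([[0, 1],
                                                         [1, 3]])
data['code', 'traced_by', 'requirement'].edge_index = torch.tensor([[0, 2, 5],
                                                                    [0, 1, 2]])

# One HGT layer; metadata() supplies the node/edge-type schema so the layer
# keeps per-type projections and per-relation attention heads.
conv = HGTConv(in_channels=64, out_channels=32, metadata=data.metadata(), heads=4)
out = conv(data.x_dict, data.edge_index_dict)
print({node_type: h.shape for node_type, h in out.items()})
```

In a setup like this, a downstream classifier would score requirement-code pairs from the resulting node embeddings; the auxiliary strategies enter the model as additional edge types rather than as extra text features.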
📝 Abstract
In the field of software traceability link recovery (TLR), textual similarity has long been regarded as the core criterion. However, in tasks involving natural language and programming language (NL-PL) artifacts, relying solely on textual similarity is limited by the semantic gap between the two. To investigate this, we conducted a large-scale empirical evaluation across various types of TLR tasks, revealing the limitations of textual similarity in NL-PL scenarios. To address them, we propose an approach that incorporates multiple domain-specific auxiliary strategies, identified through the empirical analysis, into two models: the Heterogeneous Graph Transformer (HGT), via edge types, and the prompt-based Gemini 2.5 Pro, via additional input information. We then evaluated our approach on the widely studied requirements-to-code TLR task, a representative case of NL-PL TLR. Experimental results show that both the multi-strategy HGT and Gemini 2.5 Pro models outperformed their original counterparts without strategy integration. Furthermore, compared to the current state-of-the-art method HGNNLink, the multi-strategy HGT and Gemini 2.5 Pro models achieved average F1-score improvements of 3.68% and 8.84%, respectively, across twelve open-source projects, demonstrating the effectiveness of multi-strategy integration for the requirements-to-code TLR task.
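To illustrate the second integration path, the sketch below shows one way "additional input information" could be supplied to a prompt-based LLM: the prompt combines a requirement, a candidate code artifact, and auxiliary context (here, call dependencies) and asks for a binary trace-link judgment. This is a hedged illustration, not the paper's prompt: the template wording, the auxiliary-context field, and the use of Google's google-genai SDK with the `gemini-2.5-pro` model ID are assumptions.

```python
# Hedged sketch of prompt-based requirements-to-code TLR with Gemini; this is
# not the paper's prompt. The template and auxiliary-context field are
# assumptions; requires the google-genai package and a GEMINI_API_KEY
# environment variable.
from google import genai

PROMPT_TEMPLATE = """You are judging software traceability links.

Requirement:
{requirement}

Candidate code artifact:
{code}

Additional context (illustrative auxiliary strategy, e.g. call dependencies):
{context}

Answer with exactly one word, "yes" or "no": does this code implement the requirement?"""

def judge_link(requirement: str, code: str, context: str) -> bool:
    """Return True if the model judges the requirement-code pair as linked."""
    client = genai.Client()  # reads GEMINI_API_KEY from the environment
    response = client.models.generate_content(
        model="gemini-2.5-pro",  # assumed model ID
        contents=PROMPT_TEMPLATE.format(requirement=requirement,
                                        code=code, context=context),
    )
    return response.text.strip().lower().startswith("yes")

if __name__ == "__main__":
    print(judge_link(
        requirement="The system shall lock an account after five failed logins.",
        code="class LoginGuard { void onFailure(User u) { if (++u.fails >= 5) u.lock(); } }",
        context="LoginGuard.onFailure is called by AuthService.authenticate",
    ))
```

The design point is that the auxiliary strategies reach the LLM purely through the prompt text, so no model fine-tuning is implied; only the input is enriched.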