Who's Who? LLM-assisted Software Traceability with Architecture Entity Recognition

📅 2025-11-04

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Problem: Traceability link recovery (TLR) between Software Architecture Documentation (SAD) and source code benefits from Software Architecture Models (SAMs) that bridge the semantic gap between the two artifacts, but creating SAMs by hand is time-consuming. Method: This paper proposes two LLM-based approaches. ExArch extracts component names from SAD and source code and uses them as a simple SAM, removing the need for a manually created one. ArTEMiS identifies architectural entities in the documentation and matches them with SAM entities, whether the SAM was created manually or generated automatically. Contribution/Results: TransArC, which requires a manually created SAM, reaches F1 = 0.87; ExArch achieves a comparable F1 = 0.86 using only SAD and code. ArTEMiS is on par with the heuristic-based SWATTR (F1 = 0.81) and can replace it when integrated with TransArC. The combination of ArTEMiS and ExArch outperforms ArDoCode, the best baseline that needs no manual SAM. The results indicate that LLMs can identify architectural entities in textual artifacts well enough to automate SAM generation and TLR.

📝 Abstract

Identifying architecturally relevant entities in textual artifacts is crucial for Traceability Link Recovery (TLR) between Software Architecture Documentation (SAD) and source code. While Software Architecture Models (SAMs) can bridge the semantic gap between these artifacts, their manual creation is time-consuming. Large Language Models (LLMs) offer new capabilities for extracting architectural entities from SAD and source code to construct SAMs automatically or establish direct trace links. This paper presents two LLM-based approaches: ExArch extracts component names as simple SAMs from SAD and source code to eliminate the need for manual SAM creation, while ArTEMiS identifies architectural entities in documentation and matches them with (manually or automatically generated) SAM entities. Our evaluation compares against state-of-the-art approaches SWATTR, TransArC and ArDoCode. TransArC achieves strong performance (F1: 0.87) but requires manually created SAMs; ExArch achieves comparable results (F1: 0.86) using only SAD and code. ArTEMiS is on par with the traditional heuristic-based SWATTR (F1: 0.81) and can successfully replace it when integrated with TransArC. The combination of ArTEMiS and ExArch outperforms ArDoCode, the best baseline without manual SAMs. Our results demonstrate that LLMs can effectively identify architectural entities in textual artifacts, enabling automated SAM generation and TLR, making architecture-code traceability more practical and accessible.
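The F1 scores quoted above combine precision and recall over recovered trace links. A minimal sketch of that computation follows, assuming a trace link is a pair of a SAD sentence number and a code element path; this representation, the file paths, and the function name are illustrative assumptions, and how the paper aggregates scores across projects is not stated here.

```python
from typing import Set, Tuple

# Assumption: a trace link pairs a SAD sentence number with a code element path.
TraceLink = Tuple[int, str]


def precision_recall_f1(predicted: Set[TraceLink], gold: Set[TraceLink]):
    """Return (precision, recall, F1) of predicted links against a gold standard."""
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical data: two of three predicted links are correct.
gold = {(1, "src/auth/Login.java"), (2, "src/db/Store.java"), (3, "src/ui/View.java")}
predicted = {(1, "src/auth/Login.java"), (2, "src/db/Store.java"), (4, "src/ui/View.java")}
p, r, f1 = precision_recall_f1(predicted, gold)
```

With two correct links out of three predicted and three expected, precision, recall, and F1 all equal 2/3.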

Problem

Research questions and friction points this paper is trying to address.

Automating Software Architecture Model creation from documentation and code

Identifying architecturally relevant entities in textual artifacts automatically

Establishing traceability links between architecture documentation and source code

Innovation

Methods, ideas, or system contributions that make the work stand out.

LLMs extract architectural entities from documentation and code

ExArch generates simple SAMs automatically without manual creation

ArTEMiS matches architectural entities with SAM entities automatically
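To make the matching step concrete, here is a toy sketch that aligns entity mentions from documentation with SAM component names. It is not the paper's method: ArTEMiS uses an LLM for this, whereas the sketch swaps in plain string similarity from Python's `difflib`. The function names, the threshold of 0.8, and the sample data are all hypothetical.

```python
import re
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def normalize(name: str) -> str:
    """Split camelCase, lower-case, and collapse non-alphanumerics to spaces."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name)
    return re.sub(r"[^a-z0-9]+", " ", spaced.lower()).strip()


def match_entities(
    mentions: List[str], components: List[str], threshold: float = 0.8
) -> Dict[str, Optional[str]]:
    """Map each mention to its most similar component, or None below the threshold."""
    result: Dict[str, Optional[str]] = {}
    for mention in mentions:
        best_name, best_score = None, 0.0
        for component in components:
            # String similarity stands in for the LLM-based matching (assumption).
            score = SequenceMatcher(None, normalize(mention), normalize(component)).ratio()
            if score > best_score:
                best_name, best_score = component, score
        result[mention] = best_name if best_score >= threshold else None
    return result


# Hypothetical SAM component names and documentation mentions.
components = ["UserManagement", "PaymentService", "Logging"]
mentions = ["user management", "payment service", "database"]
matches = match_entities(mentions, components)
```

Here "user management" and "payment service" map to their camelCase components, while "database" has no sufficiently similar component and maps to `None`. A string-based matcher like this misses synonyms and abbreviations, which is the kind of semantic gap the LLM-based approach targets.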
