Towards Supporting Quality Architecture Evaluation with LLM Tools

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the longstanding reliance on manual, time-consuming, and subjectivity-prone approaches in software architecture quality assessment, particularly in scenarios involving trade-offs among multiple quality attributes. To overcome these limitations, the study introduces generative large language models (LLMs) into this domain for the first time, leveraging Microsoft Copilot in conjunction with the Architecture Tradeoff Analysis Method (ATAM) and quality attribute scenario techniques. The proposed approach enables automated identification of architectural risks, analysis of sensitivity points, and generation of trade-off recommendations. Experimental results demonstrate that, in most cases, the method achieves higher accuracy and efficiency compared to human-led reviews, substantially reducing evaluation costs while improving consistency across assessments.

📝 Abstract
Architecture evaluation methods have been extensively used to assess software designs, and several methods have been proposed to analyze tradeoffs between different quality attributes. Competing qualities also lead to conflicts when selecting which quality-attribute scenarios are the most suitable for an architecture to tackle. Consequently, the scenarios required by the stakeholders must be prioritized and analyzed for potential risks. Today, architecture quality evaluation is still carried out manually, often involving long brainstorming sessions to decide on the most adequate quality-attribute scenarios for the architecture. To reduce this effort and make the assessment and selection of scenarios more efficient, in this research we propose the use of LLMs to partially automate the evaluation activities. As a first step in validating this hypothesis, this paper investigates MS Copilot as an LLM tool for analyzing quality-attribute scenarios suggested by students and reviewed by experienced architects. Specifically, our study compares the results of an Architecture Tradeoff Analysis Method (ATAM) exercise conducted in a software architecture course with the results of experienced software architects and with the output produced by the LLM tool. Our initial findings reveal that in most cases the LLM produces better and more accurate results regarding risks, sensitivity points, and tradeoff analysis of the manually generated quality scenarios, and that it significantly reduces the effort required for the task. Thus, we argue that generative AI has the potential to partially automate and support architecture evaluation tasks by suggesting higher-quality scenarios to evaluate and recommending the most suitable ones for a given context.
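The paper used MS Copilot interactively, so the exact prompting setup is not specified here. As a rough sketch only, the pattern of packaging a standard six-part quality-attribute scenario (SEI format: source, stimulus, artifact, environment, response, response measure) into a prompt asking an LLM for ATAM-style analysis might look like the following; the scenario content, class, and function names are illustrative and not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class QualityAttributeScenario:
    """Six-part quality-attribute scenario in the SEI/ATAM format."""
    source: str            # who or what generates the stimulus
    stimulus: str          # the condition arriving at the system
    artifact: str          # the part of the system that is stimulated
    environment: str       # operating conditions when it arrives
    response: str          # the desired system behavior
    response_measure: str  # how the response is quantified

def build_atam_prompt(scenario: QualityAttributeScenario, attribute: str) -> str:
    """Assemble a prompt asking an LLM to report risks, sensitivity
    points, and tradeoffs for one quality-attribute scenario."""
    return (
        "You are a software architecture evaluator applying ATAM.\n"
        f"Quality attribute: {attribute}\n"
        f"Scenario: {scenario.stimulus} from {scenario.source} "
        f"reaches {scenario.artifact} during {scenario.environment}; "
        f"the system should {scenario.response} "
        f"({scenario.response_measure}).\n"
        "List: (1) architectural risks, (2) sensitivity points, "
        "(3) tradeoffs with other quality attributes."
    )

# Illustrative performance scenario (not from the paper's exercise)
scenario = QualityAttributeScenario(
    source="end users",
    stimulus="a peak of 1000 concurrent requests",
    artifact="the order-processing service",
    environment="normal operation",
    response="process each request",
    response_measure="95th-percentile latency under 2 seconds",
)
prompt = build_atam_prompt(scenario, "performance")
```

The resulting `prompt` string would then be sent to whatever LLM tool is in use; the paper's point is that the risk/sensitivity/tradeoff output can then be compared against human-led ATAM reviews.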
Problem

Research questions and friction points this paper is trying to address.

architecture evaluation
quality attributes
scenario prioritization
risk analysis
tradeoff analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Architecture Evaluation
ATAM
Quality Attributes
Generative AI
Rafael Capilla
Professor of Software Engineering, Universidad Rey Juan Carlos
Software architecture, software variability, software sustainability, I4.0, LLMs applied to SE
Jorge Andrés Díaz-Pace
Universidad del Centro de la Provincia de Buenos Aires
Yamid Ramírez
Rey Juan Carlos University
Jennifer Pérez
Universidad Politécnica de Madrid
Vanessa Rodríguez-Horcajo
Universidad Politécnica de Madrid