Towards Supporting Quality Architecture Evaluation with LLM Tools

📅 2026-03-30
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the longstanding reliance on manual, time-consuming, and subjectivity-prone approaches in software architecture quality assessment, particularly in scenarios involving trade-offs among multiple quality attributes. To overcome these limitations, the study introduces generative large language models (LLMs) into this domain for the first time, leveraging Microsoft Copilot in conjunction with the Architecture Tradeoff Analysis Method (ATAM) and quality attribute scenario techniques. The proposed approach enables automated identification of architectural risks, analysis of sensitivity points, and generation of trade-off recommendations. Experimental results demonstrate that, in most cases, the method achieves higher accuracy and efficiency compared to human-led reviews, substantially reducing evaluation costs while improving consistency across assessments.

📝 Abstract
Architecture evaluation methods have been extensively used to assess software designs, and several methods have been proposed to analyze tradeoffs between different quality attributes. Competing qualities also lead to conflicts when selecting which quality-attribute scenarios are the most suitable for an architecture to tackle. Consequently, the scenarios required by the stakeholders must be prioritized and analyzed for potential risks. Today, architecture quality evaluation is still carried out manually, often involving long brainstorming sessions to decide on the most adequate quality-attribute scenarios for the architecture. To reduce this effort and make the assessment and selection of scenarios more efficient, in this research we propose the use of LLMs to partially automate the evaluation activities. As a first step in validating this hypothesis, this paper investigates MS Copilot as an LLM tool for analyzing quality-attribute scenarios suggested by students and reviewed by experienced architects. Specifically, our study compares the results of an Architecture Tradeoff Analysis Method (ATAM) exercise conducted in a software architecture course with the results of experienced software architects and with the output produced by the LLM tool. Our initial findings reveal that in most cases the LLM produces better and more accurate results regarding risks, sensitivity points, and tradeoff analysis of the manually generated quality scenarios, and that it significantly reduces the effort required for the task. Thus, we argue that generative AI has the potential to partially automate and support architecture evaluation tasks by suggesting higher-quality scenarios to evaluate and recommending the most suitable ones for a given context.
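The paper used MS Copilot interactively, so the exact prompting setup is not specified here. As a rough sketch only, the pattern of packaging a standard six-part quality-attribute scenario (SEI format: source, stimulus, artifact, environment, response, response measure) into a prompt asking an LLM for ATAM-style analysis might look like the following; the scenario content, class, and function names are illustrative and not taken from the paper:

```python
from dataclasses import dataclass

@dataclass
class QualityAttributeScenario:
    """Six-part quality-attribute scenario in the SEI/ATAM format."""
    source: str            # who or what generates the stimulus
    stimulus: str          # the condition arriving at the system
    artifact: str          # the part of the system that is stimulated
    environment: str       # operating conditions when it arrives
    response: str          # the desired system behavior
    response_measure: str  # how the response is quantified

def build_atam_prompt(scenario: QualityAttributeScenario, attribute: str) -> str:
    """Assemble a prompt asking an LLM to report risks, sensitivity
    points, and tradeoffs for one quality-attribute scenario."""
    return (
        "You are a software architecture evaluator applying ATAM.\n"
        f"Quality attribute: {attribute}\n"
        f"Scenario: {scenario.stimulus} from {scenario.source} "
        f"reaches {scenario.artifact} during {scenario.environment}; "
        f"the system should {scenario.response} "
        f"({scenario.response_measure}).\n"
        "List: (1) architectural risks, (2) sensitivity points, "
        "(3) tradeoffs with other quality attributes."
    )

# Illustrative performance scenario (not from the paper's exercise)
scenario = QualityAttributeScenario(
    source="end users",
    stimulus="a peak of 1000 concurrent requests",
    artifact="the order-processing service",
    environment="normal operation",
    response="process each request",
    response_measure="95th-percentile latency under 2 seconds",
)
prompt = build_atam_prompt(scenario, "performance")
```

The resulting `prompt` string would then be sent to whatever LLM tool is in use; the paper's point is that the risk/sensitivity/tradeoff output can then be compared against human-led ATAM reviews.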
Problem

Research questions and friction points this paper is trying to address.

architecture evaluation
quality attributes
scenario prioritization
risk analysis
tradeoff analysis
Innovation

Methods, ideas, or system contributions that make the work stand out.

Large Language Models
Architecture Evaluation
ATAM
Quality Attributes
Generative AI
Rafael Capilla
Professor of Software Engineering, Universidad Rey Juan Carlos
Software architecture, software variability, software sustainability, I4.0, LLMs applied to SE
Jorge Andrés Díaz-Pace
Universidad del Centro de la Provincia de Buenos Aires
Yamid Ramírez
Rey Juan Carlos University
Jennifer Pérez
Universidad Politécnica de Madrid
Vanessa Rodríguez-Horcajo
Universidad Politécnica de Madrid