AgentArcEval: An Architecture Evaluation Method for Foundation Model based Agents

📅 2025-10-23

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Traditional architecture evaluation methods struggle to accommodate the compound architecture, autonomy, non-determinism, and continuous evolution of foundation model (FM)-based agents. To address this, we propose AgentArcEval, an evaluation method designed specifically for the architectural characteristics of FM-based agents. The approach comprises three contributions: (1) a definition and model of the core architectural features of FM agents; (2) a reusable, extensible catalogue of agent-specific general scenarios that supports quality attribute assessment for continuously evolving systems; and (3) an evaluation workflow that integrates an extended Architecture Tradeoff Analysis Method (ATAM), scenario-driven evaluation, and qualitative and semi-quantitative scoring across multiple quality attributes. A case study on Luna, a real-world tax copilot, demonstrates the usefulness of AgentArcEval and the catalogue.

📝 Abstract

The emergence of foundation models (FMs) has enabled the development of highly capable and autonomous agents, unlocking new application opportunities across a wide range of domains. Evaluating the architecture of agents is particularly important because architectural decisions significantly impact the quality attributes of agents, given their unique characteristics: compound architecture, autonomous and non-deterministic behaviour, and continuous evolution. However, traditional architecture evaluation methods fall short of the evaluation needs of agent architecture because of these characteristics. Therefore, in this paper, we present AgentArcEval, a novel agent architecture evaluation method designed specifically to address the complexities of FM-based agent architecture and its evaluation. Moreover, we present a catalogue of agent-specific general scenarios, which serves as a guide for generating concrete scenarios to design and evaluate the agent architecture. We demonstrate the usefulness of AgentArcEval and the catalogue through a case study on the architecture evaluation of a real-world tax copilot, named Luna.
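
The step from a general scenario to a concrete one can be illustrated with a minimal sketch. It assumes the six-part quality attribute scenario form commonly used with ATAM (source, stimulus, artifact, environment, response, response measure). The class name, the `concretize` helper, the placeholder wording, and the tax-copilot values are all hypothetical and are not taken from the paper's catalogue.

```python
from dataclasses import asdict, dataclass, replace


@dataclass(frozen=True)
class Scenario:
    """Six-part quality attribute scenario (assumed form, not the paper's schema)."""
    quality_attribute: str
    source: str
    stimulus: str
    artifact: str
    environment: str
    response: str
    response_measure: str


# Hypothetical general scenario: angle-bracket placeholders mark the parts
# that an evaluation team would fill in for a specific agent.
GENERAL = Scenario(
    quality_attribute="reliability",
    source="<who or what issues the request>",
    stimulus="<request whose FM output may vary between runs>",
    artifact="<agent component under evaluation>",
    environment="<operating condition>",
    response="<expected agent behaviour>",
    response_measure="<measurable threshold>",
)


def concretize(general: Scenario, **details: str) -> Scenario:
    """Fill placeholders of a general scenario; reject any left unfilled."""
    concrete = replace(general, **details)
    unfilled = [name for name, value in asdict(concrete).items()
                if value.startswith("<")]
    if unfilled:
        raise ValueError(f"placeholders left unfilled: {unfilled}")
    return concrete


# Illustrative values only; they do not come from the Luna case study.
concrete = concretize(
    GENERAL,
    source="end user",
    stimulus="the same tax question asked twice",
    artifact="answer-generation component",
    environment="normal operation",
    response="both answers rely on the same rule",
    response_measure="answers agree in at least 95 of 100 repeated runs",
)
print(concrete.response_measure)
```

Keeping the general scenario immutable and deriving concrete ones from it means a single catalogue entry can be reused across agents, while the unfilled-placeholder check prevents a half-specified scenario from entering an evaluation.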

Problem

Research questions and friction points this paper is trying to address.

Evaluating the architecture of foundation model based agents

Addressing unique agent characteristics in evaluation

Providing agent-specific scenarios for architecture assessment

Innovation

Methods, ideas, or system contributions that make the work stand out.

Proposes AgentArcEval method for FM-based agent evaluation

Introduces agent-specific general scenarios catalogue for design guidance

Validates approach through real-world tax copilot case study

🔎 Similar Papers

Swiss Cheese Model for AI Safety: A Taxonomy and Reference Architecture for Multi-Layered Guardrails of Foundation Model Based Agents