🤖 AI Summary
Existing tools struggle to detect infrastructure-related microservice architectural patterns across programming languages, mainly because most of them support only a single language and because patterns and their implementations vary widely. This work presents MicroPAD, a tool for systematically evaluating how well large language models detect microservice architectural patterns across multiple languages and heterogeneous software artifacts. MicroPAD combines GPT-5 nano with natural-language pattern descriptions, so it does not depend on any specific programming language. We introduce a human-annotated dataset of 190 GitHub repositories and show that MicroPAD detects pattern instances across languages and artifact types, with per-pattern F1 scores ranging from 0.09 to 0.70. Our analysis further indicates that detection performance depends on how prevalent a pattern is and how distinctive the artifacts are through which it manifests.
📝 Abstract
Architectural patterns are frequently found in various software artifacts. The wide variety of patterns and their implementations makes detection challenging with current tools, especially since they often support detecting patterns only in artifacts written in a single language. Large Language Models (LLMs), trained on a diverse range of software artifacts and knowledge, might overcome the limitations of existing approaches. However, their true effectiveness and the factors influencing their performance have not yet been thoroughly examined. To investigate this, we developed MicroPAD, a tool that uses GPT-5 nano to identify architectural patterns in software artifacts written in any language, based on natural-language pattern descriptions. We used MicroPAD to evaluate an LLM's ability to detect instances of architectural patterns, particularly infrastructure-related microservice patterns. To accomplish this, we selected a set of GitHub repositories and contacted their top contributors to create a new, human-annotated dataset of 190 repositories containing microservice architectural patterns. The results show that MicroPAD was capable of detecting pattern instances across multiple languages and artifact types. Detection performance varied across patterns (F1 scores ranging from 0.09 to 0.70), depending on their prevalence and the distinctiveness of the artifacts through which they manifest. We also found that patterns associated with recognizable, dominant artifacts were detected more reliably. Whether these findings generalize to other LLMs and tools is a promising direction for future research.
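For readers unfamiliar with the metric, the per-pattern F1 scores reported above can be illustrated with a minimal sketch. It assumes each repository is labeled with a set of pattern names, once by human annotators and once by the detector. The function name `per_pattern_f1`, the repository names, and the two pattern names are hypothetical placeholders, and the data is a toy example; none of it is taken from MicroPAD or its dataset.

```python
def per_pattern_f1(annotated, predicted):
    """Compute one F1 score per pattern.

    annotated, predicted: dict mapping repository name -> set of pattern names.
    Both structures are assumptions made for this illustration.
    """
    patterns = set().union(*annotated.values(), *predicted.values())
    scores = {}
    for pattern in sorted(patterns):
        tp = fp = fn = 0
        for repo, truth_set in annotated.items():
            truth = pattern in truth_set
            guess = pattern in predicted.get(repo, set())
            if truth and guess:
                tp += 1  # pattern annotated and detected
            elif guess:
                fp += 1  # detected but not annotated
            elif truth:
                fn += 1  # annotated but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        # F1 is the harmonic mean of precision and recall.
        scores[pattern] = 2 * precision * recall / denom if denom else 0.0
    return scores


# Toy data (hypothetical): three repositories, two patterns.
annotated = {
    "repo-a": {"API Gateway", "Service Discovery"},
    "repo-b": {"API Gateway"},
    "repo-c": set(),
}
predicted = {
    "repo-a": {"API Gateway"},
    "repo-b": {"API Gateway", "Service Discovery"},
    "repo-c": {"API Gateway"},
}

scores = per_pattern_f1(annotated, predicted)
for name, f1 in scores.items():
    print(f"{name}: {f1:.2f}")
```

Scoring each pattern separately, rather than pooling all predictions, is what makes a spread such as 0.09 to 0.70 visible: a pattern that is detected well in the few repositories where it appears can still score low if it is also predicted in many repositories where it is absent.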