🤖 AI Summary
This study addresses the underuse of Architecture Decision Records (ADRs), which are often neglected because of their authoring cost, and investigates how effectively large language models (LLMs) can generate them automatically. The authors systematically evaluate five context selection strategies (no context, All-history, First-K, Last-K, and RAFG) across multiple model families and find that context engineering, rather than model scale alone, is the dominant factor in generation quality. Context-aware prompting substantially improves generation fidelity, and a small recency window of roughly three to five prior ADRs gives the best balance between quality and efficiency. Retrieval-based selection yields only marginal gains, mainly for non-sequential or cross-cutting decisions, so the authors recommend it as a targeted fallback for complex architectural settings.
📝 Abstract
Architecture Decision Records (ADRs) play a critical role in preserving the rationale behind system design, yet their creation and maintenance are often neglected due to the associated authoring overhead. This paper investigates whether Large Language Models (LLMs) can mitigate this burden and, more importantly, how different strategies for presenting historical ADRs as context influence generation quality. We curate and validate a large corpus of sequential ADRs drawn from 750 open-source repositories and systematically evaluate five context selection strategies (no context, All-history, First-K, Last-K, and RAFG) across multiple model families. Our results show that context-aware prompting substantially improves ADR generation fidelity, with a small recency window (typically 3-5 prior records) providing the best balance between quality and efficiency. Retrieval-based context selection yields marginal gains primarily in non-sequential or cross-cutting decision scenarios, while offering no statistically significant advantage in typical linear ADR workflows. Overall, our findings demonstrate that context engineering, rather than model scale alone, is the dominant factor in effective ADR automation, and we outline practical defaults for tool builders along with targeted retrieval fallbacks for complex architectural settings.
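The non-retrieval strategies named in the abstract can be illustrated with a minimal Python sketch. This is not the authors' implementation: the function names `select_context` and `build_prompt`, the strategy labels, the prompt wording, and the default `k=3` are all assumptions made for illustration, and it assumes the ADR history is ordered oldest to newest. RAFG is omitted because the abstract does not describe its retrieval procedure.

```python
from typing import List


def select_context(history: List[str], strategy: str, k: int = 3) -> List[str]:
    """Choose which prior ADRs to place in the prompt for the next ADR.

    Hypothetical helper: `history` is assumed to be ordered oldest to newest.
    """
    if k < 0:
        raise ValueError("k must be non-negative")
    if strategy == "none":        # no context
        return []
    if strategy == "all":         # All-history
        return list(history)
    if strategy == "first_k":     # First-K: the k oldest records
        return history[:k]
    if strategy == "last_k":      # Last-K: the k most recent records
        return history[-k:] if k > 0 else []
    raise ValueError(f"unknown strategy: {strategy}")


def build_prompt(history: List[str], title: str,
                 strategy: str = "last_k", k: int = 3) -> str:
    """Assemble an illustrative prompt; the wording is an assumption."""
    context = select_context(history, strategy, k)
    parts = ["You are drafting an Architecture Decision Record."]
    if context:
        parts.append("Prior ADRs from this repository:")
        parts.extend(f"--- ADR ---\n{adr}" for adr in context)
    parts.append(f"Write the next ADR titled: {title}")
    return "\n\n".join(parts)
```

Under these assumptions, `build_prompt(history, "Use PostgreSQL", "last_k", 3)` would include only the three most recent records, which corresponds to the small recency window the abstract reports as the best balance between quality and efficiency.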