🤖 AI Summary
Integrating large language models (LLMs) with retrieval-augmented generation (RAG) poses practical challenges—including interface mismatches, heterogeneous requirements, and complex system management—that lead to functional failures, performance degradation, and security vulnerabilities. To address this, we conduct an empirical audit of 100 open-source RAG-enhanced LLM applications, establishing the first comprehensive defect taxonomy spanning functionality, efficiency, and security; we identify 18 representative defect patterns and find that 77% of the applications contain more than three types of integration defects across these dimensions. Drawing on defect pattern mining and RAG architectural analysis, we propose systematic, lifecycle-aware remediation guidelines covering development, deployment, and operations. We also release Hydrangea, an open-source defect knowledge base. Together, these contributions provide a reusable, diagnosable foundation for systematic LLM–RAG integration in practice.
📝 Abstract
Large language models (LLMs) provide effective solutions in various application scenarios with the support of retrieval-augmented generation (RAG). However, developers face challenges in integrating LLMs and RAG into software systems, due to the lack of interface specifications, varying requirements from the software context, and complicated system management. In this paper, we conduct a comprehensive study of 100 open-source applications that incorporate LLMs with RAG support, and identify 18 defect patterns. Our study reveals that 77% of these applications contain more than three types of integration defects that degrade software functionality, efficiency, and security. Guided by our study, we propose systematic guidelines for resolving these defects across the software life cycle. We also construct Hydrangea, an open-source defect library.