🤖 AI Summary
Integrating large language models (LLMs) with retrieval-augmented generation (RAG) poses practical challenges—including interface mismatches, heterogeneous requirements, and complex system management—that lead to functional failures, performance degradation, and security vulnerabilities. To address this, we conduct an empirical audit of 100 open-source RAG-enhanced LLM applications, establishing the first comprehensive defect taxonomy spanning functionality, efficiency, and security; we identify 18 representative defect patterns and find that 77% of the applications contain more than three types of integration defects across these dimensions. Drawing on defect pattern mining and RAG architectural analysis, we propose systematic, lifecycle-aware remediation guidelines covering development, deployment, and operations. We also release Hydrangea, an open-source defect knowledge base. Together, these contributions provide a reusable, diagnosable foundation for systematic LLM–RAG integration in practice.
📝 Abstract
Large language models (LLMs) provide effective solutions in various application scenarios with the support of retrieval-augmented generation (RAG). However, developers face challenges in integrating LLMs and RAG into software systems, due to the lack of interface specifications, varying requirements from the software context, and complicated system management. In this paper, we conduct a comprehensive study of 100 open-source applications that incorporate LLMs with RAG support, and identify 18 defect patterns. Our study reveals that 77% of these applications contain more than three types of integration defects that degrade software functionality, efficiency, and security. Guided by our study, we propose systematic guidelines for resolving these defects across the software life cycle. We also construct Hydrangea, an open-source defect library.