Engineering RAG Systems for Real-World Applications: Design, Development, and Evaluation

📅 2025-06-25
🤖 AI Summary
Problem: Existing RAG research lacks empirically grounded development reports that involve users and evaluate systems systematically in real deployment scenarios. Method: The authors build five domain-specific RAG applications covering governance, cybersecurity, agriculture, industrial research, and medical diagnostics. Each integrates multilingual OCR, vector-based semantic retrieval, and domain-adapted large language models, and is deployed on local servers or through cloud APIs. Contribution/Results: A web-based evaluation with 100 participants assesses the systems across six dimensions: ease of use, relevance, transparency, responsiveness, accuracy, and likelihood of recommendation. From user feedback and development experience, the authors document twelve lessons learned spanning technical, operational, and ethical challenges that affect the reliability and usability of RAG systems in practice.

📝 Abstract
Retrieval-Augmented Generation (RAG) systems are emerging as a key approach for grounding Large Language Models (LLMs) in external knowledge, addressing limitations in factual accuracy and contextual relevance. However, there is a lack of empirical studies that report on the development of RAG-based implementations grounded in real-world use cases, evaluated through general user involvement, and accompanied by systematic documentation of lessons learned. This paper presents five domain-specific RAG applications developed for real-world scenarios across governance, cybersecurity, agriculture, industrial research, and medical diagnostics. Each system incorporates multilingual OCR, semantic retrieval via vector embeddings, and domain-adapted LLMs, deployed through local servers or cloud APIs to meet distinct user needs. A web-based evaluation involving a total of 100 participants assessed the systems across six dimensions: (i) Ease of Use, (ii) Relevance, (iii) Transparency, (iv) Responsiveness, (v) Accuracy, and (vi) Likelihood of Recommendation. Based on user feedback and our development experience, we documented twelve key lessons learned, highlighting technical, operational, and ethical challenges affecting the reliability and usability of RAG systems in practice.
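
The six-dimension evaluation described above can be illustrated with a small aggregation sketch. This is only an assumption about how such feedback might be summarized: the abstract does not state the rating scale or the analysis method, so the 1 to 5 scale, the `summarize` helper, and the sample responses below are all hypothetical.

```python
from statistics import mean

# The six dimensions named in the abstract.
DIMENSIONS = [
    "Ease of Use",
    "Relevance",
    "Transparency",
    "Responsiveness",
    "Accuracy",
    "Likelihood of Recommendation",
]


def summarize(responses: list[dict[str, int]]) -> dict[str, float]:
    """Return the mean rating per dimension (assumed 1-5 scale, hypothetical)."""
    summary = {}
    for dim in DIMENSIONS:
        # Skip participants who left this dimension unanswered.
        scores = [r[dim] for r in responses if dim in r]
        summary[dim] = round(mean(scores), 2) if scores else float("nan")
    return summary


# Illustrative data only; these are not results from the paper.
sample = [
    {dim: 4 for dim in DIMENSIONS},
    {**{dim: 5 for dim in DIMENSIONS}, "Accuracy": 3},
]
report = summarize(sample)
```

Reporting a per-dimension mean keeps the six criteria separate, so a weak score on one (for example, Transparency) is not hidden by strong scores on the others.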

Problem

Research questions and friction points this paper is trying to address.

Lack of empirical studies on real-world RAG system development
Need for evaluating RAG systems across multiple user-centric dimensions
Technical, operational, and ethical challenges in RAG implementations

Innovation

Methods, ideas, or system contributions that make the work stand out.

Multilingual OCR for text extraction
Semantic retrieval via vector embeddings
Domain-adapted LLMs for specific needs
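
The retrieval step listed above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: `embed` is a toy bag-of-words stand-in for a real embedding model, and `retrieve`, `build_prompt`, and the sample chunks are hypothetical names and data introduced only for this example.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query: str, passages: list[str]) -> str:
    """Assemble a grounded prompt that would be sent to an LLM."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"


# Hypothetical document chunks, e.g. text produced by an OCR step.
chunks = [
    "crop rotation improves soil health",
    "firewall rules block malicious traffic",
    "soil moisture sensors guide irrigation",
]
top = retrieve("how to improve soil health", chunks, k=2)
prompt = build_prompt("how to improve soil health", top)
```

In a real system the toy `embed` would be replaced by a multilingual embedding model and the linear scan by a vector index; the ranking-by-similarity logic stays the same.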