AI Summary
To address the low credibility and poor traceability of Retrieval-Augmented Generation (RAG) in low-resource domain expert systems that handle heterogeneous multimodal data, as well as the difficulty of evaluating answers across diverse aspects, this paper proposes an end-to-end trustworthy RAG framework. First, it introduces a pipeline for structured corpus construction and automatic Q&A generation tailored to messy, real-world data. Second, it designs a two-stage re-ranking mechanism driven by semantic and evidence-confidence signals to improve retrieval precision. Third, it introduces an LLM-guided reference matching algorithm that grounds answers in retrieved evidence and makes generation traceable. Experiments in automotive engineering show improvements over a non-RAG baseline: +1.94 in factual correctness, +1.16 in informativeness, and +1.67 in helpfulness (1-5 scale, rated by an LLM judge), supporting the framework's effectiveness in trustworthy retrieval, interpretable generation, and auditability.
Abstract
Retrieval-Augmented Generation (RAG) has become a key technique for enhancing LLMs by reducing hallucinations, especially in domain expert systems where LLMs may lack sufficient inherent knowledge. However, developing these systems in low-resource settings introduces several challenges: (1) handling heterogeneous data sources, (2) optimizing the retrieval phase for trustworthy answers, and (3) evaluating generated answers across diverse aspects. To address these, we introduce a data generation pipeline that transforms raw multimodal data into a structured corpus and Q&A pairs, an advanced re-ranking phase that improves retrieval precision, and a reference matching algorithm that enhances answer traceability. Applied to the automotive engineering domain, our system improves factual correctness (+1.94), informativeness (+1.16), and helpfulness (+1.67) over a non-RAG baseline, as rated on a 1-5 scale by an LLM judge. These results highlight the effectiveness of our approach across distinct aspects, with strong answer grounding and transparency.
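The abstract does not describe how reference matching works internally, so the following is only a minimal sketch of the general idea: linking each sentence of a generated answer to the retrieved passage that best supports it. It swaps in plain token-overlap (Jaccard) similarity in place of the paper's LLM-guided matching, and the function names, the `threshold` parameter, and the passage-id dictionary layout are all hypothetical choices made for illustration.

```python
import re


def tokenize(text):
    """Return the set of lowercase alphanumeric tokens in `text`."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity between two token sets (0.0 if either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def match_references(answer, passages, threshold=0.2):
    """Pair each answer sentence with its best-supporting passage id.

    `passages` maps a passage id to its text. A sentence whose best
    overlap score falls below `threshold` is paired with None, marking
    it as not grounded in any retrieved passage.

    NOTE: lexical overlap is a stand-in assumption here; the paper's
    algorithm is LLM-guided and is not reproduced by this sketch.
    """
    # Naive sentence split on whitespace that follows ., ! or ?
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", answer) if s.strip()]
    passage_tokens = {pid: tokenize(text) for pid, text in passages.items()}

    matches = []
    for sentence in sentences:
        tokens = tokenize(sentence)
        best_id, best_score = None, 0.0
        for pid, ptoks in passage_tokens.items():
            score = jaccard(tokens, ptoks)
            if score > best_score:
                best_id, best_score = pid, score
        matches.append((sentence, best_id if best_score >= threshold else None))
    return matches
```

Calling `match_references(answer, passages)` returns a list of `(sentence, passage_id)` pairs. Sentences paired with `None` are the ones a traceability check would flag as unsupported by the retrieved evidence.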