AI Summary
Retrieval-Augmented Generation (RAG) systems struggle to ensure response trustworthiness when internal parametric knowledge conflicts with, or is less reliable than, external retrieved knowledge; existing approaches address only isolated scenarios and lack a unified modeling framework. Method: We introduce the Trustworthiness Response Dataset (TRD), comprising 36,266 diverse questions, and propose BRIDGE, a dynamic response strategy framework featuring a novel soft-bias adaptive weighting mechanism and a maximum-soft-bias decision tree. BRIDGE jointly assesses source credibility and selects optimal response strategies across four realistic RAG scenarios, enabling LLM-based trustworthiness intervention between retrieval and generation. Contribution/Results: On TRD, BRIDGE achieves 5-15% higher accuracy than strong baselines, demonstrates balanced and stable performance across all scenarios, and significantly enhances the reliability of RAG responses in open-domain settings.
Abstract
Retrieval-augmented generation (RAG) systems face critical challenges in balancing internal (parametric) and external (retrieved) knowledge, especially when these sources conflict or are unreliable. To analyze these scenarios comprehensively, we construct the Trustworthiness Response Dataset (TRD) with 36,266 questions spanning four RAG settings. We reveal that existing approaches address only isolated scenarios: prioritizing one knowledge source, naively merging both, or refusing to answer. They lack a unified framework that handles different real-world conditions simultaneously. Therefore, we propose the BRIDGE framework, which dynamically determines a comprehensive response strategy for large language models (LLMs). BRIDGE leverages an adaptive weighting mechanism named soft bias to guide knowledge collection, followed by a Maximum Soft-bias Decision Tree to evaluate knowledge and select the optimal response strategy (trust internal knowledge, trust external knowledge, or refuse). Experiments show BRIDGE outperforms baselines by 5-15% in accuracy while maintaining balanced performance across all scenarios. Our work provides an effective solution for LLMs' trustworthy responses in real-world RAG applications.
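The strategy-selection step described above can be sketched as a minimal toy in Python. This is an illustrative placeholder only: the paper's actual soft-bias weighting and Maximum Soft-bias Decision Tree are not specified here, so the `choose_strategy` function, its confidence inputs, and the `refuse_threshold` parameter are all hypothetical names and values chosen for the sketch.

```python
# Hypothetical sketch of a BRIDGE-style decision step: given credibility
# scores for internal (parametric) and external (retrieved) knowledge,
# pick one of the three response strategies named in the abstract.
# Thresholds and scoring are illustrative, not the paper's method.

def choose_strategy(internal_conf: float, external_conf: float,
                    refuse_threshold: float = 0.35) -> str:
    """Select a response strategy from two credibility scores in [0, 1]."""
    # If neither knowledge source is credible enough, refuse to answer.
    if max(internal_conf, external_conf) < refuse_threshold:
        return "refuse"
    # Otherwise trust whichever source carries the larger weight.
    if internal_conf >= external_conf:
        return "trust_internal"
    return "trust_external"

print(choose_strategy(0.20, 0.10))  # refuse
print(choose_strategy(0.40, 0.80))  # trust_external
print(choose_strategy(0.90, 0.50))  # trust_internal
```

In the real framework these scores would come from the soft-bias weighting over collected knowledge, and the decision would traverse a learned tree rather than a single threshold comparison.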