Measuring and Enhancing Trustworthiness of LLMs in RAG through Grounded Attributions and Learning to Refuse

📅 2024-09-17

🏛️ arXiv.org

📈 Citations: 4

✨ Influential: 0

🤖 AI Summary

To address the low trustworthiness of large language models (LLMs) in retrieval-augmented generation (RAG) systems, this paper proposes Trust-Score, a holistic metric for assessing LLM trustworthiness in RAG, and Trust-Align, a method that aligns LLMs to improve their Trust-Score. Trust-Score jointly considers citation accuracy, refusal capability, and attribution grounding to quantify LLM trustworthiness. The paper finds that prompting methods such as in-context learning do not adapt LLMs to the RAG task effectively as measured by Trust-Score. On the ASQA, QAMPARI, and ELI5 benchmarks, 26 out of 27 models aligned with Trust-Align substantially outperform competitive baselines. With LLaMA-3-8b, Trust-Align outperforms FRONT on ASQA (up 12.56), QAMPARI (up 36.04), and ELI5 (up 17.69), and it improves the models' ability to refuse correctly and to provide quality citations. The method is shown to work across open-weight models, including the LLaMA series (1b to 8b), the Qwen-2.5 series (0.5b to 7b), and Phi3.5 (3.8b).

📝 Abstract

LLMs are an integral component of retrieval-augmented generation (RAG) systems. While many studies focus on evaluating the overall quality of end-to-end RAG systems, there is a gap in understanding the appropriateness of LLMs for the RAG task. To address this, we introduce Trust-Score, a holistic metric that evaluates the trustworthiness of LLMs within the RAG framework. Our results show that various prompting methods, such as in-context learning, fail to effectively adapt LLMs to the RAG task as measured by Trust-Score. Consequently, we propose Trust-Align, a method to align LLMs for improved Trust-Score performance. 26 out of 27 models aligned using Trust-Align substantially outperform competitive baselines on ASQA, QAMPARI, and ELI5. Specifically, in LLaMA-3-8b, Trust-Align outperforms FRONT on ASQA (up 12.56), QAMPARI (up 36.04), and ELI5 (up 17.69). Trust-Align also significantly enhances models' ability to correctly refuse and provide quality citations. We also demonstrate the effectiveness of Trust-Align across different open-weight models, including the LLaMA series (1b to 8b), Qwen-2.5 series (0.5b to 7b), and Phi3.5 (3.8b). We release our code at https://github.com/declare-lab/trust-align.

Problem

Research questions and friction points this paper is trying to address.

Evaluating LLM trustworthiness in RAG systems.

Aligning LLMs to the RAG task, where prompting methods such as in-context learning fall short on Trust-Score.

Enhancing LLM ability to refuse and cite correctly.

Innovation

Methods, ideas, or system contributions that make the work stand out.

Introduces Trust-Score for LLM trustworthiness evaluation.

Proposes Trust-Align, a method that aligns LLMs for improved Trust-Score performance in RAG.

Demonstrates Trust-Align's effectiveness across open-weight models: the LLaMA series (1b to 8b), the Qwen-2.5 series (0.5b to 7b), and Phi3.5 (3.8b).
