"I Don't Know" -- Towards Appropriate Trust with Certainty-Aware Retrieval Augmented Generation

📅 2026-05-01

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

Large language models often produce misleading responses because they hallucinate and express over-confidence, which makes it hard for users to calibrate their trust. This work proposes CERTA (Certainty Enhanced RAG for Trustworthy Answers), a retrieval-augmented generation (RAG) system that uses the relevance between the input question, the retrieved context, and the generated answer to reflect its uncertainty when answering. It also introduces the Certainty Benchmark, a set of 90 question-context pairs of non-objective questions spanning four categories (factuality, preference, sycophancy, and morality) and three context types (relevant, incomplete, irrelevant). In experiments comparing a baseline RAG system with three CERTA settings on two LLMs, CERTA helps identify uncertain answers, decreases cases of over-agreeing, and behaves cautiously when prompted for moral judgments.

📝 Abstract

Achieving the right amount of trust in AI systems is important but challenging. The problem is exacerbated by the rise of Large Language Models (LLMs), which provide human-level communication capabilities but may hallucinate in the content that they generate. Moreover, they express over-confidence in their answers, making it difficult for users to judge their truthfulness. An important human value that users seek is benevolence, which can be met by an LLM's self-reflection leading to reliable and honest answers. Accordingly, this paper proposes conveying appropriate levels of self-reflected certainty to build appropriate trust. Our contributions are twofold: 1) We develop CERTA (Certainty Enhanced RAG for Trustworthy Answers), a specialized Retrieval Augmented Generation (RAG) system that incorporates the relevance between question, context, and answer to reflect its uncertainty in answering questions; 2) We create the Certainty Benchmark with 90 question-context pairs of non-objective questions, divided across four categories (factuality, preference, sycophancy, morality) and three types of contexts (relevant, incomplete, irrelevant). We run experiments with a baseline RAG system and three CERTA settings using two LLMs. Our evaluations indicate that CERTA helps identify answers that are uncertain, decreases the cases of over-agreeing, and provides cautious behavior when prompted for moral judgments.
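The idea of deriving certainty from the relevance between question, context, and answer can be illustrated with a minimal sketch. This is not the CERTA implementation: the abstract does not say how relevance is measured or combined, so the token-overlap score, the `assess` function, the weakest-link aggregation, the thresholds, and the three labels below are all assumptions made purely for illustration.

```python
from __future__ import annotations

import re
from dataclasses import dataclass


def tokens(text: str) -> set[str]:
    """Lowercased alphanumeric tokens of a text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relevance(a: str, b: str) -> float:
    """Jaccard token overlap in [0, 1].

    Assumption: a crude stand-in for whatever relevance measure
    the paper actually uses (not specified in the abstract).
    """
    ta, tb = tokens(a), tokens(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


@dataclass
class CertaintyReport:
    question_context: float
    context_answer: float
    question_answer: float
    label: str


def assess(question: str, context: str, answer: str,
           high: float = 0.3, low: float = 0.1) -> CertaintyReport:
    """Score the three pairwise relevances and map them to a label.

    Assumption: the weakest pairwise link bounds the overall certainty;
    the `high` and `low` thresholds are arbitrary illustrative values.
    """
    qc = relevance(question, context)
    ca = relevance(context, answer)
    qa = relevance(question, answer)
    weakest = min(qc, ca, qa)
    if weakest >= high:
        label = "certain"
    elif weakest >= low:
        label = "uncertain"
    else:
        label = "i_dont_know"
    return CertaintyReport(qc, ca, qa, label)


relevant = assess(
    "capital of France",
    "Paris is the capital of France",
    "The capital of France is Paris",
)
irrelevant = assess(
    "What is the capital of France?",
    "Bananas grow in tropical climates.",
    "Paris is the capital of France.",
)
print(relevant.label, irrelevant.label)
```

In this sketch an irrelevant retrieved context drives the question-context score to zero, so the system would decline to answer rather than state the answer confidently. A real system would likely replace the overlap score with an embedding-based or LLM-judged relevance signal.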

Problem

Research questions and friction points this paper is trying to address.

trust

hallucination

over-confidence

Large Language Models

certainty

Innovation

Methods, ideas, or system contributions that make the work stand out.

Certainty-Aware RAG

Appropriate Trust

Self-Reflected Uncertainty