Calibrated Decision-Making through LLM-Assisted Retrieval

📅 2024-10-28
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Large language models (LLMs) frequently exhibit “high-confidence, low-accuracy” behavior in decision-support tasks, undermining human decision calibration. To address this, we propose CalibRAG—the first retrieval-augmented generation (RAG) framework explicitly optimizing for *decision calibration*. Its core contributions are: (1) a calibration-aware retrieval mechanism that prioritizes evidence directly supporting well-calibrated judgments—departing from conventional semantic relevance; (2) LLM-driven retrieval re-ranking coupled with uncertainty-aware document filtering; and (3) a quantitative evaluation framework for decision calibration. Experiments across multiple decision-making benchmarks demonstrate that CalibRAG significantly outperforms standard RAG and strong baselines: it improves response accuracy while reducing expected calibration error (ECE) by 23–37%, effectively narrowing the gap between model confidence and empirical accuracy.
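The summary reports reductions in expected calibration error (ECE), the standard gap measure between stated confidence and empirical accuracy. A minimal sketch of how ECE is typically computed (equal-width confidence bins; this is the common definition, not code from the paper):

```python
import numpy as np

def expected_calibration_error(confidences, correct, n_bins=10):
    """ECE: bin predictions by confidence, then average the
    |accuracy - mean confidence| gap per bin, weighted by bin size."""
    confidences = np.asarray(confidences, dtype=float)
    correct = np.asarray(correct, dtype=float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (confidences > lo) & (confidences <= hi)
        if not mask.any():
            continue
        acc = correct[mask].mean()        # empirical accuracy in this bin
        conf = confidences[mask].mean()   # average stated confidence in this bin
        ece += mask.mean() * abs(acc - conf)
    return ece
```

For example, two correct answers at 0.95 confidence and two wrong answers at 0.15 confidence give ECE = 0.5·|1.0−0.95| + 0.5·|0.0−0.15| = 0.10; a perfectly calibrated model scores 0.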

📝 Abstract
Recently, large language models (LLMs) have been increasingly used to support various decision-making tasks, assisting humans in making informed decisions. However, when LLMs confidently provide incorrect information, it can lead humans to make suboptimal decisions. To prevent LLMs from generating incorrect information on topics they are unsure of and to improve the accuracy of generated content, prior works have proposed Retrieval Augmented Generation (RAG), where external documents are referenced to generate responses. However, traditional RAG methods focus only on retrieving documents most relevant to the input query, without specifically aiming to ensure that the human user's decisions are well-calibrated. To address this limitation, we propose a novel retrieval method called Calibrated Retrieval-Augmented Generation (CalibRAG), which ensures that decisions informed by the retrieved documents are well-calibrated. Then we empirically validate that CalibRAG improves calibration performance as well as accuracy, compared to other baselines across various datasets.
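The abstract's key idea is to retrieve documents so that the decisions they inform are well-calibrated, rather than ranking by query relevance alone. A minimal sketch of that retrieval objective, assuming hypothetical `relevance_fn` and `calibration_fn` scorers (the latter standing in for a learned predictor of whether a decision grounded in a document will be correct; the paper's actual model differs):

```python
def rerank_for_calibration(query, documents, relevance_fn, calibration_fn, alpha=0.5):
    """Rerank retrieved documents by blending semantic relevance with a
    predicted probability that a decision based on the document is correct.

    Both scoring functions are illustrative stand-ins, not CalibRAG's model.
    alpha controls how much weight the calibration signal receives.
    """
    scored = []
    for doc in documents:
        rel = relevance_fn(query, doc)   # standard retriever score in [0, 1]
        cal = calibration_fn(query, doc) # predicted decision correctness in [0, 1]
        scored.append(((1 - alpha) * rel + alpha * cal, doc))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored]
```

With alpha = 0, this degenerates to conventional relevance-only RAG retrieval; increasing alpha trades raw relevance for evidence expected to support a well-calibrated decision.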
Problem

Research questions and friction points this paper is trying to address.

Ensuring reliable human decision-making when LLMs serve as decision support
Preventing LLMs from confidently presenting incorrect information to users
Improving both the calibration and the accuracy of AI-generated content used in decisions
Innovation

Methods, ideas, or system contributions that make the work stand out.

CalibRAG: a retrieval method that selects documents so decisions informed by them are well-calibrated
Optimizes retrieval for downstream decision calibration rather than query relevance alone
Empirically improves both calibration and accuracy over standard RAG baselines across datasets