AI Summary
This work addresses the challenge of hallucination in large language models (LLMs) during reasoning, and the inability of existing knowledge graph (KG)-enhanced approaches to jointly quantify the uncertainty of both the retrieved evidence and the model's cognitive reasoning. To this end, we propose DoublyCal, a novel framework that introduces, for the first time, a dual-calibration mechanism. It employs a lightweight proxy model to generate KG-augmented evidence with calibrated confidence scores and jointly models the uncertainties of both evidence retrieval and reasoning, enabling end-to-end traceable confidence guidance. With minimal token overhead, DoublyCal significantly improves both the accuracy and the confidence calibration of black-box LLMs on knowledge-intensive tasks.
Abstract
Trustworthy reasoning in Large Language Models (LLMs) is challenged by their propensity for hallucination. While augmenting LLMs with Knowledge Graphs (KGs) improves factual accuracy, existing KG-augmented methods fail to quantify epistemic uncertainty in both the retrieved evidence and the LLM's reasoning. To bridge this gap, we introduce DoublyCal, a framework built on a novel double-calibration principle. DoublyCal employs a lightweight proxy model to first generate KG evidence together with a calibrated evidence confidence. This calibrated supporting evidence then guides a black-box LLM, yielding final predictions that are not only more accurate but also well calibrated, with confidence scores traceable to the uncertainty of the supporting evidence. Experiments on knowledge-intensive benchmarks show that DoublyCal significantly improves both the accuracy and the confidence calibration of black-box LLMs at low token cost.
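To make the double-calibration idea concrete, here is a minimal sketch. All names, the temperature-scaling step, and the multiplicative combination rule are illustrative assumptions, not the paper's actual formulation: it shows only how a calibrated evidence confidence and a calibrated answer confidence could be combined so that the final score remains traceable to the evidence uncertainty.

```python
import math

def calibrate(raw_confidence: float, temperature: float) -> float:
    """Temperature-scale a raw confidence in (0, 1) via its logit.

    temperature > 1 softens overconfident scores toward 0.5; this is a
    generic calibration step, assumed here for illustration.
    """
    logit = math.log(raw_confidence / (1.0 - raw_confidence))
    return 1.0 / (1.0 + math.exp(-logit / temperature))

def double_calibrated_confidence(evidence_conf: float,
                                 answer_conf: float,
                                 t_evidence: float = 1.5,
                                 t_answer: float = 1.5) -> float:
    """Combine a proxy model's evidence confidence with the black-box
    LLM's answer confidence (hypothetical multiplicative rule).

    Because the evidence term enters the product directly, low-quality
    retrieved evidence visibly lowers the final confidence, making the
    score traceable to the evidence's uncertainty.
    """
    p_evidence = calibrate(evidence_conf, t_evidence)
    p_answer = calibrate(answer_conf, t_answer)
    return p_evidence * p_answer
```

Under this assumed rule, shakier evidence (e.g. `evidence_conf=0.5` instead of `0.9`) strictly lowers the final confidence even when the LLM's own answer confidence is unchanged.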