AI Summary
This work addresses the challenge of hallucination in large language models (LLMs) during reasoning, and the inability of existing knowledge graph (KG)-enhanced approaches to jointly quantify the uncertainty of both the retrieved evidence and the model's cognitive reasoning. To this end, we propose DoublyCal, a novel framework that introduces, for the first time, a dual-calibration mechanism. It employs a lightweight proxy model to generate KG-augmented evidence with calibrated confidence scores and jointly models the uncertainties of both evidence retrieval and reasoning, enabling end-to-end traceable confidence guidance. With minimal token overhead, DoublyCal significantly improves both the accuracy and the confidence calibration of black-box LLMs on knowledge-intensive tasks.
Abstract
Trustworthy reasoning in Large Language Models (LLMs) is challenged by their propensity for hallucination. While augmenting LLMs with Knowledge Graphs (KGs) improves factual accuracy, existing KG-augmented methods fail to quantify epistemic uncertainty in both the retrieved evidence and the LLM's reasoning. To bridge this gap, we introduce DoublyCal, a framework built on a novel double-calibration principle. DoublyCal employs a lightweight proxy model to first generate KG evidence together with a calibrated evidence confidence. This calibrated supporting evidence then guides a black-box LLM, yielding final predictions that are not only more accurate but also well calibrated, with confidence scores traceable to the uncertainty of the supporting evidence. Experiments on knowledge-intensive benchmarks show that DoublyCal significantly improves both the accuracy and the confidence calibration of black-box LLMs at low token cost.
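To make the double-calibration idea concrete, here is a minimal sketch. All names, the temperature-scaling step, and the multiplicative combination rule are illustrative assumptions, not the paper's actual formulation: it shows only how a calibrated evidence confidence and a calibrated answer confidence could be combined so that the final score remains traceable to the evidence uncertainty.

```python
import math

def calibrate(raw_confidence: float, temperature: float) -> float:
    """Temperature-scale a raw confidence in (0, 1) via its logit.

    temperature > 1 softens overconfident scores toward 0.5; this is a
    generic calibration step, assumed here for illustration.
    """
    logit = math.log(raw_confidence / (1.0 - raw_confidence))
    return 1.0 / (1.0 + math.exp(-logit / temperature))

def double_calibrated_confidence(evidence_conf: float,
                                 answer_conf: float,
                                 t_evidence: float = 1.5,
                                 t_answer: float = 1.5) -> float:
    """Combine a proxy model's evidence confidence with the black-box
    LLM's answer confidence (hypothetical multiplicative rule).

    Because the evidence term enters the product directly, low-quality
    retrieved evidence visibly lowers the final confidence, making the
    score traceable to the evidence's uncertainty.
    """
    p_evidence = calibrate(evidence_conf, t_evidence)
    p_answer = calibrate(answer_conf, t_answer)
    return p_evidence * p_answer
```

Under this assumed rule, shakier evidence (e.g. `evidence_conf=0.5` instead of `0.9`) strictly lowers the final confidence even when the LLM's own answer confidence is unchanged.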