Less Approximates More: Harmonizing Performance and Confidence Faithfulness via Hybrid Post-Training for High-Stakes Tasks

📅 2026-04-09
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses the critical issue of overconfidence in large language models on high-stakes tasks, where misalignment between confidence and accuracy (commonly referred to as poor confidence calibration, or unfaithful confidence) often leads to erroneous inferences. To improve this alignment, the authors propose HyTuning, a hybrid post-training framework built around a novel metric, Progressive Reasoning Gain (PRG), which quantifies the extent to which intermediate reasoning steps strengthen support for the final answer. Guided by a PRG-style signal, HyTuning adaptively combines unsupervised Reinforcement Learning from Internal Feedback (RLIF) with Reasoning Distillation (RD) anchored on a small set of supervised reasoning trajectories, mitigating the error amplification caused by data scarcity and indiscriminate fusion strategies. Evaluated on both domain-specific and general benchmarks, HyTuning improves accuracy and confidence faithfulness simultaneously using only limited supervision, demonstrating a practical "less-is-more" advantage.
📝 Abstract
Large language models are increasingly deployed in high-stakes tasks, where confident yet incorrect inferences may cause severe real-world harm, bringing the previously overlooked issue of confidence faithfulness back to the forefront. A promising solution is to jointly optimize unsupervised Reinforcement Learning from Internal Feedback (RLIF) with reasoning-trace-guided Reasoning Distillation (RD), yet this faces three persistent challenges: scarcity of high-quality training corpora, factually unwarranted overconfidence, and indiscriminate fusion that amplifies erroneous updates. Inspired by the way human confidence accumulates from uncertainty to certainty, we propose Progressive Reasoning Gain (PRG) to measure whether reasoning steps progressively strengthen support for the final answer. Furthermore, we introduce HyTuning, a hybrid post-training framework that adaptively reweights RD and RLIF via a PRG-style metric, using scarce supervised reasoning traces as a stable anchor while exploiting abundant unlabeled queries for scalability. Experiments on several domain-specific and general benchmarks demonstrate that HyTuning improves accuracy while achieving confidence faithfulness under limited supervision, supporting a practical "Less Approximates More" effect.
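The abstract does not give the exact form of PRG or the fusion rule, but the described mechanism (score how much each reasoning step strengthens support for the final answer, then use that score to reweight the RD and RLIF objectives) can be sketched roughly as follows. Everything here is an illustrative assumption: the per-step confidence inputs, the averaged-gain definition of PRG, and the sigmoid mixing weight are placeholders, not the paper's actual formulas.

```python
import math

def progressive_reasoning_gain(step_confidences):
    """Illustrative PRG-style score: the average change in the model's
    confidence in the final answer across consecutive reasoning steps.
    A positive score means the trace progressively strengthens support."""
    if len(step_confidences) < 2:
        return 0.0
    gains = [b - a for a, b in zip(step_confidences, step_confidences[1:])]
    return sum(gains) / len(gains)

def hybrid_loss(rd_loss, rlif_loss, prg, temperature=5.0):
    """Illustrative adaptive reweighting: traces whose reasoning shows
    high PRG lean on the supervised RD signal, while low-PRG traces
    fall back to the unsupervised RLIF signal."""
    weight = 1.0 / (1.0 + math.exp(-temperature * prg))  # sigmoid in (0, 1)
    return weight * rd_loss + (1.0 - weight) * rlif_loss
```

For example, a trace whose answer confidence climbs 0.2 → 0.5 → 0.9 gets a positive PRG and thus a mixing weight above 0.5 toward the RD term, matching the "scarce supervised traces as a stable anchor" idea in the abstract.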
Problem

Research questions and friction points this paper is trying to address.

confidence faithfulness
high-stakes tasks
large language models
reasoning reliability
model calibration
Innovation

Methods, ideas, or system contributions that make the work stand out.

HyTuning
Progressive Reasoning Gain
Confidence Faithfulness
Reasoning Distillation
Reinforcement Learning from Internal Feedback