Identifying Influential N-grams in Confidence Calibration via Regression Analysis

📅 2026-04-07
📈 Citations: 0
Influential: 0
🤖 AI Summary
This study addresses the tendency of large language models to exhibit overconfidence that is inconsistent with linguistic uncertainty during reasoning. Through regression analysis, the authors systematically identify specific n-gram expressions significantly associated with high confidence, and employ causal validation to confirm their role in driving overconfidence. Building on these findings, they propose a novel method that improves confidence calibration without compromising task performance. Extensive experiments across multiple mainstream language models and question-answering benchmarks demonstrate the effectiveness of the approach, offering the first evidence of a causal link between concrete linguistic patterns and confidence bias.
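The calibration method the summary describes works by suppressing the identified overconfident n-grams. A minimal sketch of one plausible mechanism (an assumption, not the paper's exact implementation): at decoding time, mask the logit of any token that would complete a banned n-gram, so the model cannot emit the overconfident phrase. The `BANNED` set and the toy logits below are illustrative.

```python
# Sketch (assumed mechanism, not the paper's implementation): suppress
# overconfident n-grams at decoding time by masking the logit of any
# token that would complete a banned n-gram given the tokens so far.
import math

# Hypothetical n-grams flagged as overconfident by the regression step.
BANNED = {("i", "am", "certain"), ("without", "a", "doubt")}

def suppress_logits(prev_tokens, logits):
    """Return a copy of `logits` with tokens completing a banned n-gram set to -inf."""
    out = dict(logits)
    for gram in BANNED:
        prefix, last = gram[:-1], gram[-1]
        # If the recent context matches the n-gram's prefix, ban its final token.
        if tuple(prev_tokens[-len(prefix):]) == prefix and last in out:
            out[last] = -math.inf
    return out

prev = ["i", "am"]
logits = {"certain": 3.2, "unsure": 1.1, "not": 0.7}
masked = suppress_logits(prev, logits)
print(masked["certain"])  # -inf
```

In a real decoder this would be applied per step over the full vocabulary (e.g., as a logits processor); the dictionary form here just keeps the sketch self-contained.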
📝 Abstract
While large language models (LLMs) improve performance through explicit reasoning, their responses are often overconfident even when they contain linguistic expressions of uncertainty. In this work, we identify which linguistic expressions are related to confidence by applying regression analysis: treating the model's confidence as the dependent variable, we analyze the relationship between specific $n$-grams in the reasoning text and confidence. Across multiple models and QA benchmarks, we show that LLMs remain overconfident when reasoning is involved and attribute this behavior to specific linguistic information. Interestingly, several of the extracted expressions coincide with cue phrases intentionally inserted during test-time scaling to improve reasoning performance. Through a causality test verifying that the extracted linguistic information truly affects confidence, we show that confidence calibration is achievable simply by suppressing those overconfident expressions, without drops in performance.
Problem

Research questions and friction points this paper is trying to address.

confidence calibration
large language models
overconfidence
linguistic expressions
n-grams
Innovation

Methods, ideas, or system contributions that make the work stand out.

confidence calibration
n-grams
regression analysis
large language models
overconfidence
Shintaro Ozaki
Nara Institute of Science and Technology (NAIST)
Wataru Hashimoto
Nara Institute of Science and Technology (NAIST)
Hidetaka Kamigaito
Nara Institute of Science and Technology (NAIST)
Natural Language Processing
Katsuhiko Hayashi
Nara Institute of Science and Technology (NAIST), The University of Tokyo
Taro Watanabe
Nara Institute of Science and Technology (NAIST)
Machine Translation, Machine Learning