Uncertainty Under the Curve: A Sequence-Level Entropy Area Metric for Reasoning LLM

📅 2025-08-27
📈 Citations: 0
Influential: 0
🤖 AI Summary
This paper addresses the challenge of efficiently and interpretably quantifying uncertainty during large language model (LLM) inference. We propose the Entropy Area Score (EAS), a sequence-level uncertainty metric derived solely from the model’s intrinsic token-level predictive entropy. EAS computes the area under the normalized entropy curve over the generation process, requiring no external models or multiple sampling—ensuring both computational efficiency and inherent interpretability. Experiments demonstrate that EAS strongly correlates with answer-level entropy and effectively identifies high-value training samples. In mathematical reasoning tasks, student models trained on EAS-filtered data achieve significantly higher accuracy than those trained on data selected via conventional pass-rate filtering, under identical sample budgets. Our approach establishes a novel paradigm for uncertainty-aware modeling and high-quality data curation in LLM training.
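The summary describes EAS as the area under the normalized token-level entropy curve over the generation process. A minimal sketch of that computation is below; the per-step probability distributions, normalization by `log(vocab_size)`, and approximation of the area as the mean over generated tokens are assumptions, since the paper's exact formulation is not given here.

```python
import math

def entropy_area_score(token_probs):
    """Sketch of the Entropy Area Score (EAS) over one generated sequence.

    token_probs: one probability distribution (list of floats summing to 1)
    per generated token, taken from the model's own predictive distribution.
    Normalizing each step's entropy by log(vocab_size) and averaging over the
    sequence are assumptions of this sketch, not the paper's exact definition.
    """
    normalized_entropies = []
    for dist in token_probs:
        vocab = len(dist)
        # Shannon entropy of this step's predictive distribution
        h = -sum(p * math.log(p) for p in dist if p > 0.0)
        # Normalize to [0, 1] by the maximum possible entropy, log(vocab)
        h_norm = h / math.log(vocab) if vocab > 1 else 0.0
        normalized_entropies.append(h_norm)
    # Area under the normalized entropy curve, approximated as the mean
    if not normalized_entropies:
        return 0.0
    return sum(normalized_entropies) / len(normalized_entropies)
```

A fully confident model (one-hot distributions) scores 0, while a maximally uncertain model (uniform distributions) scores 1, which matches the metric's stated interpretability: no external models or repeated sampling are needed, only the logits already produced during generation.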

📝 Abstract
In this work, we introduce the Entropy Area Score (EAS), a simple yet effective metric to quantify uncertainty in the answer generation process of reasoning large language models (LLMs). EAS requires neither external models nor repeated sampling; instead, it integrates token-level predictive entropy from the model itself to capture the evolution of uncertainty during generation. Empirical results show that EAS is strongly correlated with answer entropy across models and datasets. In training data selection, EAS identifies high-potential samples and consistently outperforms Pass Rate filtering under equal sample budgets, improving student model accuracy on math benchmarks. EAS is both efficient and interpretable, offering a practical tool for uncertainty modeling and data quality assessment in LLM training.
Problem

Research questions and friction points this paper is trying to address.

Quantify uncertainty in reasoning LLM answer generation
Integrate token-level entropy to capture uncertainty evolution
Improve training data selection and model accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Entropy Area Score integrates token-level predictive entropy
EAS requires no external models or repeated sampling
Metric captures uncertainty evolution during answer generation
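The data-selection use described above — ranking candidate training samples by EAS and keeping a fixed budget, in place of pass-rate filtering — can be sketched as follows. The helper name, the assumption that higher-EAS samples are the "high-potential" ones, and the tie-breaking behavior are all illustrative, not taken from the paper.

```python
def select_by_eas(samples, eas_scores, budget):
    """Hypothetical EAS-based data curation step.

    Ranks candidate training samples by their EAS and keeps the top `budget`.
    That high-EAS samples are the high-potential ones is an assumption of
    this sketch; the paper only states that EAS-filtered data outperforms
    pass-rate filtering under equal sample budgets.
    """
    ranked = sorted(
        zip(eas_scores, range(len(samples))),
        key=lambda pair: pair[0],
        reverse=True,  # highest-uncertainty-area samples first (assumed)
    )
    return [samples[i] for _, i in ranked[:budget]]
```

Under this scheme, both EAS filtering and pass-rate filtering receive the same sample budget, so any accuracy gain of the student model is attributable to the selection criterion rather than to data volume.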