EAGER: Entropy-Aware GEneRation for Adaptive Inference-Time Scaling

📅 2025-10-13
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing test-time scaling methods allocate a fixed computational budget to every prompt, ignoring differences in inherent complexity and wasting resources. This paper proposes a training-free, entropy-aware dynamic inference framework: it quantifies model uncertainty at generation time via token-level predictive entropy and triggers multi-path branching only at high-entropy positions, while reallocating the saved computation to harder samples. The method enables fine-grained, on-demand expansion of reasoning paths. It achieves strong efficiency–accuracy trade-offs across multiple open-source models on challenging reasoning benchmarks (e.g., AIME 2025), reducing token generation by up to 65% versus full parallel sampling and improving Pass@k by up to 37%. Its core innovation is using token-level entropy as a dynamic branching gate, enabling adaptive computation scheduling with zero training overhead.
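The entropy gate described above can be sketched in a few lines: compute the Shannon entropy of the next-token distribution at each position and branch only where it exceeds a threshold. This is a minimal illustration, not the paper's exact implementation; the threshold value and array shapes are assumptions.

```python
import numpy as np

def token_entropy(logits: np.ndarray) -> np.ndarray:
    """Shannon entropy (in nats) of the next-token distribution at each position.

    logits: raw model scores of shape (seq_len, vocab_size).
    Returns an array of shape (seq_len,), one entropy value per position.
    """
    # Numerically stable softmax over the vocabulary axis.
    shifted = logits - logits.max(axis=-1, keepdims=True)
    probs = np.exp(shifted)
    probs /= probs.sum(axis=-1, keepdims=True)
    # H = -sum(p * log p); the clip avoids log(0) for zero-probability tokens.
    return -(probs * np.log(np.clip(probs, 1e-12, None))).sum(axis=-1)

def branch_positions(logits: np.ndarray, threshold: float) -> np.ndarray:
    """Indices of high-entropy positions where multi-path branching would trigger."""
    return np.flatnonzero(token_entropy(logits) > threshold)
```

A uniform next-token distribution over a vocabulary of size V has entropy log(V), the maximum, and would trigger a branch; a near-deterministic distribution has entropy near zero and would not.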

📝 Abstract
With the rise of reasoning language models and test-time scaling methods as a paradigm for improving model performance, substantial computation is often required to generate multiple candidate sequences from the same prompt. This enables exploration of different reasoning paths toward the correct solution; however, it allocates the same compute budget to every prompt. Grounded in the assumption that different prompts carry different degrees of complexity, and thus different computation needs, we propose EAGer, a training-free generation method that leverages model uncertainty through the token-wise entropy distribution to reduce redundant computation and concurrently improve overall performance. EAGer branches into multiple reasoning paths only in the presence of high-entropy tokens, and then reallocates the saved compute budget to the instances where exploration of alternative paths is most needed. We find that across multiple open-source models on complex reasoning benchmarks such as AIME 2025, EAGer can reallocate the budget without accessing target labels, achieving the best efficiency-performance trade-off in terms of reasoning length and Pass@k. When target labels are accessible, EAGer generates up to 65% fewer tokens (hence saving compute) and achieves up to 37% improvement in Pass@k compared to full parallel sampling.
Problem

Research questions and friction points this paper is trying to address.

Reduces redundant computation in reasoning language models
Allocates compute budget based on prompt complexity
Improves efficiency-performance trade-off via entropy-aware generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses token-wise entropy to guide generation
Branches reasoning paths for high-entropy tokens
Reallocates compute budget to complex instances
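The reallocation step in the list above can be sketched as redistributing tokens saved on easy prompts to the hardest remaining ones. This is a hypothetical proportional-share scheme, not the paper's algorithm; `per_prompt_budget` and the hardness proxy (e.g., mean token entropy) are assumptions for illustration.

```python
def reallocate_budget(used: dict[str, int], per_prompt_budget: int,
                      hardness: dict[str, float]) -> dict[str, int]:
    """Redistribute tokens saved on easy prompts toward harder prompts.

    used: tokens actually consumed per prompt id.
    hardness: a difficulty proxy per prompt id (e.g., mean token entropy).
    Returns the number of extra tokens granted to each prompt.
    """
    # Tokens left unspent by prompts that finished under budget.
    saved = sum(max(per_prompt_budget - u, 0) for u in used.values())
    # Split the saved budget proportionally to each prompt's hardness score.
    total = sum(hardness.values()) or 1.0
    return {p: int(saved * hardness[p] / total) for p in used}
```

For example, if one prompt uses 50 of its 100 tokens, the 50 saved tokens are shared out in proportion to the hardness scores of all prompts.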