The LLM Already Knows: Estimating LLM-Perceived Question Difficulty via Hidden Representations

📅 2025-09-16

🤖 AI Summary
This work addresses the lack of explicit difficulty awareness in large language models (LLMs) when processing input queries. The authors propose a sampling-free, fine-tuning-free difficulty estimation method that operates solely on the initial hidden representation obtained from a single forward pass. The core idea is to model LLM generation as a Markov chain and define a value function over latent states that directly predicts final output quality, so difficulty can be estimated without generating output tokens or updating the target model's parameters. According to the summary, this is the first method capable of estimating task difficulty using only one forward pass. The approach improves generality and computational efficiency and outperforms existing baselines on both text and multimodal benchmarks. It also enables adaptive inference strategies, including Best-of-N and Self-Consistency, reducing average token generation by over 30% and lowering inference overhead.

📝 Abstract
Estimating the difficulty of input questions as perceived by large language models (LLMs) is essential for accurate performance evaluation and adaptive inference. Existing methods typically rely on repeated response sampling, auxiliary models, or fine-tuning the target model itself, which may incur substantial computational costs or compromise generality. In this paper, we propose a novel approach for difficulty estimation that leverages only the hidden representations produced by the target LLM. We model the token-level generation process as a Markov chain and define a value function to estimate the expected output quality given any hidden state. This allows for efficient and accurate difficulty estimation based solely on the initial hidden state, without generating any output tokens. Extensive experiments across both textual and multimodal tasks demonstrate that our method consistently outperforms existing baselines in difficulty estimation. Moreover, we apply our difficulty estimates to guide adaptive reasoning strategies, including Self-Consistency, Best-of-N, and Self-Refine, achieving higher inference efficiency with fewer generated tokens.
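One way to write down the setup described in the abstract is sketched below. The notation is an assumption made for illustration: the symbols h_t, T, R, V, and D, and the convention that difficulty is one minus the predicted quality, are not taken from the paper, whose exact formulation is not reproduced in this summary.

```latex
% Assumed notation (illustrative, not the paper's):
%   h_0 : hidden state after the model has read question q, before any output token
%   h_t : hidden state after t generated tokens
%   T   : step at which generation terminates
%   R   : quality score of the finished output, taken to lie in [0, 1]
V(h_t) = \mathbb{E}\left[ R(h_T) \mid h_t \right]
\qquad
V(h_t) = \mathbb{E}\left[ V(h_{t+1}) \mid h_t \right] \quad (t < T)
\qquad
D(q) = 1 - V(h_0)
```

Under these assumptions, the second identity follows from the Markov property and the law of total expectation when the only reward is the quality of the final output. Evaluating V at h_0 is what would let difficulty be read off before any token is generated.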
Problem

Research questions and friction points this paper is trying to address.

Estimating LLM-perceived question difficulty efficiently
Leveraging hidden representations without output generation
Improving adaptive inference strategies via difficulty estimation
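As a rough illustration of the third point, a difficulty score could be used to set the sampling budget for Self-Consistency. This is a minimal sketch under stated assumptions: the function names, the linear schedule, the budget range of 1 to 16 samples, and the `sample_answer` callback are all hypothetical and are not taken from the paper.

```python
from collections import Counter
from typing import Callable, Tuple


def samples_for_difficulty(difficulty: float, min_samples: int = 1, max_samples: int = 16) -> int:
    """Map a difficulty score in [0, 1] to a sample count (hypothetical linear schedule)."""
    d = min(max(difficulty, 0.0), 1.0)  # clamp out-of-range scores
    return min_samples + round(d * (max_samples - min_samples))


def adaptive_self_consistency(
    question: str,
    difficulty: float,
    sample_answer: Callable[[str], str],
) -> Tuple[str, int]:
    """Majority vote over a number of samples chosen from the difficulty score.

    `sample_answer` stands in for one stochastic LLM call that returns a final answer.
    """
    n = samples_for_difficulty(difficulty)
    answers = [sample_answer(question) for _ in range(n)]
    majority, _count = Counter(answers).most_common(1)[0]
    return majority, n


if __name__ == "__main__":
    # Stub sampler used only to make the sketch runnable without a model.
    answer, used = adaptive_self_consistency("What is 6 * 7?", 0.1, lambda q: "42")
    print(answer, used)
```

Easy questions then consume few samples and hard ones receive the full budget, which is the mechanism by which a difficulty estimate can reduce the number of generated tokens.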
Innovation

Methods, ideas, or system contributions that make the work stand out.

Leverages hidden representations for difficulty estimation
Models token generation as Markov chain process
Uses value function to predict output quality
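The three ideas above can be sketched as a small probe that maps an initial hidden state to a predicted output quality. This is a stand-in under stated assumptions, not the authors' implementation: the summary does not specify how the value function is parameterized or trained, so a ridge-regression probe is used here, and the hidden states are synthetic random vectors rather than activations from a real model.

```python
import numpy as np


def fit_value_probe(hidden_states: np.ndarray, quality: np.ndarray, l2: float = 1e-3) -> np.ndarray:
    """Fit a ridge-regression probe from initial hidden states to output quality.

    hidden_states: (n_questions, dim) array, one vector per question.
    quality:       (n_questions,) array of observed output quality in [0, 1].
    Returns a weight vector of length dim + 1 (the last entry is the bias).
    """
    X = np.hstack([hidden_states, np.ones((hidden_states.shape[0], 1))])  # append bias column
    penalty = l2 * np.eye(X.shape[1])
    penalty[-1, -1] = 0.0  # leave the bias term unregularized
    return np.linalg.solve(X.T @ X + penalty, X.T @ quality)


def estimate_difficulty(weights: np.ndarray, hidden_state: np.ndarray) -> float:
    """Difficulty as one minus predicted quality (an assumed convention)."""
    value = float(np.append(hidden_state, 1.0) @ weights)
    return 1.0 - min(max(value, 0.0), 1.0)  # clamp the prediction to [0, 1]


# Synthetic stand-in data; a real setup would take the hidden state from the
# target LLM's forward pass over the question (assumption).
rng = np.random.default_rng(0)
H = rng.normal(size=(200, 8))
true_w = rng.normal(size=8)
quality = np.clip(0.5 + 0.05 * (H @ true_w), 0.0, 1.0)

weights = fit_value_probe(H, quality)
difficulty = estimate_difficulty(weights, H[0])
print(f"estimated difficulty for the first question: {difficulty:.3f}")
```

Because the probe reads only a vector that the forward pass already produces, scoring a new question costs one matrix-vector product on top of encoding the prompt, and the target model's weights are never modified.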