🤖 AI Summary
This paper addresses the “vague perception” problem arising from frequent “Unknown” outputs by large language models (LLMs)—specifically, the difficulty in distinguishing between *inherently ill-posed questions* and *model capability limitations*. We propose the first fine-grained evaluation framework for this issue. Methodologically, we decouple *undecidability* from *unknown solvability*, design multi-strategy prompting stimuli—including chain-of-thought reconstruction and uncertainty probing—and integrate theoretical accuracy analysis with response attribution to assess both reasoning potential and process stability. Our contributions are threefold: (1) a principled, interpretable attribution mechanism for “Unknown” responses; (2) empirical evidence that a non-negligible fraction of “Unknown” outputs conceal latent correct answers amenable to elicitation; and (3) systematic benchmarking across multiple datasets, revealing concrete reasoning boundaries and improvement margins of state-of-the-art LLMs—thereby establishing a novel paradigm for evaluating model honesty and reasoning capacity.
📝 Abstract
Large Language Models (LLMs) frequently output the label *Unknown*, yet current evaluations focus almost exclusively on whether such answers are *honest* rather than on why they arise. This blurs two distinct cases: (i) an input that is genuinely indeterminate and (ii) a solvable problem that the model fails to resolve. We call this phenomenon *Vague Perception*. We therefore introduce a framework that quantifies the proportion of *Unknown* responses attributable to model incapacity and tests whether guided stimulation can convert them into either correct (*Known*) or intrinsically indeterminate outcomes. By separating these sources of uncertainty, our method provides a clearer picture of LLM reasoning limits and their potential for improvement. After estimating a theoretical accuracy for each reasoning task across different LLMs, we apply multiple elicitation methods to test whether each model can reach that accuracy within a baseline framework. Our work sheds light on the true reasoning ability of LLMs and offers a new perspective on mitigating the *Vague Perception* phenomenon.
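The attribution step described in the abstract can be sketched as a simple classification over *Unknown* responses. This is a minimal illustration under assumed interfaces: `attribute_unknown`, the `elicit` callback, and the item fields (`question`, `answer`, `is_indeterminate`) are hypothetical names, not the paper's actual implementation:

```python
# Hedged sketch of Unknown-response attribution (hypothetical interface,
# not the paper's actual code).
from collections import Counter


def attribute_unknown(items, elicit):
    """Classify each question the model originally answered 'Unknown'.

    items  : dicts with 'question', gold 'answer', and 'is_indeterminate'
             (True if the question is genuinely ill-posed).
    elicit : callable applying a guided-stimulation strategy (e.g.
             chain-of-thought reconstruction) and returning a new answer.
    Returns the proportion of each attribution category.
    """
    counts = Counter()
    for item in items:
        if item["is_indeterminate"]:
            # Genuinely indeterminate input: 'Unknown' is the right answer.
            counts["indeterminate"] += 1
        elif elicit(item["question"]) == item["answer"]:
            # Guided stimulation recovers the correct answer:
            # the original 'Unknown' concealed a latent Known.
            counts["latent_known"] += 1
        else:
            # Solvable question the model still cannot resolve.
            counts["incapacity"] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()}
```

Given gold labels for indeterminacy, the returned proportions directly quantify how much of the *Unknown* mass reflects model incapacity versus honest uncertainty.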