Knowing but Not Showing: LLMs Recognize Ambiguity but Rarely Ask Clarifying Questions

📅 2026-05-24
🤖 AI Summary
Although large language models can recognize ambiguity in users’ vague queries, they rarely ask clarifying questions. Instead, they tend to answer directly, which often produces wrong responses. This work systematically documents a significant disconnect between models’ ability to detect ambiguity and their clarification behavior. The study uses three experimental settings: standard question answering, explicit ambiguity judgment, and behavioral analysis, in which a judge model categorizes responses as direct answers, refusals, or clarification requests. It also examines how retrieved context affects model behavior. Models often identify ambiguity correctly when explicitly asked to judge it, yet they almost never ask clarifying questions during actual question answering. Retrieved context improves answer quality, but it further suppresses the models’ already limited tendency to seek clarification.
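The behavioral-analysis setting sorts every response into one of three categories, and the retrieval comparison amounts to looking at how that distribution shifts when context is added. A minimal sketch of that tally follows, assuming each response has already been labeled by a judge model. The names `ResponseType`, `JudgedResponse`, and `label_distribution` are hypothetical, and the toy records are illustrative only, not results from the paper.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class ResponseType(Enum):
    # The three categories the judge model assigns to each QA response.
    DIRECT_ANSWER = "direct_answer"
    REFUSAL = "refusal"
    CLARIFYING_QUESTION = "clarifying_question"


@dataclass
class JudgedResponse:
    question_id: str
    with_context: bool  # True if retrieved passages were included in the prompt
    label: ResponseType  # label assumed to come from a (hypothetical) judge call


def label_distribution(responses: list[JudgedResponse], with_context: bool) -> dict[str, float]:
    """Fraction of responses in each category for one retrieval condition."""
    subset = [r.label for r in responses if r.with_context == with_context]
    if not subset:
        return {t.value: 0.0 for t in ResponseType}
    counts = Counter(subset)  # missing categories count as 0
    return {t.value: counts[t] / len(subset) for t in ResponseType}


# Toy data for illustration only (not results from the paper).
responses = [
    JudgedResponse("q1", False, ResponseType.CLARIFYING_QUESTION),
    JudgedResponse("q2", False, ResponseType.DIRECT_ANSWER),
    JudgedResponse("q1", True, ResponseType.DIRECT_ANSWER),
    JudgedResponse("q2", True, ResponseType.DIRECT_ANSWER),
]
print(label_distribution(responses, with_context=False))
# {'direct_answer': 0.5, 'refusal': 0.0, 'clarifying_question': 0.5}
print(label_distribution(responses, with_context=True))
# {'direct_answer': 1.0, 'refusal': 0.0, 'clarifying_question': 0.0}
```

Keying the result by the enum's string value keeps the output readable and makes it easy to tabulate the with-context and without-context distributions side by side.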
📝 Abstract
User queries are often underspecified and may admit multiple valid interpretations. Rather than silently making assumptions about the user's intent, a helpful assistant should surface such ambiguity by asking a clarifying question. Doing so requires two abilities: recognizing that a query is ambiguous, and acting on that recognition by seeking clarification instead of answering directly. To study these abilities, we evaluate models on ambiguous, unambiguous, and disambiguated questions in three settings: standard question answering, explicit ambiguity judgment, and behavioral analysis, where a judge model classifies responses as direct answers, refusals, or clarifying questions. We find a clear gap between recognition and behavior: models often identify ambiguity when explicitly asked to judge it, yet in the QA setting they overwhelmingly default to direct answers. Retrieved context further widens this gap by improving answerability while making models even less likely to ask clarifying questions.
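One way to express the recognition-behavior gap is to compare, over the ambiguous questions only, how often a model labels a question ambiguous when explicitly asked and how often it actually asks a clarifying question in the QA setting. The sketch below computes these rates under that assumed definition; the paper's exact metric may differ. `AmbiguousItem` and `recognition_behavior_gap` are hypothetical names, and the data are toy values.

```python
from dataclasses import dataclass


@dataclass
class AmbiguousItem:
    # Explicit-judgment setting: did the model call the question ambiguous?
    judged_ambiguous: bool
    # QA setting: did the judge label the model's answer a clarifying question?
    asked_clarification: bool


def recognition_behavior_gap(items: list[AmbiguousItem]) -> dict[str, float]:
    """Compare recognition and clarification rates over ambiguous questions only."""
    if not items:
        raise ValueError("need at least one ambiguous item")
    n = len(items)
    recognition = sum(i.judged_ambiguous for i in items) / n
    clarification = sum(i.asked_clarification for i in items) / n
    # "Knowing but not showing": recognized as ambiguous, yet no clarifying question.
    silent = sum(i.judged_ambiguous and not i.asked_clarification for i in items) / n
    return {
        "recognition": recognition,
        "clarification": clarification,
        "gap": recognition - clarification,
        "recognized_but_silent": silent,
    }


# Toy data for illustration only (not results from the paper).
items = [
    AmbiguousItem(True, False),
    AmbiguousItem(True, False),
    AmbiguousItem(True, True),
    AmbiguousItem(False, False),
]
print(recognition_behavior_gap(items))
# {'recognition': 0.75, 'clarification': 0.25, 'gap': 0.5, 'recognized_but_silent': 0.5}
```

Restricting both rates to the same ambiguous questions keeps them comparable. The unambiguous and disambiguated questions could then serve as controls, for example to check that a model is not asking for clarification indiscriminately.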
Problem


ambiguity recognition
clarifying questions
large language models
user intent
underspecified queries
Innovation


ambiguity recognition
clarifying questions
large language models
behavioral analysis
cognitive-behavioral gap