🤖 AI Summary
This study investigates large language models' (LLMs') ability to pose clarification questions in task-oriented dialogue under referential ambiguity, specifically in asynchronous instruction-giver/instruction-follower settings. Method: The authors merge two existing annotation layers of the Minecraft Dialogue Corpus (one for reference and referential ambiguity, one for Segmented Discourse Representation Theory (SDRT) including clarifications) into a single corpus, and use it to comparatively analyze human and LLM questioning behavior in ambiguous contexts. Contribution/Results: Humans primarily ask clarification questions out of task uncertainty and rarely probe referential ambiguity; LLMs show the opposite pattern, exhibiting heightened sensitivity to referential ambiguity while overlooking task-related uncertainty, with low correlation between human and LLM behavior overall. Finally, the authors test whether LLMs' clarification ability is tied to their capacity to simulate reasoning, and find that reasoning approaches do appear to increase both the frequency and the relevance of LLM clarification questions.
📝 Abstract
In this work we examine LLMs' ability to ask clarification questions in task-oriented dialogues that follow the asynchronous instruction-giver/instruction-follower format. We present a new corpus that combines two existing annotations of the Minecraft Dialogue Corpus -- one for reference and ambiguity in reference, and one for SDRT including clarifications -- into a single common format providing the necessary information to experiment with clarifications and their relation to ambiguity. With this corpus we compare LLM actions with the original human-generated clarification questions, examining how both humans and LLMs act in the case of ambiguity. We find that there is only a weak link between ambiguity and humans producing clarification questions in these dialogues, and low correlation between humans and LLMs. Humans hardly ever produce clarification questions for referential ambiguity, but often do so for task-based uncertainty. Conversely, LLMs produce more clarification questions for referential ambiguity, but fewer for task uncertainty. We ask whether LLMs' ability to pose clarification questions is predicated on their recent ability to simulate reasoning, and test this with different reasoning approaches, finding that reasoning does appear to increase question frequency and relevance.