Modeling Future Conversation Turns to Teach LLMs to Ask Clarifying Questions

📅 2024-10-17
🏛️ arXiv.org
📈 Citations: 2
Influential: 0
🤖 AI Summary
Large language models (LLMs) often impose a single interpretation on ambiguous user queries, leading to responses misaligned with the user’s true intent. To address this, we propose a forward-looking clarification questioning method that simulates multi-turn future dialogue trajectories to construct a diverse set of intent-grounded candidate answers; these are then preference-labeled based on their contextual appropriateness, prompting the model to proactively generate effective clarification questions. This work is the first to extend preference learning to future dialogue modeling—departing from conventional annotation paradigms that rely solely on historical context. Through supervised fine-tuning and contrastive learning, our approach achieves a 5% absolute improvement in F1 score for clarification question generation on open-domain question answering tasks, significantly enhancing the model’s capability to identify and accommodate users’ multifaceted intents.

📝 Abstract
Large language models (LLMs) must often respond to highly ambiguous user requests. In such cases, the LLM's best response may be to ask a clarifying question to elicit more information. We observe that existing LLMs often respond by presupposing a single interpretation of such ambiguous requests, frustrating users who intended a different interpretation. We speculate this is caused by current preference data labeling practice, where LLM responses are evaluated only on their prior contexts. To address this, we propose to assign preference labels by simulating their expected outcomes in future turns. This allows LLMs to learn to ask clarifying questions when they can generate responses tailored to each user interpretation in future turns. In experiments on open-domain QA, we compare systems trained using our proposed preference labeling method against standard methods, which assign preferences based on only prior context. We evaluate systems on their ability to ask clarifying questions that can recover each user's interpretation and expected answer, and find that our proposed method trains LLMs to ask clarifying questions with a 5% improvement in F1 measured against the answer set from different interpretations of each query.
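The core idea in the abstract, scoring a candidate response by its expected outcome in simulated future turns rather than by prior context alone, can be sketched as follows. This is a minimal illustration, not the paper's implementation: the reward values, turn cost, and candidate representation are all assumptions made here for clarity.

```python
# Sketch of preference labeling via simulated future turns (illustrative only).
# A clarifying question lets the model tailor a follow-up answer to each user
# interpretation; a direct answer only satisfies the interpretation it presupposes.

TURN_COST = 0.1  # assumed small penalty for the extra clarification turn

def simulate_future_reward(response, interpretations):
    """Expected reward of a response, averaged over possible user intents."""
    if response["type"] == "clarify":
        # After the user disambiguates, every interpretation is answerable.
        return sum(1.0 - TURN_COST for _ in interpretations) / len(interpretations)
    # Direct answer: full reward only for the interpretation it assumed.
    hits = sum(1.0 for intent in interpretations if intent == response["assumes"])
    return hits / len(interpretations)

def preference_label(candidates, interpretations):
    """Rank candidate responses by simulated future reward, best first."""
    return sorted(candidates,
                  key=lambda c: simulate_future_reward(c, interpretations),
                  reverse=True)

interpretations = ["the river bank", "the financial bank"]
candidates = [
    {"type": "answer", "assumes": "the financial bank"},  # presupposes one intent
    {"type": "clarify"},
]
ranked = preference_label(candidates, interpretations)
# Under these assumed rewards, the clarifying question (0.9 expected reward)
# is preferred over the presupposing direct answer (0.5).
```

A labeler that only saw the prior context would have no basis to prefer the question; the preference emerges only once the simulated follow-up turns are scored.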
Problem

Research questions and friction points this paper is trying to address.

LLMs presuppose a single interpretation of ambiguous user requests
Current preference data labeling evaluates responses only on prior context, not future turns
No established method trains LLMs to ask effective clarifying questions
Innovation

Methods, ideas, or system contributions that make the work stand out.

Simulates future turns for preference labeling
Trains LLMs to ask clarifying questions effectively
Improves accuracy by modeling conversation outcomes
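The 5% F1 improvement cited above is measured against the answer set from different interpretations of each query. A hedged sketch of what such a set-level F1 metric could look like (the exact matching and normalization used in the paper are not specified here and are assumptions):

```python
# Set-level F1 between the answers a system recovers (via its clarifying
# question and follow-ups) and the reference answers from all interpretations.

def answer_set_f1(predicted, reference):
    """F1 over answer sets: harmonic mean of precision and recall."""
    pred, ref = set(predicted), set(reference)
    if not pred or not ref:
        return 0.0
    tp = len(pred & ref)  # answers recovered that match a reference reading
    if tp == 0:
        return 0.0
    precision = tp / len(pred)
    recall = tp / len(ref)
    return 2 * precision * recall / (precision + recall)

reference = {"Paris, Texas", "Paris, France"}   # two readings of "Paris"
predicted = {"Paris, France"}                   # a presupposing answer: one reading
f1 = answer_set_f1(predicted, reference)        # precision 1.0, recall 0.5 -> F1 2/3
```

A system that asks a clarifying question and then covers both readings would reach recall 1.0, which is exactly the behavior the training method rewards.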