Synthetic Clarification and Correction Dialogues about Data-Centric Tasks -- A Teacher-Student Approach

📅 2025-03-18
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the unpredictable multi-turn dialogue paths that arise in data-centric table question answering when user inputs are incomplete or the underlying data is deficient. The authors propose a human-AI collaborative dialogue synthesis framework built on a teacher-student paradigm: a strong large language model (LLM) acts as the “teacher” to validate the logical consistency of generated dialogues; information asymmetry and dynamic error correction are explicitly modeled to support both AI-initiated clarification and user-initiated correction; and structured table-reasoning prompts, multi-turn dialogue state modeling, and LLM-driven synthesis are integrated into one pipeline. High-fidelity dialogue benchmarks are constructed from TAT-QA and WikiTableQuestions. Empirical evaluation reveals significant bottlenecks in current LLMs’ ability to generate clarifying questions and to integrate user feedback—highlighting critical gaps in interactive table QA systems.

📝 Abstract
Real dialogues with AI assistants for solving data-centric tasks often follow dynamic, unpredictable paths due to imperfect information provided by the user or in the data, which must be caught and handled. Developing datasets that capture such user-AI interactions is difficult and time-consuming. In this work, we develop a novel framework for synthetically generating controlled, multi-turn conversations between a user and AI assistant for the task of table-based question answering, which can be generated from an existing dataset with fully specified table QA examples for any target domain. Each conversation aims to solve a table-based reasoning question through collaborative effort, modeling one of two real-world scenarios: (1) an AI-initiated clarification, or (2) a user-initiated correction. Critically, we employ a strong teacher LLM to verify the correctness of our synthetic conversations, ensuring high quality. We demonstrate synthetic datasets generated from TAT-QA and WikiTableQuestions as benchmarks for frontier LLMs. We find that even larger models struggle to effectively issue clarification questions and accurately integrate user feedback for corrections.
Problem

Research questions and friction points this paper is trying to address.

Develop synthetic dialogues for user-AI interactions in data-centric tasks.
Generate controlled multi-turn conversations for table-based question answering.
Ensure high-quality synthetic datasets using a teacher LLM for verification.
Innovation

Methods, ideas, or system contributions that make the work stand out.

Synthetic generation of user-AI dialogues
Teacher LLM verifies conversation correctness
Focus on table-based question answering tasks
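The generate-then-verify pipeline implied by these contributions can be sketched as a simple loop: a "student" model turns a fully specified table-QA example into a multi-turn dialogue for one of the two scenarios, and a "teacher" model accepts or rejects it. The function names (`student_generate`, `teacher_verify`, `synthesize`) and toy dialogue contents below are hypothetical stand-ins for the paper's actual LLM calls, shown only to illustrate the control flow.

```python
import random

def student_generate(example, scenario):
    # In the real pipeline an LLM rewrites the fully specified QA example
    # into a multi-turn dialogue; here we return a toy four-turn transcript.
    if scenario == "ai_clarification":
        return [("user", example["question"]),
                ("ai", "Which column do you mean?"),     # AI asks for clarification
                ("user", example["clarification"]),
                ("ai", example["answer"])]
    else:  # user-initiated correction
        return [("user", example["question"]),
                ("ai", "a wrong first answer"),
                ("user", "That is wrong, please recheck."),  # user corrects the AI
                ("ai", example["answer"])]

def teacher_verify(turns, example):
    # The teacher LLM checks the dialogue's logical consistency; this stub
    # only checks that the final AI turn matches the gold answer.
    return turns[-1] == ("ai", example["answer"])

def synthesize(seed_examples, max_retries=3):
    # Generate dialogues from seed QA examples, keeping only those
    # the teacher approves.
    dataset = []
    for ex in seed_examples:
        scenario = random.choice(["ai_clarification", "user_correction"])
        for _ in range(max_retries):
            turns = student_generate(ex, scenario)
            if teacher_verify(turns, ex):
                dataset.append({"scenario": scenario, "turns": turns})
                break
    return dataset
```

The retry loop reflects the quality-control role of the teacher: rejected dialogues are simply regenerated rather than repaired, so every conversation in the final dataset has passed verification.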