Dango: A Mixed-Initiative Data Wrangling System using Large Language Model

📅 2025-03-05

🤖 AI Summary
Data wrangling is often inefficient because user intent is ambiguous, and existing tools frequently misinterpret that intent in complex tasks. This paper proposes Dango, a human-in-the-loop multi-agent data wrangling system that integrates natural language dialogue, multi-representation support, and an LLM-driven multiple-choice clarification mechanism. It introduces a mixed-initiative interaction paradigm that lets users switch between representations and answer structured clarification questions. It embeds explainability at multiple granularities, comprising step-by-step natural language explanations and fine-grained data provenance tracing, into the wrangling workflow. The system adopts a modular multi-agent architecture that unifies natural language understanding and generation, interactive intent modeling, data provenance tracking, and multi-turn dialogue management. A within-subjects user study (n=38) reports a 41% improvement in intent clarification accuracy and a 33% increase in task completion efficiency, and applying the system to a broader set of wrangling tasks demonstrates its generalizability.

📝 Abstract
Data wrangling is a time-consuming and challenging task in a data science pipeline. While many tools have been proposed to automate or facilitate data wrangling, they often misinterpret user intent, especially in complex tasks. We propose Dango, a mixed-initiative multi-agent system for data wrangling. Compared to existing tools, Dango enhances user communication of intent by allowing users to demonstrate on multiple tables and use natural language prompts in a conversation interface, enabling users to clarify their intent by answering LLM-posed multiple-choice clarification questions, and providing multiple forms of feedback such as step-by-step natural language explanations and data provenance to help users evaluate the data wrangling scripts. We conducted a within-subjects user study with 38 participants and demonstrated that Dango's features can significantly improve intent clarification, accuracy, and efficiency in data wrangling. Furthermore, we demonstrated the generalizability of Dango by applying it to a broader set of data wrangling tasks.
Problem

Research questions and friction points this paper is trying to address.

Data wrangling is a time-consuming and challenging step in a data science pipeline
Existing tools often misinterpret user intent, especially in complex tasks
Users need ways to clarify their intent and to evaluate the generated data wrangling scripts
Innovation

Methods, ideas, or system contributions that make the work stand out.

Uses natural language prompts for user interaction
Incorporates multiple-choice questions for intent clarification
Provides step-by-step explanations and data provenance
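
The three ideas above (a multiple-choice clarification question, a step-by-step explanation, and data provenance) can be illustrated with a minimal Python sketch. This is not Dango's implementation: the names `ClarificationQuestion`, `StepResult`, `ask_about_missing`, and `handle_missing` are hypothetical, the question is hard-coded where a real system would presumably have an LLM generate it, and provenance is simplified to one source-row index per output row.

```python
from dataclasses import dataclass


@dataclass
class ClarificationQuestion:
    prompt: str      # question shown to the user
    choices: list    # candidate interpretations of the user's intent


@dataclass
class StepResult:
    rows: list          # output table, one dict per row
    provenance: list    # provenance[i] = index of the input row behind output row i
    explanation: str    # natural language description of the step


def ask_about_missing(column):
    """Hypothetical stand-in for an LLM-posed clarification question."""
    return ClarificationQuestion(
        prompt=f"How should rows with a missing '{column}' be handled?",
        choices=["Drop those rows", "Fill the missing values with 0"],
    )


def handle_missing(rows, column, choice):
    """Apply the interpretation the user picked and record row-level provenance."""
    if choice == 0:
        kept = [(i, r) for i, r in enumerate(rows) if r.get(column) is not None]
        return StepResult(
            rows=[dict(r) for _, r in kept],
            provenance=[i for i, _ in kept],
            explanation=f"Step 1: drop every row whose '{column}' is missing.",
        )
    if choice == 1:
        filled = [
            {**r, column: 0 if r.get(column) is None else r[column]} for r in rows
        ]
        return StepResult(
            rows=filled,
            provenance=list(range(len(rows))),
            explanation=f"Step 1: replace each missing '{column}' with 0.",
        )
    raise ValueError(f"unknown choice index: {choice}")


if __name__ == "__main__":
    table = [
        {"name": "a", "age": 30},
        {"name": "b", "age": None},
        {"name": "c", "age": 25},
    ]
    question = ask_about_missing("age")
    print(question.prompt)
    for number, text in enumerate(question.choices):
        print(f"  ({number}) {text}")
    result = handle_missing(table, "age", choice=0)  # the user picks option 0
    print(result.explanation)
    for row, source in zip(result.rows, result.provenance):
        print(f"  {row} <- input row {source}")
```

Keeping the explanation and the provenance alongside the output rows is one way to let a user check whether the chosen interpretation matches their intent before accepting the script.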
Authors

Wei-Hao Chen
Purdue University
Artificial Intelligence, Human Computer Interaction, Data Science, Software Engineering

Weixi Tong
Purdue University
Large Language Models

Amanda Case
University of Iowa, Iowa City, Iowa, USA

Tianyi Zhang
Purdue University, West Lafayette, Indiana, USA