ViseGPT: Towards Better Alignment of LLM-generated Data Wrangling Scripts and User Prompts

📅 2025-08-02
📈 Citations: 0
Influential: 0
🤖 AI Summary
To address the frequent semantic inconsistency between LLM-generated data-processing scripts and users’ natural-language instructions, this paper proposes an end-to-end verification and feedback framework. First, it jointly leverages LLMs and rule-based parsing to extract semantic constraints from user prompts. Second, it automatically generates targeted test cases to validate functional compliance of the generated scripts. Third, it visualizes execution outcomes via a customized interactive Gantt chart to enhance debugging interpretability. The framework integrates three core innovations: semantic constraint extraction, automated test-case generation, and explainable visualization feedback. In a user study, it significantly improves script debugging efficiency—reducing average iteration count by 42%—while strengthening users’ ability to localize and correct errors, thereby streamlining the development loop for data-processing scripts.
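The pipeline described above (extract constraints from the prompt, generate tests, check the script's output) can be sketched roughly as follows. This is an illustrative reconstruction, not the paper's implementation: the function names, the rule-based regex pattern, and the constraint schema are all assumptions, and the real system also uses an LLM for extraction.

```python
# Hypothetical sketch of ViseGPT-style verification (names and patterns are
# illustrative, not from the paper): a rule-based pass extracts simple typed
# column constraints from a prompt, each constraint becomes a test case, and
# the tests are run against the rows produced by the generated script.
import re


def extract_constraints(prompt):
    """Rule-based pass: pull 'column X must be <type>' style constraints."""
    return [{"column": col, "type": typ}
            for col, typ in re.findall(r"column (\w+) must be (\w+)", prompt)]


def generate_test_cases(constraints):
    """Turn each constraint into a named, callable check over a result table."""
    type_map = {"numeric": (int, float), "text": str}
    tests = []
    for c in constraints:
        col, expected = c["column"], type_map[c["type"]]
        tests.append((f"{col} is {c['type']}",
                      # bind col/expected per iteration to avoid late binding
                      lambda rows, col=col, exp=expected:
                          all(isinstance(r[col], exp) for r in rows)))
    return tests


def run_tests(rows, tests):
    """Return per-test pass/fail results, ready for visualization."""
    return {name: check(rows) for name, check in tests}


prompt = "Clean the table; column age must be numeric and column name must be text."
rows = [{"age": 31, "name": "Ada"}, {"age": 27, "name": "Lin"}]
results = run_tests(rows, generate_test_cases(extract_constraints(prompt)))
```

Here `results` maps each extracted constraint to a boolean verdict, which is the shape of data the visualization step would consume.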

📝 Abstract
Large language models (LLMs) enable the rapid generation of data wrangling scripts based on natural language instructions, but these scripts may not fully adhere to user-specified requirements, necessitating careful inspection and iterative refinement. Existing approaches primarily assist users in understanding script logic and spotting potential issues themselves, rather than providing direct validation of correctness. To enhance debugging efficiency and optimize the user experience, we develop ViseGPT, a tool that automatically extracts constraints from user prompts to generate comprehensive test cases for verifying script reliability. The test results are then transformed into a tailored Gantt chart, allowing users to intuitively assess alignment with semantic requirements and iteratively refine their scripts. Our design decisions are informed by a formative study (N=8) that explores user practices and challenges. We further evaluate the effectiveness and usability of ViseGPT through a user study (N=18). Results indicate that ViseGPT significantly improves debugging efficiency for LLM-generated data-wrangling scripts, enhances users' ability to detect and correct issues, and streamlines the workflow experience.
Problem

Research questions and friction points this paper is trying to address.

LLM-generated data wrangling scripts may not accurately meet user requirements
Manually extracting constraints and writing test cases is tedious and error-prone
Debugging lacks intuitive feedback on which requirements pass or fail
Innovation

Methods, ideas, or system contributions that make the work stand out.

Automatically extracts constraints from user prompts
Generates comprehensive test cases for verification
Transforms test results into tailored Gantt charts
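The last step, turning test verdicts into chart-ready data, could look like the minimal sketch below. This is an assumption about the data shape only; the paper's actual Gantt chart is interactive and richer than this.

```python
# Illustrative only (not the paper's implementation): convert per-test
# pass/fail results into Gantt-style bar segments, one row per constraint,
# so a renderer can draw passing and failing checks side by side.
def to_gantt_rows(test_results):
    return [{"label": name, "row": i, "status": "pass" if passed else "fail"}
            for i, (name, passed) in enumerate(test_results.items())]


bars = to_gantt_rows({"age is numeric": True, "name is text": False})
```

Each dict carries a vertical position (`row`), a human-readable label, and a status a chart library could map to color.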