Xavier: Toward Better Coding Assistance in Authoring Tabular Data Wrangling Scripts

📅 2025-03-04

📈 Citations: 0

✨ Influential: 0

career value

168K/year

🤖 AI Summary

Existing code completion tools for table data cleaning scripts neglect data context—such as schema and actual cell values—leading to inaccurate suggestions and frequent workflow interruptions. To address this, we propose Xavier, the first data-aware code completion method designed specifically for computational notebooks. Xavier dynamically identifies relevant fields via real-time data dependency analysis; encodes multimodal context—including syntactic structure, data schema, and representative sample values; and enables immediate preview and highlighting of transformation results through an embedded lightweight sandbox. It is deeply integrated into Jupyter, supporting end-to-end interactive development. A user study with 16 data analysts demonstrates that Xavier significantly reduces context-switching frequency, shortens average task completion time by 37%, and substantially improves both script accuracy and coding efficiency.

Technology Category

Application Category

📝 Abstract

Data analysts frequently employ code completion tools in writing custom scripts to tackle complex tabular data wrangling tasks. However, existing tools do not sufficiently link the data contexts such as schemas and values with the code being edited. This not only leads to poor code suggestions, but also frequent interruptions in coding processes as users need additional code to locate and understand relevant data. We introduce Xavier, a tool designed to enhance data wrangling script authoring in computational notebooks. Xavier maintains users' awareness of data contexts while providing data-aware code suggestions. It automatically highlights the most relevant data based on the user's code, integrates both code and data contexts for more accurate suggestions, and instantly previews data transformation results for easy verification. To evaluate the effectiveness and usability of Xavier, we conducted a user study with 16 data analysts, showing its potential to streamline data wrangling scripts authoring.

Problem

Research questions and friction points this paper is trying to address.

Improves code completion for tabular data wrangling scripts

Links data contexts with code for better suggestions

Provides instant data transformation previews for verification

Innovation

Methods, ideas, or system contributions that make the work stand out.

Data-aware code suggestions for tabular data wrangling

Automatic highlighting of relevant data based on code

Instant preview of data transformation results

🔎 Similar Papers

CleanAgent: Automating Data Standardization with LLM-based Agents