🤖 AI Summary
Existing code completion tools for tabular data wrangling scripts neglect data context, such as the schema and actual cell values, which leads to inaccurate suggestions and frequent workflow interruptions. To address this, we propose Xavier, the first data-aware code completion method designed specifically for computational notebooks. Xavier dynamically identifies relevant fields via real-time data dependency analysis; encodes multimodal context, including syntactic structure, data schema, and representative sample values; and enables immediate preview and highlighting of transformation results through an embedded lightweight sandbox. It is deeply integrated into Jupyter, supporting end-to-end interactive development. A user study with 16 data analysts demonstrates that Xavier significantly reduces context-switching frequency, shortens average task completion time by 37%, and substantially improves both script accuracy and coding efficiency.
📝 Abstract
Data analysts frequently use code completion tools when writing custom scripts for complex tabular data wrangling tasks. However, existing tools do not sufficiently link data contexts, such as schemas and values, with the code being edited. This leads not only to poor code suggestions but also to frequent interruptions in the coding process, as users must write additional code to locate and understand the relevant data. We introduce Xavier, a tool designed to enhance data wrangling script authoring in computational notebooks. Xavier maintains users' awareness of data contexts while providing data-aware code suggestions. It automatically highlights the most relevant data based on the user's code, integrates both code and data contexts for more accurate suggestions, and instantly previews data transformation results for easy verification. To evaluate the effectiveness and usability of Xavier, we conducted a user study with 16 data analysts; the results show its potential to streamline data wrangling script authoring.
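
As a rough illustration of what combining code and data contexts could look like, the sketch below finds the table columns mentioned in a partially written line of code and summarizes each with an inferred type and a few sample values. This is not Xavier's implementation: the function names `referenced_columns` and `build_data_context`, the list-of-dicts table, and the string-literal heuristic are all assumptions made for this example.

```python
import re


def referenced_columns(code, columns):
    """Return the table columns whose names appear as string literals in the code.

    Hypothetical heuristic: a column counts as "relevant" if its name is quoted
    somewhere in the code being edited.
    """
    literals = set(re.findall(r"""['"]([^'"]+)['"]""", code))
    return [col for col in columns if col in literals]


def build_data_context(code, rows, max_samples=3):
    """Summarize each referenced column as a type name plus distinct sample values."""
    columns = list(rows[0].keys()) if rows else []
    context = {}
    for col in referenced_columns(code, columns):
        samples = []
        for row in rows:
            value = row[col]
            if value not in samples:
                samples.append(value)
            if len(samples) == max_samples:
                break
        # The type is inferred from the first row only, which is a simplification.
        context[col] = {"type": type(rows[0][col]).__name__, "samples": samples}
    return context


# A toy table and an unfinished line of wrangling code (both made up).
rows = [
    {"name": "Ann", "city": "Oslo", "age": 31},
    {"name": "Bob", "city": "Rome", "age": 27},
    {"name": "Cy", "city": "Oslo", "age": 45},
]
code = 'df[df["city"] == "Oslo"]["age"].'
context = build_data_context(code, rows)
```

A summary like `context` could then be attached to the code sent to a completion model, so that suggestions reflect the actual schema and values rather than the code alone. A real system would likely need proper program analysis instead of the string-literal match used here.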