Enhancing Computational Notebooks with Code+Data Space Versioning

📅 2025-04-02
📈 Citations: 2
Influential: 1
🤖 AI Summary
Existing Jupyter-style notebooks employ a linear execution model, which poorly supports data scientists’ typical nonlinear exploratory behaviors—such as backward execution, branch switching, and full-state rollback. To address this, we propose Kishuboard: a novel computational notebook system built upon two-dimensional versioning of both code and data spaces. Kishuboard introduces joint code-and-data state snapshots, constructs a dependency-aware version graph, and designs an overlayable one-dimensional history view to balance intuitiveness and expressive power. Interactive navigation controls enable flexible multi-branch state management. A user study demonstrates that Kishuboard significantly improves efficiency and task completion rates for complex exploratory workflows. By unifying and scaling versioning across code and data states, Kishuboard provides principled, extensible support for nonlinear data science workflows.

📝 Abstract
There is a gap between how people explore data and how Jupyter-like computational notebooks are designed. People explore data nonlinearly, using execution undos, branching, and/or complete reverts, whereas notebooks are designed for sequential exploration. Recent works like ForkIt are still insufficient to support these multiple modes of nonlinear exploration in a unified way. In this work, we address the challenge by introducing two-dimensional code+data space versioning for computational notebooks and verifying its effectiveness using our prototype system, Kishuboard, which integrates with Jupyter. By adjusting code and data knobs, users of Kishuboard can intuitively manage the state of computational notebooks in a flexible way, thereby achieving both execution rollbacks and checkouts across complex multi-branch exploration history. Moreover, this two-dimensional versioning mechanism can easily be presented along with a friendly one-dimensional history. Human subject studies indicate that Kishuboard significantly enhances user productivity in various data science tasks.

Problem

Research questions and friction points this paper is trying to address.

Bridging nonlinear data exploration with sequential notebook design
Unifying multiple modes of nonlinear exploration in notebooks
Enhancing productivity via flexible code+data versioning in notebooks

Innovation

Methods, ideas, or system contributions that make the work stand out.

Two-dimensional code+data space versioning
Integration with Jupyter notebooks
Flexible multi-branch exploration management
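
The two-dimensional versioning idea listed above can be illustrated with a small Python sketch. This is a toy model under stated assumptions, not Kishuboard's implementation or API: `ToyNotebook`, `Commit`, `run_cell`, and `checkout` are hypothetical names, cells are run with Python's built-in `exec`, and snapshots are plain deep copies of an in-memory dictionary rather than real kernel state. It only shows how a joint code+data snapshot per execution lets a user turn the code knob and the data knob independently.

```python
from __future__ import annotations

import copy
from dataclasses import dataclass, field
from typing import Any, Optional


@dataclass
class Commit:
    """A joint snapshot of the code space and the data space."""
    commit_id: int
    parent: Optional[int]        # commit this one was executed on top of
    cells: list[str]             # code space: cell sources at commit time
    variables: dict[str, Any]    # data space: variable values at commit time


@dataclass
class ToyNotebook:
    """Hypothetical stand-in for a notebook session (not Kishuboard's API)."""
    cells: list[str] = field(default_factory=list)
    variables: dict[str, Any] = field(default_factory=dict)
    commits: list[Commit] = field(default_factory=list)
    head: Optional[int] = None

    def run_cell(self, source: str) -> int:
        """Run a cell, then record a new commit as a child of the current head."""
        # Toy execution: assignments made by the cell land in self.variables.
        exec(source, {}, self.variables)
        self.cells.append(source)
        commit = Commit(len(self.commits), self.head,
                        list(self.cells), copy.deepcopy(self.variables))
        self.commits.append(commit)
        self.head = commit.commit_id
        return commit.commit_id

    def checkout(self, commit_id: int, *, code: bool = True, data: bool = True) -> None:
        """Restore the code space, the data space, or both from a commit."""
        target = self.commits[commit_id]
        if code:
            self.cells = list(target.cells)
        if data:
            self.variables = copy.deepcopy(target.variables)
        # Assumption of this sketch: later executions branch from the target.
        self.head = commit_id


nb = ToyNotebook()
c0 = nb.run_cell("x = 1")
c1 = nb.run_cell("x = x + 10")
c2 = nb.run_cell("y = x * 2")

# Execution rollback: turn only the data knob back; the code stays as written.
nb.checkout(c0, code=False)
c3 = nb.run_cell("y = x - 1")   # starts a second branch rooted at c0

# Full checkout: turn both knobs to return to the first branch.
nb.checkout(c2)
```

In this sketch the commit list doubles as a one-dimensional history (commits in execution order), while the `parent` links carry the branch structure. A real system would need incremental, dependency-aware snapshots; deep-copying every variable on every execution is only workable for a toy.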
Hanxi Fang
University of Illinois Urbana-Champaign, Urbana, Illinois, USA
Supawit Chockchowwat
Google
databases, data mining, distributed systems, machine learning
Hari Sundaram
University of Illinois Urbana-Champaign, Urbana, Illinois, USA
Yongjoo Park
University of Illinois Urbana-Champaign
Database Systems, Systems for Machine Learning