Observing Fine-Grained Changes in Jupyter Notebooks During Development Time

📅 2025-07-21
🤖 AI Summary
Problem: Jupyter notebooks are widely used in data science, yet fine-grained editing behavior during notebook development has not been studied. This contrasts with software engineering, where analyses of fine-grained logs have advanced refactoring, security, and code completion. Method: The authors built a toolset that collects code changes in Jupyter notebooks during development time. They used it to record more than 100 hours of work by 20 developers with different levels of expertise on one data analysis task and one machine learning task, yielding a dataset of 2,655 cells and 9,207 cell executions. Contribution/Results: Classifying the changes made to cells between executions shows that a significant number are relatively small fixes and code iteration modifications. This suggests that notebooks serve as a debugging tool as well as a development and exploration tool. The paper also reports further insights and proposes future research directions based on the dataset.

📝 Abstract
In software engineering, numerous studies have focused on the analysis of fine-grained logs, leading to significant innovations in areas such as refactoring, security, and code completion. However, no similar studies have been conducted for computational notebooks in the context of data science. To help bridge this research gap, we make three scientific contributions: we (1) introduce a toolset for collecting code changes in Jupyter notebooks during development time; (2) use it to collect more than 100 hours of work related to a data analysis task and a machine learning task (carried out by 20 developers with different levels of expertise), resulting in a dataset containing 2,655 cells and 9,207 cell executions; and (3) use this dataset to investigate the dynamic nature of the notebook development process and the changes that take place in the notebooks. In our analysis of the collected data, we classified the changes made to the cells between executions and found that a significant number of these changes were relatively small fixes and code iteration modifications. This suggests that notebooks are used not only as a development and exploration tool but also as a debugging tool. We report a number of other insights and propose potential future research directions on the novel data.
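
The idea of classifying changes between executions of the same cell can be illustrated with a minimal Python sketch. This is not the authors' toolset or taxonomy: the `Execution` record, the label names, and the 0.8 similarity threshold are all hypothetical, and character-level similarity from `difflib` stands in for whatever classification the paper actually used.

```python
import difflib
from dataclasses import dataclass


@dataclass
class Execution:
    """One logged cell execution (hypothetical record layout)."""
    cell_id: str
    source: str


def classify_change(before: str, after: str, small_threshold: float = 0.8) -> str:
    """Label the edit between two snapshots of the same cell.

    The threshold is an illustrative assumption, not a value from the paper.
    """
    if before == after:
        return "unchanged"
    # ratio() is 1.0 for identical strings and approaches 0.0 for unrelated ones.
    ratio = difflib.SequenceMatcher(None, before, after).ratio()
    return "small_edit" if ratio >= small_threshold else "large_edit"


def label_executions(log: list[Execution]) -> list[str]:
    """Compare each execution with the previous execution of the same cell."""
    last_source: dict[str, str] = {}
    labels: list[str] = []
    for ex in log:
        previous = last_source.get(ex.cell_id)
        if previous is None:
            labels.append("first_run")
        else:
            labels.append(classify_change(previous, ex.source))
        last_source[ex.cell_id] = ex.source
    return labels


if __name__ == "__main__":
    log = [
        Execution("a", "df = pd.read_csv('data.csv')\ndf.head()"),
        Execution("a", "df = pd.read_csv('data.csv')\ndf.head(10)"),
        Execution("b", "x = 1"),
        Execution("a", "import matplotlib.pyplot as plt\nplt.plot([1, 2, 3])"),
    ]
    for ex, label in zip(log, label_executions(log)):
        print(ex.cell_id, label)
```

Counting the labels over a full session log would give a rough picture of how often a cell is rerun after only a minor tweak, which is the kind of pattern the abstract associates with debugging-style use.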

Problem

Research questions and friction points this paper is trying to address.

Analyzing fine-grained code changes in Jupyter notebooks during development
Bridging the research gap on computational notebooks in data science
Investigating dynamic notebook development processes and cell modifications

Innovation

Methods, ideas, or system contributions that make the work stand out.

Toolset for collecting Jupyter notebook code changes
Dataset of 2,655 cells and 9,207 cell executions from more than 100 hours of work by 20 developers
Analysis of dynamic notebook development changes