🤖 AI Summary
This study investigates differences in implicit, process-oriented skills between expert and novice data scientists during problem-solving. By conducting a multi-level sequential analysis of 440 publicly available Jupyter Notebooks, the authors map low-level coding behaviors to high-level problem-solving phases—such as data ingestion, exploratory data analysis, and modeling—and find that experts do not follow fundamentally distinct phase-transition patterns. Instead, their expertise manifests through shorter, highly iterative workflows and fine-grained, context-sensitive operations. In contrast to novices’ preference for extended linear sequences, expert strategies demonstrate greater adaptability and efficiency. These findings offer empirical grounding for data science education initiatives that emphasize process-oriented pedagogy and assessment, shifting focus from final outputs to the dynamic practices underlying effective problem-solving.
📝 Abstract
The development of data science expertise requires tacit, process-oriented skills that are difficult to teach directly. This study addresses the resulting challenge of empirically understanding how the problem-solving processes of experts and novices differ. We apply a multi-level sequence analysis to 440 Jupyter notebooks from a public dataset, mapping low-level coding actions to higher-level problem-solving practices. Our findings reveal that experts and novices do not differ fundamentally in how they transition between data science phases (e.g., Data Import, EDA, Model Training, Visualization). Instead, expertise is distinguished by the overall workflow structure, viewed from a problem-solving perspective, and by fine-grained, cell-level action patterns. Novices tend to follow long, linear processes, whereas experts employ shorter, more iterative strategies enacted through efficient, context-specific action sequences. These results provide data science educators with empirical insights for curriculum design and assessment, shifting the focus from final products toward the development of the flexible, iterative thinking that defines expertise, a priority in a field increasingly shaped by AI tools.
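To make the idea of phase-level sequence analysis concrete, the following is a minimal sketch, not the authors' actual pipeline. It assumes each notebook has already been reduced to one phase label per code cell. The function names `transition_probabilities` and `revisit_count`, the example notebooks, and the choice of a first-order transition model are all illustrative assumptions, not details from the paper.

```python
from collections import Counter, defaultdict


def transition_probabilities(phases):
    """First-order transition probabilities between consecutive cell phases."""
    counts = defaultdict(Counter)
    for src, dst in zip(phases, phases[1:]):
        counts[src][dst] += 1
    return {
        src: {dst: n / sum(dsts.values()) for dst, n in dsts.items()}
        for src, dsts in counts.items()
    }


def revisit_count(phases):
    """Count how often the workflow returns to a phase it had already left.

    This is a crude, assumed proxy for "iterativeness", not a metric
    taken from the paper.
    """
    seen, revisits, prev = set(), 0, None
    for phase in phases:
        if phase != prev:
            if phase in seen:
                revisits += 1
            seen.add(phase)
        prev = phase
    return revisits


# Hypothetical, hand-labelled notebooks (one phase label per code cell).
linear = ["Data Import", "EDA", "EDA", "Visualization", "Model Training"]
iterative = ["Data Import", "EDA", "Model Training", "EDA", "Model Training"]

if __name__ == "__main__":
    print(transition_probabilities(linear))
    print(revisit_count(linear), revisit_count(iterative))
```

Under these assumptions, two notebooks could share similar phase-to-phase transition probabilities while differing in length and in how often they loop back to earlier phases, which is the kind of structural contrast the abstract describes.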