🤖 AI Summary
This study investigates differences and complementarities between human and generative AI (GenAI) practices in computational notebooks (e.g., Kaggle competition submissions), focusing on coding and documentation behaviors. Using three empirical case studies, we compare gold-medal human-authored notebooks with GenAI-generated counterparts via (1) code feature analysis, (2) documentation structural complexity assessment, and (3) code smell and technical debt measurement. Results show that humans significantly outperform GenAI in structural diversity, narrative coherence, and innovative design, whereas GenAI produces higher-quality code with lower technical debt but superficial, reasoning-deficient documentation. Based on these findings, we propose four research agendas for human-AI collaboration, establishing for the first time a systematic division of labor: "humans excel at architectural design and explanatory reasoning; AI excels at implementation and specification adherence." This work provides empirical grounding and theoretical guidance for building trustworthy, high-productivity notebook-based human-AI collaboration paradigms.
📝 Abstract
Computational notebooks have become the tool of choice for data scientists and practitioners to perform analyses and share results, uniquely combining scripts with documentation. With the emergence of generative AI (GenAI) technologies, it is increasingly important, especially in competitive settings, to distinguish the characteristics of human-written from GenAI-generated notebooks.
In this study, we present three case studies that explore the respective strengths of humans and GenAI through coding and documenting activities in notebooks. We first characterize differences across 25 code and documentation features in human-written, medal-winning Kaggle notebooks, finding that gold medalists are primarily distinguished by longer and more detailed documentation. Second, we analyze the distinctions between human-written and GenAI-generated notebooks. Our results show that while GenAI notebooks tend to achieve higher code quality (as measured by metrics such as code smells and technical debt), human-written notebooks display greater structural diversity, complexity, and more innovative approaches to problem-solving. Based on these results, we frame this work as groundwork that highlights four agendas for further investigating how GenAI could be used in notebooks to maximize the potential of human-AI collaboration.
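To give a concrete flavor of the notebook feature analysis described above, the sketch below extracts a few basic code and documentation features from a Jupyter notebook's JSON structure. This is a minimal illustration only: the paper's actual 25 features are not reproduced here, and the `notebook_features` helper and its feature names are hypothetical.

```python
# Hypothetical sketch of notebook feature extraction; the function name and
# the specific features below are illustrative, not the paper's actual set.
import json


def notebook_features(nb: dict) -> dict:
    """Count simple code vs. documentation features in a Jupyter notebook dict."""
    code_cells = md_cells = code_lines = md_lines = 0
    for cell in nb.get("cells", []):
        source = cell.get("source", [])
        # The .ipynb format stores cell source as a list of lines or one string.
        n = len(source) if isinstance(source, list) else len(source.splitlines())
        if cell.get("cell_type") == "code":
            code_cells += 1
            code_lines += n
        elif cell.get("cell_type") == "markdown":
            md_cells += 1
            md_lines += n
    total = code_lines + md_lines
    return {
        "code_cells": code_cells,
        "markdown_cells": md_cells,
        "code_lines": code_lines,
        "markdown_lines": md_lines,
        # Share of documentation lines: one simple proxy for how heavily
        # documented a notebook is.
        "doc_ratio": md_lines / total if total else 0.0,
    }


# Minimal inline notebook; in practice you would json.load() a .ipynb file.
nb = {
    "cells": [
        {"cell_type": "markdown",
         "source": ["# Titanic EDA\n", "Load and inspect the data.\n"]},
        {"cell_type": "code",
         "source": ["import pandas as pd\n", "df = pd.read_csv('train.csv')\n"]},
    ]
}
features = notebook_features(nb)
print(features["code_cells"], features["markdown_cells"], features["doc_ratio"])
```

Features like these can then be compared across human-written and GenAI-generated notebooks; richer analyses (e.g., code smells or technical debt) would call for dedicated static-analysis tooling rather than simple line counts.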