🤖 AI Summary
Method naming practices in Jupyter Notebooks remain empirically understudied despite their widespread use in scientific computing.
Method: This paper presents an exploratory empirical study of method naming in real-world Jupyter Notebooks, analyzing 691 methods drawn from 384 notebooks. It combines code mining, syntactic parsing, and qualitative thematic analysis to characterize naming styles, grammatical structures, and abbreviation patterns.
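The mining step can be pictured with a short sketch. It is not the authors' pipeline. It assumes notebooks are stored in the standard `.ipynb` JSON format with Python code cells, and it uses Python's built-in `ast` module to list every function defined in the code cells and to record whether each one returns a value. The helper names (`load_notebook`, `extract_methods`, and so on) are hypothetical. Stripping IPython magics line by line and counting only `return <expr>` (not a bare `return`) are simplifying assumptions.

```python
import ast
import json

def load_notebook(path: str) -> dict:
    """Read an .ipynb file, which is stored as JSON."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)

def _strip_magics(source: str) -> str:
    # Drop IPython line magics (%...) and shell escapes (!...), which are not valid Python.
    # Simplifying assumption: cell magics such as %%bash are not handled specially.
    lines = source.splitlines()
    return "\n".join(ln for ln in lines if not ln.lstrip().startswith(("%", "!")))

def _returns_value(func: ast.AST) -> bool:
    # True if the function itself contains `return <expr>`. Nested defs, classes and
    # lambdas are skipped so their returns are not attributed to the outer function.
    stack = list(ast.iter_child_nodes(func))
    while stack:
        node = stack.pop()
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef, ast.Lambda)):
            continue
        if isinstance(node, ast.Return) and node.value is not None:
            return True
        stack.extend(ast.iter_child_nodes(node))
    return False

def extract_methods(notebook: dict) -> list:
    """List (name, returns_value) for every function defined in the notebook's code cells."""
    methods = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue
        source = cell.get("source", "")
        if isinstance(source, list):  # nbformat allows a string or a list of lines
            source = "".join(source)
        try:
            tree = ast.parse(_strip_magics(source))
        except SyntaxError:
            continue  # skip cells that are still not valid Python
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                methods.append((node.name, _returns_value(node)))
    return methods

example = {
    "cells": [
        {"cell_type": "markdown", "source": "# Exploratory analysis"},
        {"cell_type": "code", "source": ["%matplotlib inline\n",
                                         "def load_data(path):\n",
                                         "    return open(path).read()\n"]},
        {"cell_type": "code", "source": "def plot(df):\n    df.plot()\n"},
    ]
}
print(extract_methods(example))  # [('load_data', True), ('plot', False)]
```

Pairs like these are the raw material for questions such as how often value-returning methods start with a verb. Cells that fail to parse are skipped rather than repaired, which trades recall for simplicity.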
Contribution/Results: We find that only 55.57% of methods begin with a verb, a markedly lower share than traditional software-engineering naming conventions would suggest, and that half of the methods containing return statements do not start with a verb. We identify 68 distinct grammatical patterns and find that 30.39% of method names contain abbreviations or acronyms, often for mathematical, statistical, or image-processing concepts. Overall, notebook method names favor conciseness and are less verb-oriented than conventional software-engineering norms prescribe. These findings offer empirical grounding for domain-aware naming recommendation tools and for pedagogical guidelines tailored to scientific computing environments.
📝 Abstract
Method names play an important role in communicating the purpose and behavior of the code they name. Research has shown that high-quality names significantly improve code comprehension and the overall maintainability of software. However, these studies primarily focus on naming practices in traditional software development. There is limited research on naming patterns in Jupyter Notebooks, a popular environment for scientific computing and data analysis. In this exploratory study, we analyze the naming practices found in 691 methods across 384 Jupyter Notebooks, focusing on three key aspects: naming style conventions, grammatical composition, and the use of abbreviations and acronyms. Our findings reveal distinct characteristics of notebook method names, including a preference for conciseness and deviations from traditional naming patterns. We identified 68 unique grammatical patterns, with only 55.57% of methods beginning with a verb. Further analysis revealed that half of the methods with return statements do not start with a verb. We also found that 30.39% of method names contain abbreviations or acronyms, representing mathematical or statistical terms and image processing concepts, among others. We envision our findings contributing to the development of specialized tools and techniques for evaluating and recommending high-quality names in scientific code, and to the creation of educational resources tailored to the notebook development community.
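The three aspects analyzed (naming style, grammatical composition, abbreviations) can be illustrated with a rough heuristic sketch. It classifies an identifier's style with regular expressions, splits it into word tokens, and tests whether the first token is a verb and whether any token is a known abbreviation. The word lists `COMMON_VERBS` and `KNOWN_ABBREVIATIONS` are hypothetical stand-ins. The study's grammatical analysis would presumably rely on a part-of-speech tagger and curated data rather than a lookup table. The sample names are invented and are not drawn from the paper's dataset.

```python
import re
from collections import Counter

# Hypothetical word lists for illustration only. A real analysis would use a
# part-of-speech tagger and a curated abbreviation dictionary instead.
COMMON_VERBS = {"get", "set", "load", "read", "save", "compute", "calculate",
                "plot", "show", "train", "predict", "evaluate", "create", "build",
                "convert", "normalize", "clean", "split", "find", "check", "run", "update"}
KNOWN_ABBREVIATIONS = {"img", "rgb", "mse", "std", "avg", "df", "cnn", "hist", "prob", "num"}

def naming_style(name: str) -> str:
    """Classify an identifier into one of a few common naming styles."""
    core = name.strip("_")  # ignore leading/trailing underscores, e.g. _helper or __init__
    if re.fullmatch(r"[a-z][a-z0-9]*", core):
        return "lowercase"
    if re.fullmatch(r"[a-z][a-z0-9]*(_[a-z0-9]+)+", core):
        return "snake_case"
    if re.fullmatch(r"[a-z][a-z0-9]*([A-Z][a-z0-9]*)+", core):
        return "camelCase"
    if re.fullmatch(r"[A-Z][A-Z0-9]*(_[A-Z0-9]+)*", core):
        return "UPPER_CASE"  # checked before PascalCase so "ABC" is not called PascalCase
    if re.fullmatch(r"([A-Z][a-z0-9]*)+", core):
        return "PascalCase"
    return "other"

def split_identifier(name: str) -> list:
    """Split snake_case and camelCase identifiers into lowercase tokens."""
    tokens = []
    for part in name.split("_"):
        # An uppercase run not followed by a lowercase letter is kept as one acronym token.
        tokens += re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part)
    return [t.lower() for t in tokens]

def starts_with_verb(name: str) -> bool:
    tokens = split_identifier(name)
    return bool(tokens) and tokens[0] in COMMON_VERBS

def has_abbreviation(name: str) -> bool:
    return any(t in KNOWN_ABBREVIATIONS for t in split_identifier(name))

def summarize(names: list) -> tuple:
    """Percent of verb-initial names, percent with abbreviations, and style counts."""
    n = len(names)
    verb_pct = round(100 * sum(starts_with_verb(x) for x in names) / n, 2)
    abbrev_pct = round(100 * sum(has_abbreviation(x) for x in names) / n, 2)
    return verb_pct, abbrev_pct, Counter(naming_style(x) for x in names)

names = ["load_data", "rgb2gray", "plotHistogram", "model", "compute_mse", "ImageLoader"]
verb_pct, abbrev_pct, styles = summarize(names)
print(verb_pct, abbrev_pct)  # 50.0 33.33
print(styles)
```

A fixed lookup table cannot resolve words such as "plot", which can be either a noun or a verb. Ambiguity of this kind is one reason grammatical analyses of identifiers typically use a tagger rather than a word list.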