🤖 AI Summary
This study investigates the impact of Extract Method and Inline Method refactoring on code comprehension, navigation efficiency, and cognitive load among novice Java programmers, addressing the limitations of relying solely on static metrics to evaluate refactoring effectiveness. Using an eye-tracking experiment that combines task completion time, number of attempts, and visual behavior measures (e.g., fixation duration and regression count), supplemented by questionnaires and interviews, the research systematically compares the two refactoring strategies across tasks of varying complexity. Results reveal that Extract Method significantly enhances performance on complex tasks, reducing completion time by up to 78.8% and regressions by up to 84.6%, yet impairs efficiency on simple tasks, increasing time by up to 166.9% and regressions by up to 200%. These findings challenge the pedagogical assumption that method extraction is universally beneficial and underscore the importance of aligning refactoring choices with task complexity.
📝 Abstract
Developers often extract methods to improve readability, understanding, and reuse, while inlining keeps logic in one block. Prior work based on static metrics has not shown clear differences between these practices, and the human side of comprehension and navigation remains underexplored. We investigate Inline Method vs. Extract Method refactorings using a dynamic approach: eye tracking while participants read code and solve tasks. We analyze key code areas and compare visual effort and reading behavior (fixation duration and count, regressions, revisits), alongside completion time and number of attempts. We ran a controlled experiment with 32 Java novices, followed by short interviews. Each participant solved eight simple tasks: four programs presented in an inlined version and four in an extracted version. We also surveyed 58 additional novices for complementary quantitative and qualitative data. Results show that the effects depend on task difficulty. In two tasks, method extraction improved performance and reduced visual effort, with time decreasing by up to 78.8% and regressions by up to 84.6%. For simpler tasks (e.g., square area), extraction hurt performance: time increased by up to 166.9% and regressions by up to 200%. Even with meaningful method names, novices often switched back and forth between call sites and extracted methods, increasing navigation and cognitive load. Participants' stated preferences frequently favored extraction for readability and reuse, but did not always match measured performance. These findings suggest educators should be cautious about premature modularization for novices and highlight eye tracking as a useful complement to static metrics.
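To make the two refactorings concrete, here is a minimal Java sketch of the paper's "square area" example in both versions (hypothetical illustration, not the study's actual task materials; class and method names are invented):

```java
// Hypothetical illustration of Inline Method vs. Extract Method,
// using the "square area" example mentioned in the abstract.
public class RefactoringDemo {

    // Inlined version: the logic stays in one block at the point of use,
    // so a reader never leaves the current method.
    static int squareAreaInlined(int side) {
        return side * side;
    }

    // Extracted version: the computation is moved into a separately named
    // method, so a reader may navigate between the call site and the helper.
    static int squareAreaExtracted(int side) {
        return computeSquareArea(side);
    }

    static int computeSquareArea(int side) {
        return side * side;
    }

    public static void main(String[] args) {
        // Both versions compute the same result; 4 * 4 = 16.
        System.out.println(squareAreaInlined(4));   // 16
        System.out.println(squareAreaExtracted(4)); // 16
    }
}
```

For such a trivial computation, the extracted version adds a navigation hop without hiding any real complexity, which is consistent with the study's finding that extraction hurt performance on simple tasks.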