🤖 AI Summary
This study investigates how method-level code changes affect performance evolution in Java projects. To address the lack of fine-grained empirical evidence behind developer intuition, the authors systematically quantify performance variation at the method level across 739 commits from 15 open-source Java projects, using JMH microbenchmarks and bytecode instrumentation. Results show that 32.7% of changes induce statistically significant performance fluctuations, with degradations outnumbering improvements; conventional risk categorization by change type proves ineffective; algorithmic modifications offer the highest improvement potential yet carry the greatest regression risk; commits by senior developers are more stable; and small web-server projects are the most performance-vulnerable. Notably, this work is the first to uncover multidimensional interactions among developer experience, code complexity, domain scale, and change type. These findings provide a data-driven foundation for integrating automated performance testing deeply into CI pipelines.
📝 Abstract
Performance is a critical quality attribute in software development, yet the impact of method-level code changes on performance evolution remains poorly understood. While developers often make intuitive assumptions about which types of modifications are likely to cause performance regressions or improvements, these beliefs lack empirical validation at a fine-grained level. We conducted a large-scale empirical study analyzing performance evolution in 15 mature open-source Java projects hosted on GitHub. Our analysis encompassed 739 commits containing 1,499 method-level code changes, using the Java Microbenchmark Harness (JMH) for precise performance measurement and rigorous statistical analysis to quantify both the significance and magnitude of performance variations. We employed bytecode instrumentation to capture method-specific execution metrics and systematically analyzed four key aspects: temporal performance patterns, code change type correlations, developer and complexity factors, and domain-size interactions. Our findings reveal that 32.7% of method-level changes result in measurable performance impacts, with regressions occurring 1.3 times more frequently than improvements. Contrary to conventional wisdom, we found no significant differences in performance impact distributions across code change categories, challenging risk-stratified development strategies. Algorithmic changes demonstrate the highest improvement potential but carry substantial regression risk. Senior developers produce more stable changes with fewer extreme variations, while code complexity correlates with increased regression likelihood. Domain-size interactions reveal significant patterns, with small web-server projects exhibiting the highest performance instability. Our study provides empirical evidence for integrating automated performance testing into continuous integration pipelines.
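The abstract says the study quantifies both the statistical significance and the magnitude of performance variations between benchmark runs, but it does not name the exact statistics used. A non-parametric effect-size measure such as Cliff's delta is typical for comparing per-invocation latency samples in studies of this kind; the sketch below (class name, method name, and sample data are illustrative assumptions, not taken from the paper) shows how such a magnitude could be computed for one method's before/after measurements:

```java
public class CliffsDelta {
    // Cliff's delta: (#pairs with before > after - #pairs with before < after) / (n * m).
    // Ranges from -1.0 (every "before" sample is smaller, i.e. the change slowed
    // the method down in every pairwise comparison) to +1.0 (every pair faster).
    static double cliffsDelta(double[] before, double[] after) {
        long gt = 0, lt = 0;
        for (double a : before) {
            for (double b : after) {
                if (a > b) gt++;
                else if (a < b) lt++;
            }
        }
        return (double) (gt - lt) / ((long) before.length * after.length);
    }

    public static void main(String[] args) {
        // Hypothetical per-invocation latencies (ns) before and after a code change.
        double[] before = {100, 102, 101, 99, 103};
        double[] after  = {110, 108, 112, 109, 111};
        System.out.printf("Cliff's delta = %.2f%n", cliffsDelta(before, after));
        // -1.00 here: every post-change sample is slower, a maximal-magnitude regression.
    }
}
```

In a pipeline like the one the study motivates, a significance test (e.g. Mann-Whitney U) would gate whether a variation counts at all, and an effect-size threshold like this would separate negligible fluctuations from the "measurable impacts" reported for 32.7% of changes.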