🤖 AI Summary
This study investigates the impact of temporal dependencies and seasonal effects on predicting software technical debt, measured by the SQALE index. Using commit histories from 31 Java open-source projects, we systematically evaluate 11 time-series and machine learning models (including ARIMAX, SARIMA, LSTM, XGBoost, and Random Forest) and compare their predictive performance. Methodologically, we apply multivariate time-series models (notably ARIMAX), which have seen little use in technical debt forecasting, and we model and quantify seasonal patterns via seasonal decomposition and sliding-window evaluation. Our results show that ARIMAX achieves the best overall performance; incorporating explicit seasonality reduces the mean absolute error (MAE) of time-series models by 5.2% on average, a statistically significant but modest improvement. The study supports explicitly modeling temporal dependencies, and to a lesser extent periodicity, for robust technical debt prediction.
📝 Abstract
Predicting Code Technical Debt has become a popular research niche in recent software engineering literature. Technical Debt is an important metric in software projects because it estimates the effort professionals must spend to clean the code, which makes predicting its future behavior a crucial task. However, no well-defined and consistent approach fully captures the features that drive the evolution of Code Technical Debt. The goal of this study is to evaluate how time-dependent techniques, and seasonal effects in temporal data, affect prediction performance in the context of Code Technical Debt. The study adopts existing but not widely used time-dependent prediction techniques and compares their prediction performance with that of commonly used Machine Learning models. It further strengthens the evaluation of time-dependent methods by extending the analysis to capture the impact of seasonality in Code Technical Debt data. We trained 11 prediction models on the commit history of 31 open-source Java projects and predicted future observations of the SQALE index to evaluate their predictive performance. Our study confirms the positive impact of time-dependent techniques: the multivariate time series model ARIMAX outperformed all the other models. Incorporating seasonal effects improved the predictive performance of the time-dependent techniques, although the improvement was relatively modest. These findings support our position in favor of techniques that capture the time dependence present in historical software metric data, here Code Technical Debt, and they call for methods that can effectively account for that dependence.
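
To make the idea of evaluating seasonal effects concrete, the following is a minimal illustrative sketch, not the study's pipeline. It assumes a synthetic series standing in for a SQALE index history, a hypothetical seasonal period of 7 observations, and two simple baseline forecasters (last value and seasonal last value) in place of the ARIMAX and related models the study actually trained. All function and variable names are hypothetical. It shows one-step-ahead walk-forward evaluation with MAE as the error measure.

```python
import math


def mean_absolute_error(actual, predicted):
    """Average absolute difference between observed and forecast values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def walk_forward_mae(series, forecaster, min_train):
    """One-step-ahead walk-forward evaluation.

    At each step t, the forecaster sees only series[:t] and predicts
    series[t], so no future information leaks into the forecast.
    """
    actual, predicted = [], []
    for t in range(min_train, len(series)):
        predicted.append(forecaster(series[:t]))
        actual.append(series[t])
    return mean_absolute_error(actual, predicted)


def naive_forecast(history):
    """Non-seasonal baseline: repeat the most recent observation."""
    return history[-1]


def make_seasonal_naive(period):
    """Seasonal baseline: repeat the observation one full period ago."""
    def forecast(history):
        return history[-period]
    return forecast


# Synthetic stand-in for a technical debt series (an assumption, not real
# SQALE data): a slow upward trend plus a repeating cycle of length PERIOD.
PERIOD = 7
series = [
    100 + 0.1 * t + 10 * math.sin(2 * math.pi * t / PERIOD)
    for t in range(70)
]

naive_mae = walk_forward_mae(series, naive_forecast, min_train=PERIOD)
seasonal_mae = walk_forward_mae(
    series, make_seasonal_naive(PERIOD), min_train=PERIOD
)

print(f"MAE without seasonality: {naive_mae:.3f}")
print(f"MAE with seasonality:    {seasonal_mae:.3f}")
```

On this synthetic series the seasonal baseline has the lower MAE because the cycle was built into the data. On real project histories the gain depends on how strong the seasonal component is, which is consistent with the modest improvement the abstract reports.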