🤖 AI Summary
To address the high computational overhead and poor scalability caused by frequent retraining of ML-based capacity prediction models, this paper proposes an on-demand retraining mechanism driven by data drift detection. The method integrates time-series analysis, lightweight ML forecasting models, and multi-dimensional data drift detection within an AIOps framework to enable dynamic capacity prediction. Its core innovation lies in triggering retraining only upon statistically significant data drift, eliminating redundant periodic updates. Experiments show that, under most dynamic workloads, its prediction accuracy matches that of periodic retraining (error difference <3%), while reducing average computational cost by 62%. Only under extremely high-frequency workload shifts is periodic retraining recommended as a fallback. This work establishes a paradigm for resource demand forecasting that jointly optimizes accuracy, efficiency, and adaptability.
📝 Abstract
Capacity management is critical for software organizations to allocate resources effectively and meet operational demands. An important step in capacity management is predicting future resource needs, which often relies on data-driven analytics and machine learning (ML) forecasting models that require frequent retraining to stay relevant as data evolves. Continuously retraining the forecasting models can be expensive and difficult to scale, posing a challenge for engineering teams tasked with balancing accuracy and efficiency. Retraining only when the data changes appears to be a more computationally efficient alternative, but its impact on accuracy requires further investigation. In this work, we investigate the effects of retraining capacity forecasting models for time series based on detected changes in the data, compared to periodic retraining. Our results show that drift-based retraining achieves forecasting accuracy comparable to periodic retraining in most cases, making it a cost-effective strategy. However, when data is changing rapidly, periodic retraining is still preferred to maximize forecasting accuracy. These findings offer actionable insights for software teams to enhance forecasting systems, reducing retraining overhead while maintaining robust performance.
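The core idea above, retraining only when a detected change in the data warrants it, can be sketched as follows. This is a minimal illustration, not the paper's actual method: the drift test (a simple mean-shift check in reference standard deviations), the stand-in "model" (a recent-window mean forecaster), and all thresholds and names are hypothetical assumptions for the sketch.

```python
# Illustrative sketch of drift-triggered (on-demand) retraining for a
# univariate time series. The drift detector, model, and thresholds are
# hypothetical stand-ins, not the paper's actual components.
from statistics import mean, stdev

def drifted(reference, window, z_threshold=3.0):
    """Flag drift when the current window's mean moves more than
    z_threshold reference standard deviations from the reference mean."""
    ref_mu, ref_sigma = mean(reference), stdev(reference)
    if ref_sigma == 0:
        return mean(window) != ref_mu
    return abs(mean(window) - ref_mu) / ref_sigma > z_threshold

def fit_model(history):
    """Stand-in 'model': forecast the mean of the last 24 observations."""
    return mean(history[-24:])

def forecast_stream(series, window_size=24):
    """Walk the series, retraining only when drift is detected
    instead of on a fixed periodic schedule."""
    model = fit_model(series[:window_size])
    reference = series[:window_size]
    retrains, forecasts = 0, []
    for t in range(window_size, len(series)):
        forecasts.append(model)                 # one-step-ahead forecast
        window = series[t - window_size:t]
        if drifted(reference, window):
            model = fit_model(series[:t])       # on-demand retrain
            reference = window                  # reset the reference window
            retrains += 1
    return forecasts, retrains
```

Under a stable workload this loop never retrains; a sustained level shift triggers a small number of retrains, which is the cost-saving behavior the paper evaluates against fixed-interval retraining.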