Online Ensemble Transformer for Accurate Cloud Workload Forecasting in Predictive Auto-Scaling

📅 2025-08-18

🤖 AI Summary

To address the challenge of accurately forecasting highly dynamic, fine-grained, high-frequency workloads in cloud environments, this paper proposes E3Former, an online ensemble Transformer model. E3Former integrates a dynamic online ensemble architecture, lightweight Transformer subnetworks, and an incremental online learning mechanism. The design achieves high prediction accuracy with little additional computational overhead, capturing multi-scale periodic patterns while adapting quickly to concept drift in streaming workload data. In online forecasting experiments on real-world cloud workload traces, E3Former reduces average forecast error by 10% compared with state-of-the-art baselines. Deployed in ByteDance's Intelligent Horizontal Pod Auto-scaling (IHPA) platform, it supports the stable operation of over 30 production applications with a predictive auto-scaling capacity of more than 600,000 CPU cores, reducing resource consumption by over 40% while essentially preserving service quality.
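The summary credits part of E3Former's adaptivity to an incremental online learning mechanism. Its exact update rule is not described here, so the following is only a minimal sketch of the general idea under stated assumptions: a hypothetical lightweight forecaster (a linear autoregressive model standing in for a Transformer subnetwork) that makes a one-step forecast, observes the true value, and then takes a single normalized LMS (NLMS) step. The class name `OnlineARForecaster` and the `order` and `lr` parameters are illustrative and do not come from the paper.

```python
from collections import deque


class OnlineARForecaster:
    """Hypothetical lightweight forecaster: a linear AR(order) model updated
    incrementally with one normalized-LMS step per new observation."""

    def __init__(self, order: int = 3, lr: float = 0.5):
        if order < 1 or not 0 < lr < 2:
            raise ValueError("need order >= 1 and 0 < lr < 2 for NLMS stability")
        self.order = order
        self.lr = lr
        self.weights = [0.0] * order
        self.bias = 0.0
        self.history = deque(maxlen=order)  # the `order` most recent observations

    def predict(self) -> float:
        """One-step-ahead forecast from the current window."""
        if len(self.history) < self.order:
            # Warm-up: repeat the last value until the window is full.
            return self.history[-1] if self.history else 0.0
        return self.bias + sum(w * x for w, x in zip(self.weights, self.history))

    def update(self, observed: float) -> float:
        """Forecast, compare with the observed value, learn, and return the error."""
        error = observed - self.predict()
        if len(self.history) == self.order:
            # NLMS: scale the step by the input energy (the bias input counts as 1),
            # so that on the current input each update removes a fraction `lr`
            # of the error.
            step = self.lr * error / (1.0 + sum(x * x for x in self.history))
            self.weights = [w + step * x for w, x in zip(self.weights, self.history)]
            self.bias += step
        self.history.append(observed)
        return error
```

Each update costs O(`order`) work, and in this sketch a level shift in the stream (for example, from 10 to 20 cores) is absorbed within a few dozen observations. That cheap, per-sample adaptation is the property a streaming forecaster needs to follow concept drift without periodic full retraining.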

📝 Abstract

In the swiftly evolving domain of cloud computing, the advent of serverless systems underscores the crucial need for predictive auto-scaling systems, which are required to ensure optimal resource allocation and maintain operational efficiency in inherently volatile environments. At the core of a predictive auto-scaling system is the workload forecasting model. Existing forecasting models struggle to adapt quickly to the dynamics of online workload streams and have difficulty capturing the complex periodicity that arises in fine-grained, high-frequency forecasting tasks. To address this, we propose a novel online ensemble model, E3Former, for online workload forecasting in large-scale predictive auto-scaling. Our model combines the predictive capabilities of multiple subnetworks to overcome the limitations of single-model approaches, improving both accuracy and robustness. It does so with only a minimal increase in computational overhead, in keeping with the lean operational ethos of serverless systems. Extensive experiments on real-world workload datasets demonstrate the efficacy of our ensemble model. In online forecasting tasks, the proposed method reduces forecast error by an average of 10%, and a predictive auto-scaling test in a real-world online system further confirms its effectiveness. Our method is currently deployed within ByteDance's Intelligent Horizontal Pod Auto-scaling (IHPA) platform, which supports the stable operation of over 30 applications, such as Douyin E-Commerce, TouTiao, and Volcano Engine, with a predictive auto-scaling capacity of over 600,000 CPU cores. While essentially preserving service quality, the predictive auto-scaling system can reduce resource consumption by over 40%.
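In a predictive auto-scaling system, the workload forecast is ultimately turned into a replica count. The abstract does not say how IHPA performs this step, so the sketch below assumes a simple rule modeled on the Kubernetes Horizontal Pod Autoscaler, whose documented algorithm computes desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]; here the forecast CPU demand takes the place of observed usage. The function names `replicas_for_forecast` and `plan_replicas`, the per-pod CPU figure, the target utilization, and the replica bounds are all hypothetical.

```python
import math
from typing import List, Sequence


def replicas_for_forecast(predicted_cpu_cores: float,
                          cores_per_pod: float,
                          target_utilization: float = 0.6,
                          min_replicas: int = 1,
                          max_replicas: int = 1000) -> int:
    """Hypothetical sizing rule: the smallest pod count at which the forecast
    CPU demand runs at or below `target_utilization` of each pod, clamped."""
    if cores_per_pod <= 0 or not 0 < target_utilization <= 1:
        raise ValueError("need cores_per_pod > 0 and 0 < target_utilization <= 1")
    needed = math.ceil(predicted_cpu_cores / (cores_per_pod * target_utilization))
    return max(min_replicas, min(max_replicas, needed))


def plan_replicas(forecast: Sequence[float], cores_per_pod: float,
                  **kwargs) -> List[int]:
    """Size every step of a multi-step workload forecast."""
    return [replicas_for_forecast(c, cores_per_pod, **kwargs) for c in forecast]
```

For example, with 4-core pods and a 50% target utilization, a forecast of 30 cores yields 15 replicas. Sizing from the forecast rather than from current usage is what lets new pods be scheduled before a predicted spike arrives, which is why forecast accuracy directly drives both resource savings and service quality.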

Problem

Adapting to dynamic online workload streams efficiently

Capturing complex periodicity in high-frequency forecasting tasks

Ensuring accurate resource allocation in volatile cloud environments

Innovation

Online ensemble model for workload forecasting

Combines multiple subnetworks for accuracy

Adds minimal computational overhead
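The online ensemble idea, combining several subnetworks as data streams in, can be illustrated with a generic technique: exponentially weighted (Hedge-style) aggregation, where each member's weight shrinks with its recent squared error. This is a swapped-in textbook method, not E3Former's actual combination rule, which is not specified here. The class `HedgeEnsemble` and its `eta` (learning rate) and `decay` (forgetting factor) parameters are assumptions chosen for illustration.

```python
import math
from typing import List, Sequence


class HedgeEnsemble:
    """Hypothetical online ensemble: each member's weight is proportional to
    exp(-eta * discounted cumulative squared error)."""

    def __init__(self, n_members: int, eta: float = 0.1, decay: float = 0.9):
        if n_members < 1:
            raise ValueError("need at least one ensemble member")
        self.eta = eta
        self.decay = decay  # forgetting factor in (0, 1]; 1.0 never forgets
        self.loss = [0.0] * n_members

    def weights(self) -> List[float]:
        # Shift by the smallest loss before exponentiating to avoid underflow.
        best = min(self.loss)
        raw = [math.exp(-self.eta * (loss_k - best)) for loss_k in self.loss]
        total = sum(raw)
        return [r / total for r in raw]

    def combine(self, member_forecasts: Sequence[float]) -> float:
        """Weighted average of the members' forecasts for the next step."""
        return sum(w * f for w, f in zip(self.weights(), member_forecasts))

    def update(self, member_forecasts: Sequence[float], observed: float) -> None:
        """Discount old losses, then add each member's squared error on the new value."""
        for k, f in enumerate(member_forecasts):
            self.loss[k] = self.decay * self.loss[k] + (f - observed) ** 2
```

The forgetting factor is what makes this weighting responsive to concept drift: with `decay=1.0` the weights reflect all past losses equally, while smaller values let a member that has recently become accurate regain weight quickly. Combining members costs only a weighted sum per step, which is in the spirit of the low-overhead goal stated above.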
