🤖 AI Summary
Conventional scheduling methods adapt poorly to the dynamic, heterogeneous workloads of cloud computing and cannot optimize efficiency and cost at the same time. To address this, the paper presents the first systematic survey and empirical evaluation of deep reinforcement learning (DRL) paradigms for cloud job scheduling and resource management. We propose a multidimensional performance comparison framework that assesses scalability, generalization, and online adaptability. Covering mainstream DRL approaches, including DQN, PPO/A3C, graph neural networks (GNNs), and hierarchical RL, we evaluate 87 algorithms on the CloudSim++/iCan simulation platform. In the experiments, the best-performing methods reduce energy consumption by 12–35%, shorten job completion time by 18–42%, and keep SLA violation rates below 1.3%. This work establishes both theoretical foundations and practical guidelines for DRL-driven intelligent cloud resource management.
📝 Abstract
Cloud computing has revolutionized the provisioning of computing resources, offering scalable, flexible, and on-demand services to meet the diverse requirements of modern applications. At the heart of efficient cloud operations are job scheduling and resource management, which are critical for optimizing system performance and ensuring timely and cost-effective service delivery. However, the dynamic and heterogeneous nature of cloud environments presents significant challenges for these tasks, as workloads and resource availability can fluctuate unpredictably. Traditional approaches, including heuristic and meta-heuristic algorithms, often struggle to adapt to these real-time changes due to their reliance on static models or predefined rules. Deep Reinforcement Learning (DRL) has emerged as a promising solution to these challenges by enabling systems to learn and adapt policies based on continuous observations of the environment, facilitating intelligent and responsive decision-making. This survey provides a comprehensive review of DRL-based algorithms for job scheduling and resource management in cloud computing, analyzing their methodologies, performance metrics, and practical applications. We also highlight emerging trends and future research directions, offering valuable insights into leveraging DRL to advance both job scheduling and resource management in cloud computing.
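The abstract says that DRL learns a scheduling policy from continuous observations of the environment instead of relying on fixed rules. The sketch below shows that observe, act, reward, and update loop. It rests on several assumptions. The environment is a hypothetical toy with two machines of assumed speeds, one job per step, and a reward equal to the negative completion time of the placed job. A tabular Q-learning table also stands in for the deep network that DRL methods such as DQN would use. It illustrates the general idea only and is not the method of any surveyed paper.

```python
import random
from collections import defaultdict

# Hypothetical toy cluster: SPEEDS are assumed work units each machine completes per step.
SPEEDS = (1.0, 3.0)
MAX_BUCKET = 4  # cap on the discretized backlog used as the observed state

def observe(backlog):
    """Discretize each machine's remaining work into a small integer bucket."""
    return tuple(min(int(b), MAX_BUCKET) for b in backlog)

def step(backlog, action, job_size):
    """Place a job on machine `action`; reward is the negative completion time of that job."""
    backlog = list(backlog)
    backlog[action] += job_size
    completion_time = backlog[action] / SPEEDS[action]
    # Every machine then processes one step's worth of work.
    backlog = [max(0.0, b - s) for b, s in zip(backlog, SPEEDS)]
    return backlog, -completion_time

def train(episodes=300, steps=50, alpha=0.1, gamma=0.9, epsilon=0.1, seed=0):
    """Tabular Q-learning (a stand-in for a deep Q-network) with epsilon-greedy exploration."""
    rng = random.Random(seed)
    n_actions = len(SPEEDS)
    q = defaultdict(lambda: [0.0] * n_actions)
    for _ in range(episodes):
        backlog = [0.0] * n_actions
        for _ in range(steps):
            state = observe(backlog)
            if rng.random() < epsilon:
                action = rng.randrange(n_actions)  # explore
            else:
                action = max(range(n_actions), key=lambda a: q[state][a])  # exploit
            job_size = rng.choice((1.0, 2.0, 3.0))  # assumed job-size distribution
            backlog, reward = step(backlog, action, job_size)
            next_state = observe(backlog)
            # Temporal-difference update toward reward + discounted best next value.
            td_target = reward + gamma * max(q[next_state])
            q[state][action] += alpha * (td_target - q[state][action])
    return q

if __name__ == "__main__":
    q = train()
    greedy = max(range(len(SPEEDS)), key=lambda a: q[(0, 0)][a])
    print("greedy machine for an idle cluster:", greedy)
```

A real DRL scheduler would replace the lookup table with a neural network so that it can generalize across much larger state spaces, such as per-job features, VM utilization, or graph-structured task dependencies.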