🤖 AI Summary
In cloud-native databases, highly dynamic resource states and complex scheduling lead to low coordination efficiency and unstable policies. To address this, we propose an adaptive resource orchestration method based on multi-agent reinforcement learning (MARL). Our approach features: (1) a heterogeneous role-based agent architecture, where compute, storage, and other resource entities are endowed with distinct, specialized policy capabilities; and (2) a reward shaping mechanism that jointly leverages local observations and global feedback to mitigate policy bias arising from partial observability and improve convergence stability. Evaluated on real production workloads, our method achieves significant improvements: +18.3% in resource utilization, −32.7% in average scheduling latency, 2.1× faster policy convergence, and enhanced system stability, fairness, and cross-workload generalization.
📝 Abstract
This paper addresses the challenges of high resource dynamism and scheduling complexity in cloud-native database systems by proposing an adaptive resource orchestration method based on multi-agent reinforcement learning. The method introduces a heterogeneous role-based agent modeling mechanism that allows different resource entities, such as compute nodes, storage nodes, and schedulers, to adopt distinct policy representations, so that agents better reflect their diverse functional responsibilities and local environmental characteristics within the system. A reward-shaping mechanism integrates local observations with global feedback to mitigate the policy-learning bias caused by incomplete state observations; by combining real-time local performance signals with global system value estimation, it improves coordination among agents and stabilizes policy convergence. A unified multi-agent training framework is developed and evaluated on a representative production scheduling dataset. Experimental results show that the proposed method outperforms traditional approaches across multiple key metrics, including resource utilization, scheduling latency, policy convergence speed, system stability, and fairness, demonstrating strong generalization and practical utility. Across a range of experimental scenarios, the method effectively handles orchestration tasks with high concurrency, high-dimensional state spaces, and complex dependency relationships, confirming its advantages in real-world, large-scale scheduling environments.
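The abstract does not give the exact reward-shaping formula. One common way to realize "combining real-time local performance signals with global system value estimation" is a convex combination of the two terms; the sketch below is a minimal illustration under that assumption, where the mixing weight `beta` and the function name `shaped_reward` are hypothetical, not taken from the paper.

```python
def shaped_reward(local_reward: float, global_value: float, beta: float = 0.5) -> float:
    """Blend an agent's local performance signal with a global system
    value estimate. beta controls how much global feedback is mixed in:
    beta = 0 uses only the local signal (prone to partial-observability
    bias), beta = 1 uses only the global estimate (slow per-agent credit
    assignment); intermediate values trade the two off.
    """
    if not 0.0 <= beta <= 1.0:
        raise ValueError("beta must lie in [0, 1]")
    return (1.0 - beta) * local_reward + beta * global_value


# Example: a compute-node agent observes a strong local signal (e.g. high
# node-level utilization) while the global value estimate is low (e.g. the
# cluster as a whole is imbalanced); the shaped reward moderates the two.
r = shaped_reward(local_reward=0.9, global_value=0.2, beta=0.5)
```

In practice the global value term would come from a centralized critic evaluated on the joint state, as in centralized-training/decentralized-execution MARL setups, while the local term is computed from each agent's own observations.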