🤖 AI Summary
To address the challenge of adaptively configuring batch size in heterogeneous, dynamic distributed machine learning environments, this paper proposes a reinforcement learning (RL)-based adaptive optimization framework. It formulates batch size tuning as a sequential decision-making problem and introduces a multidimensional state representation, integrating network latency, resource utilization, and training convergence metrics, that enables policy generalization across diverse devices and model architectures without explicit system modeling. The framework employs Proximal Policy Optimization (PPO) to make real-time batch size decisions during distributed training. Experiments across varied hardware and network conditions demonstrate up to a 6.3% improvement in model accuracy and a 46% reduction in training time; notably, the framework maintains superior performance even at scale (32 nodes). The core contribution is the first end-to-end RL formulation of batch size optimization, jointly optimizing training efficiency and model accuracy.
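To make the RL formulation concrete, here is a minimal toy sketch of how batch size tuning could be cast as a sequential decision problem: a gym-style environment whose observation combines the three state dimensions the summary names (network latency, resource utilization, convergence rate) and whose discrete actions pick a batch size. The environment dynamics, reward shape, and batch-size menu are all invented for illustration; the paper's actual DYNAMIX formulation and PPO training loop are not reproduced here.

```python
import random

class BatchSizeEnv:
    """Toy stand-in (NOT the paper's implementation) for batch-size tuning as RL.

    Observation: (network latency in ms, resource utilization in [0,1],
                  recent loss-decrease rate) -- the three state dimensions
    the summary describes. Action: index into a fixed batch-size menu.
    """

    BATCH_SIZES = [32, 64, 128, 256, 512]  # illustrative action space

    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.state = self._observe()

    def _observe(self):
        # Random stand-in for real telemetry from the training cluster.
        return (
            self.rng.uniform(1.0, 50.0),   # network latency (ms)
            self.rng.uniform(0.2, 1.0),    # resource utilization
            self.rng.uniform(0.0, 0.1),    # convergence (loss-decrease) rate
        )

    def step(self, action):
        batch = self.BATCH_SIZES[action]
        latency, util, conv = self.state
        # Invented reward: a throughput proxy (favoring larger batches on
        # well-utilized, low-latency links) minus a statistical-efficiency
        # penalty when convergence is already slow.
        reward = util * batch / (latency + batch / 64.0) \
                 - 0.01 * batch * (0.1 - conv)
        self.state = self._observe()
        return self.state, reward

# Roll out a random policy as a placeholder where PPO would act.
env = BatchSizeEnv(seed=0)
total = 0.0
for _ in range(10):
    action = env.rng.randrange(len(BatchSizeEnv.BATCH_SIZES))
    _, r = env.step(action)
    total += r
```

In the paper's framework, a PPO policy network would replace the random action above, mapping the multidimensional state to a batch-size choice at each decision point.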
📝 Abstract
Existing batch size selection approaches in distributed machine learning rely on static allocation or simplistic heuristics that fail to adapt to heterogeneous, dynamic computing environments. We present DYNAMIX, a reinforcement learning framework that formulates batch size optimization as a sequential decision-making problem solved with Proximal Policy Optimization (PPO). DYNAMIX employs a multi-dimensional state representation encompassing network-level metrics, system-level resource utilization, and training statistical-efficiency indicators to enable informed decision-making across diverse computational resources. It eliminates the need for explicit system modeling while integrating seamlessly with existing distributed training frameworks. In evaluations across diverse workloads, hardware configurations, and network conditions, DYNAMIX achieves up to a 6.3% improvement in final model accuracy and a 46% reduction in total training time. Scalability experiments demonstrate that DYNAMIX maintains the best performance as cluster size increases to 32 nodes, while policy-transfer experiments show that learned policies generalize effectively across related model architectures.