🤖 AI Summary
To address the degradation of model generalization caused by large-batch distributed training, this paper proposes a training paradigm that integrates dual-batch learning with cyclic progressive learning. Built on a parameter-server architecture, the method concurrently applies large- and small-batch gradient updates to jointly optimize training efficiency and generalization; in addition, a cyclic progressive learning mechanism dynamically raises the input image resolution over the course of training, cutting early-stage computational overhead. The core innovation is the joint design of dual-batch collaborative optimization and resolution-adaptive scheduling. Experiments with ResNet-18 show consistent gains: on CIFAR-100, top-1 accuracy increases by 3.3% while training time drops by 10.6%; on ImageNet, accuracy improves by 0.1% while training time falls by 35.7%. These results indicate that the framework improves efficiency and accuracy simultaneously.
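The dual-batch idea above can be illustrated with a minimal sketch. The paper's actual parameter-server implementation is not shown here; this toy version (plain NumPy, linear regression, and the hypothetical helper names `grad_mse` and `dual_batch_step` are all my own) simply averages the gradient from one large batch with the gradient from one small batch at each step:

```python
import numpy as np

def grad_mse(w, X, y):
    """Gradient of mean-squared error for a linear model y ≈ X @ w."""
    return 2.0 * X.T @ (X @ w - y) / len(y)

def dual_batch_step(w, X, y, rng, large_bs=64, small_bs=8, lr=0.1):
    """One hypothetical dual-batch update: combine the gradient from a
    large batch (hardware efficiency) with one from a small batch
    (generalization), then take a single SGD step."""
    idx_l = rng.choice(len(y), size=large_bs, replace=False)
    idx_s = rng.choice(len(y), size=small_bs, replace=False)
    g = 0.5 * (grad_mse(w, X[idx_l], y[idx_l]) +
               grad_mse(w, X[idx_s], y[idx_s]))
    return w - lr * g

# Toy data: recover a known weight vector.
rng = np.random.default_rng(0)
X = rng.normal(size=(256, 3))
true_w = np.array([1.0, -2.0, 0.5])
y = X @ true_w
w = np.zeros(3)
for _ in range(200):
    w = dual_batch_step(w, X, y, rng)
```

In the real system the two batch sizes run on separate workers under the parameter server rather than inside one loop; the sketch only shows how the two gradient signals combine.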
📝 Abstract
Distributed machine learning is critical for training deep learning models with large datasets and numerous parameters. Current research primarily focuses on leveraging additional hardware resources and powerful computing units to accelerate training, so larger batch sizes are often employed to speed it up. However, training with large batch sizes can lead to lower accuracy due to poor generalization. To address this issue, we propose the dual batch size learning scheme, a distributed training method built on the parameter server framework. This approach maximizes training efficiency by utilizing the largest batch size the hardware can support while incorporating a smaller batch size to enhance model generalization. By using two different batch sizes simultaneously, the method reduces testing loss and improves generalization with minimal extra training time. Additionally, to mitigate the time overhead introduced by dual batch size learning, we propose the cyclic progressive learning scheme. This technique gradually adjusts the image resolution from low to high during training, significantly boosting training speed. By combining cyclic progressive learning with dual batch size learning, our hybrid approach improves both model generalization and training efficiency. Experimental results using ResNet-18 show that, compared to conventional training methods, our method improves accuracy by 3.3% while reducing training time by 10.6% on CIFAR-100, and improves accuracy by 0.1% while reducing training time by 35.7% on ImageNet.
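The cyclic progressive schedule described above (resolution grows from low to high, repeatedly) could be expressed as a simple scheduling function. This is a sketch under my own assumptions, not the paper's exact schedule: the function name `cyclic_resolution`, the cycle count, and the resolution range and step are all illustrative placeholders.

```python
def cyclic_resolution(epoch, total_epochs, cycles=3,
                      min_res=96, max_res=224, step=32):
    """Hypothetical cyclic progressive schedule: within each cycle the
    input resolution climbs from min_res to max_res in fixed steps,
    then resets for the next cycle."""
    epochs_per_cycle = total_epochs // cycles
    # Position within the current cycle, normalized to [0, 1].
    pos = (epoch % epochs_per_cycle) / max(epochs_per_cycle - 1, 1)
    n_steps = (max_res - min_res) // step
    return min_res + step * round(pos * n_steps)
```

For example, with 30 epochs and 3 cycles, each 10-epoch cycle would start at a low resolution (cheap early iterations) and end at full resolution, so the model repeatedly sees the highest-resolution inputs before training finishes.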