🤖 AI Summary
Training deep neural networks (DNNs) at scale is computationally expensive, and correlations in layer inputs slow convergence. This paper proposes Decorrelated Backpropagation (DeCorrBP), a lightweight method that induces network-wide input decorrelation during training while the weights are updated with standard backpropagation. Its core ingredients are an efficient decorrelating transform applied to each layer's inputs, careful optimizations that keep the computational overhead minimal, and seamless integration into standard training pipelines. Experiments on several deep convolutional networks, up to a 50-layer ResNet, demonstrate a more than two-fold training speed-up and higher test accuracy than standard backpropagation, with corresponding reductions in GPU-hours and carbon footprint. To the authors' knowledge, DeCorrBP is the first input decorrelation method shown to be practical, stable, and efficient for large-scale DNN training.
📝 Abstract
The backpropagation algorithm remains the dominant and most successful method for training deep neural networks (DNNs). At the same time, training DNNs at scale comes at a significant computational cost and therefore a high carbon footprint. Converging evidence suggests that input decorrelation may speed up deep learning. However, this has not yet translated into substantial improvements in training efficiency for large-scale DNNs, mainly because of the challenge of enforcing fast and stable network-wide decorrelation. Here, we show for the first time that much more efficient training of deep convolutional neural networks is feasible by embracing decorrelated backpropagation as a mechanism for learning. To achieve this, we use a novel algorithm that induces network-wide input decorrelation with minimal computational overhead. By combining this algorithm with careful optimizations, we achieve a more than two-fold speed-up and higher test accuracy compared to backpropagation when training several deep networks, up to a 50-layer ResNet model. This demonstrates that decorrelation provides exciting prospects for efficient deep learning at scale.
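The core idea, decorrelating each layer's input with a locally learned transform while the weights follow ordinary gradient descent, can be sketched in a toy single-layer setting. The sketch below is illustrative, not the paper's implementation: the specific anti-Hebbian update rule, learning rates, and linear-regression setup are assumptions chosen for demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setting: correlated inputs, one linear layer, squared-error loss.
n_in, n_out, batch = 8, 4, 256
mix = rng.normal(size=(n_in, n_in))        # mixing matrix that correlates the inputs
W_true = rng.normal(size=(n_out, n_in))    # ground-truth mapping to regress

R = np.eye(n_in)                           # decorrelation matrix, learned locally
W = 0.1 * rng.normal(size=(n_out, n_in))   # forward weights, learned by gradient descent
eta_r, eta_w = 1e-3, 1e-2

def sample(n):
    x = rng.normal(size=(n, n_in)) @ mix.T # correlated network input
    return x, x @ W_true.T                 # (input, regression target)

x0, y0 = sample(batch)
mse_init = np.mean((x0 @ R.T @ W.T - y0) ** 2)

for step in range(2000):
    x, y = sample(batch)
    x_dec = x @ R.T                        # decorrelated input
    err = x_dec @ W.T - y

    # Local (illustrative) anti-Hebbian update: shrink the off-diagonal
    # covariance of the decorrelated input.
    C = x_dec.T @ x_dec / batch
    R -= eta_r * (C - np.diag(np.diag(C))) @ R

    # Ordinary gradient step on the forward weights, as in backpropagation.
    W -= eta_w * err.T @ x_dec / batch

# Population covariance of the decorrelated input after training.
cov0 = mix @ mix.T
cov = R @ cov0 @ R.T
off_diag_init = np.abs(cov0 - np.diag(np.diag(cov0))).mean()
off_diag = np.abs(cov - np.diag(np.diag(cov))).mean()

x1, y1 = sample(4096)
mse = np.mean((x1 @ R.T @ W.T - y1) ** 2)
```

After training, the off-diagonal covariance of the decorrelated input is much smaller than that of the raw input, and the regression error has dropped accordingly; the point of the paper is that decorrelated inputs condition the gradient steps better, so the forward weights converge faster than they would on the raw, correlated input.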