🤖 AI Summary
This work asks whether backpropagation is optimal with respect to sample efficiency and examines synthetic gradients as an alternative. It introduces a unified vectorized feedback framework that covers both loss-based and reward-based learning on computational graphs, and within that framework gives sufficient conditions under which synthetic gradients achieve a lower gradient-estimation mean squared error than backpropagation. Constructed examples show that the resulting sample-efficiency advantage can be made arbitrarily large. Experiments on contextual bandit and reinforcement learning tasks illustrate the potential of these theoretical findings.
📝 Abstract
Backpropagation is the default learning rule for artificial neural networks and is often treated as the settled approach whenever differentiability is available. In this work, we revisit this convention through a theoretical lens of sample efficiency. We introduce a unified vectorized feedback framework for loss-based and reward-based learning on computational graphs, in which synthetic gradients emerge as a natural alternative to backpropagation. We characterize the conditions under which synthetic gradients can achieve a lower gradient-estimation mean squared error than backpropagation. We construct examples illustrating that this sample efficiency advantage can be arbitrarily large. Experiments on contextual bandits and reinforcement learning tasks demonstrate the potential of our theoretical findings.
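To make the comparison criterion concrete, here is a minimal sketch of how the mean squared error of two gradient estimators can be measured and split into squared bias and variance. It is not the paper's framework or construction: the vector `true_grad`, the Gaussian noise levels, and the constant bias of `0.3` are all hypothetical values chosen for illustration. The "unbiased but noisy" estimator is only a stand-in for a sampled exact gradient, and the "biased but low-variance" estimator is only a stand-in for a learned gradient predictor.

```python
import numpy as np


def estimator_mse(estimates, true_grad):
    """Average squared Euclidean distance between estimates and the true gradient."""
    diffs = estimates - true_grad
    return float(np.mean(np.sum(diffs ** 2, axis=1)))


def bias_variance(estimates, true_grad):
    """Split the empirical MSE into squared bias and variance."""
    mean_est = estimates.mean(axis=0)
    bias_sq = float(np.sum((mean_est - true_grad) ** 2))
    variance = float(np.mean(np.sum((estimates - mean_est) ** 2, axis=1)))
    return bias_sq, variance


rng = np.random.default_rng(0)
true_grad = np.array([1.0, -2.0, 0.5])  # hypothetical ground-truth gradient
n_samples = 10_000

# Assumption: an unbiased estimator with large per-sample noise.
unbiased = true_grad + rng.normal(0.0, 2.0, size=(n_samples, 3))

# Assumption: an estimator with a small constant bias but little noise.
biased = true_grad + 0.3 + rng.normal(0.0, 0.2, size=(n_samples, 3))

mse_unbiased = estimator_mse(unbiased, true_grad)
mse_biased = estimator_mse(biased, true_grad)

print("unbiased, noisy estimator MSE:", mse_unbiased)
print("biased, low-variance estimator MSE:", mse_biased)
print("bias^2 and variance of biased estimator:", bias_variance(biased, true_grad))
```

Under these assumed noise levels the biased estimator has the lower mean squared error, because its reduction in variance outweighs the squared bias it introduces. Whether a synthetic gradient actually lands in that regime depends on the conditions the paper characterizes, which this toy example does not reproduce.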