AI Summary
To address high gradient estimation variance, excessive query overhead, and poor scalability in forward learning, this paper formulates intra-batch query allocation as a constrained variance minimization problem, the first such formulation in the literature. We propose a lightweight, plug-and-play, theoretically optimal adaptive query allocator that requires no modification to existing forward learning frameworks. The allocator is derived using a simplified surrogate objective, reparameterization, and Monte Carlo variance analysis, which keeps the optimization efficient. Empirical evaluation on ViT fine-tuning, prompt tuning, and multimodal alignment demonstrates up to a 72% reduction in query count while preserving or improving model performance, which significantly improves the practical scalability of forward learning algorithms. The core contribution is to recast query scheduling as a theoretically grounded optimal resource allocation problem and to deliver a modular, engineering-ready solution.
Abstract
Given the limitations of backpropagation, perturbation-based gradient computation methods have recently attracted attention for learning with only forward passes, also referred to as queries. Conventional forward learning spends an enormous number of queries on each data point to estimate gradients accurately through Monte Carlo sampling, which hinders the scalability of these algorithms. However, not all data points deserve an equal number of queries for gradient estimation. In this paper, we study how to improve forward learning efficiency from a novel perspective: how can gradient estimation variance be reduced at minimum cost? To this end, we propose allocating the optimal number of queries to each data point within a batch during training, balancing estimation accuracy against computational efficiency. Specifically, using a simplified proxy objective and a reparameterization technique, we derive a novel plug-and-play query allocator with minimal parameters. We provide theoretical results to verify its optimality. We conduct extensive experiments on fine-tuning Vision Transformers on various datasets and further deploy the allocator in two black-box applications: prompt tuning and multimodal alignment for foundation models. All findings demonstrate that the proposed allocator significantly enhances the scalability of forward-learning algorithms, paving the way for real-world applications.
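The idea that data points in a batch need not receive equal numbers of queries can be illustrated with a minimal sketch. This is not the paper's allocator, whose proxy objective and reparameterization are not detailed here. It is the textbook variance-proportional (Neyman-style) rule, under the assumption that a Monte Carlo gradient estimate for sample i with n_i queries has variance sigma_i^2 / n_i. The function names `allocate_queries` and `total_variance`, and the example noise levels, are hypothetical.

```python
import numpy as np


def allocate_queries(sigma, budget, min_queries=1):
    """Split a total query budget across a batch in proportion to per-sample noise.

    Assumption (not from the paper): sample i's estimator variance is
    sigma[i]**2 / n_i, so minimizing the summed variance under a fixed
    budget gives n_i proportional to sigma[i].
    """
    sigma = np.asarray(sigma, dtype=float)
    n = sigma.size
    if budget < n * min_queries:
        raise ValueError("budget is too small to give every sample min_queries")

    # Every sample gets the minimum; the rest is shared proportionally.
    spare = budget - n * min_queries
    total = sigma.sum()
    weights = sigma / total if total > 0 else np.full(n, 1.0 / n)
    ideal = spare * weights

    # Round down, then hand leftover queries to the largest fractional parts.
    extra = np.floor(ideal).astype(int)
    leftover = int(spare - extra.sum())
    order = np.argsort(-(ideal - extra), kind="stable")
    extra[order[:leftover]] += 1
    return extra + min_queries


def total_variance(sigma, queries):
    """Summed estimator variance under the sigma_i^2 / n_i assumption."""
    sigma = np.asarray(sigma, dtype=float)
    queries = np.asarray(queries, dtype=float)
    return float(np.sum(sigma ** 2 / queries))


if __name__ == "__main__":
    sigma = [1.0, 1.0, 4.0, 10.0]   # hypothetical per-sample noise levels
    adaptive = allocate_queries(sigma, budget=32)
    uniform = [8, 8, 8, 8]          # same budget, spread evenly
    print(adaptive, total_variance(sigma, adaptive))
    print(uniform, total_variance(sigma, uniform))
```

With the same total budget, the proportional split concentrates queries on the noisiest samples and yields a lower summed variance than the uniform split in this toy setting. In practice, the per-sample noise levels would themselves have to be estimated, which is part of what an allocator has to handle.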