Stochastic Variational Propagation: Local, Scalable and Efficient Alternative to Backpropagation

📅 2025-05-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Backpropagation (BP) suffers from poor scalability and high memory overhead due to global gradient synchronization. This paper proposes Stochastic Variational Propagation (SVP), which reformulates deep network training as hierarchical variational inference: layer-wise activations are treated as latent variables, and global BP is replaced by decentralized, local ELBO optimization, eliminating the need for gradient synchronization. SVP incorporates fixed stochastic linear projections and an inter-layer feature-alignment loss to preserve global representational consistency while keeping updates local, thereby mitigating representation collapse. Experiments demonstrate that SVP achieves accuracy comparable to BP across MLPs, CNNs, and Transformers, reduces GPU memory consumption by up to 4×, and enables scalable, modular, and interpretable training.

📝 Abstract
Backpropagation (BP) is the cornerstone of deep learning, but its reliance on global gradient synchronization limits scalability and imposes significant memory overhead. We propose Stochastic Variational Propagation (SVP), a scalable alternative that reframes training as hierarchical variational inference. SVP treats layer activations as latent variables and optimizes local Evidence Lower Bounds (ELBOs), enabling independent, local updates while preserving global coherence. However, directly applying KL divergence in layer-wise ELBOs risks inter-layer representation collapse due to excessive compression. To prevent this, SVP projects activations into low-dimensional spaces via fixed random matrices, ensuring information preservation and representational diversity. Combined with a feature alignment loss for inter-layer consistency, SVP achieves accuracy competitive with BP across diverse architectures (MLPs, CNNs, Transformers) and datasets (MNIST to ImageNet), reduces memory usage by up to 4x, and significantly improves scalability. More broadly, SVP introduces a probabilistic perspective to deep representation learning, opening pathways toward more modular and interpretable neural network design.
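To make the layer-local training idea concrete, here is a minimal PyTorch sketch of the general scheme the abstract describes: each layer projects its activations through a fixed random matrix, trains against its own local loss (a cross-entropy surrogate standing in for the paper's per-layer ELBO), and adds a feature-alignment term between consecutive projections. The module names, loss weighting, and the use of cross-entropy as the local objective are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class LocalLayer(nn.Module):
    """One layer trained from its own local loss (illustrative sketch)."""
    def __init__(self, d_in, d_out, d_proj, n_classes):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU())
        # Fixed random projection (a buffer, never trained): compresses
        # activations without the aggressive compression a learned KL term can induce.
        self.register_buffer("proj", torch.randn(d_out, d_proj) / d_proj**0.5)
        self.head = nn.Linear(d_proj, n_classes)  # local auxiliary classifier

    def forward(self, x):
        h = self.layer(x)
        z = h @ self.proj           # low-dimensional random projection
        return h, z, self.head(z)   # local logits for a layer-wise loss

def local_losses(layers, x, y, align_weight=0.1):
    """Per-layer losses; detach() stops gradients from crossing layers."""
    losses, prev_z, h = [], None, x
    for lyr in layers:
        h_next, z, logits = lyr(h.detach())  # no gradient flows between layers
        loss = nn.functional.cross_entropy(logits, y)
        if prev_z is not None and prev_z.shape == z.shape:
            # feature-alignment term for inter-layer consistency (hypothetical form)
            loss = loss + align_weight * nn.functional.mse_loss(z, prev_z.detach())
        losses.append(loss)
        prev_z, h = z, h_next
    return losses

layers = nn.ModuleList([LocalLayer(784, 256, 64, 10), LocalLayer(256, 256, 64, 10)])
opt = torch.optim.Adam(layers.parameters(), lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
for loss in local_losses(layers, x, y):
    loss.backward()  # each backward stays within one layer's parameters
opt.step()
```

Because every inter-layer input is detached, each `backward()` touches only one layer's parameters, which is what removes the global synchronization that BP requires.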
Problem

Research questions and friction points this paper is trying to address.

Finding a scalable alternative to backpropagation for deep learning
Preventing inter-layer representation collapse in variational training
Reducing memory usage while maintaining competitive accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical variational inference for scalable training
Low-dimensional projection prevents representation collapse
Local updates with global coherence via ELBOs