Stochastic Variational Propagation: Local, Scalable and Efficient Alternative to Backpropagation

📅 2025-05-08
📈 Citations: 0
Influential: 0
🤖 AI Summary
Backpropagation (BP) suffers from poor scalability and high memory overhead due to global gradient synchronization. This paper proposes Stochastic Variational Propagation (SVP), which reformulates deep network training as hierarchical variational inference: layer-wise activations are treated as latent variables, and global BP is replaced by decentralized, local ELBO optimization, eliminating the need for gradient synchronization. SVP incorporates fixed stochastic linear projections and an inter-layer feature-alignment loss to preserve global representational consistency while keeping updates local, thereby mitigating representation collapse. Experiments demonstrate that SVP achieves accuracy comparable to BP across MLPs, CNNs, and Transformers, reduces GPU memory consumption by up to 4×, and enables scalable, modular, and interpretable training.

📝 Abstract
Backpropagation (BP) is the cornerstone of deep learning, but its reliance on global gradient synchronization limits scalability and imposes significant memory overhead. We propose Stochastic Variational Propagation (SVP), a scalable alternative that reframes training as hierarchical variational inference. SVP treats layer activations as latent variables and optimizes local Evidence Lower Bounds (ELBOs), enabling independent, local updates while preserving global coherence. However, directly applying KL divergence in layer-wise ELBOs risks inter-layer representation collapse due to excessive compression. To prevent this, SVP projects activations into low-dimensional spaces via fixed random matrices, ensuring information preservation and representational diversity. Combined with a feature alignment loss for inter-layer consistency, SVP achieves accuracy competitive with BP across diverse architectures (MLPs, CNNs, Transformers) and datasets (MNIST to ImageNet), reduces memory usage by up to 4x, and significantly improves scalability. More broadly, SVP introduces a probabilistic perspective to deep representation learning, opening pathways toward more modular and interpretable neural network design.
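To make the layer-local training idea concrete, here is a minimal PyTorch sketch of the general scheme the abstract describes: each layer projects its activations through a fixed random matrix, trains against its own local loss (a cross-entropy surrogate standing in for the paper's per-layer ELBO), and adds a feature-alignment term between consecutive projections. The module names, loss weighting, and the use of cross-entropy as the local objective are illustrative assumptions, not the paper's exact formulation.

```python
import torch
import torch.nn as nn

class LocalLayer(nn.Module):
    """One layer trained from its own local loss (illustrative sketch)."""
    def __init__(self, d_in, d_out, d_proj, n_classes):
        super().__init__()
        self.layer = nn.Sequential(nn.Linear(d_in, d_out), nn.ReLU())
        # Fixed random projection (a buffer, never trained): compresses
        # activations without the aggressive compression a learned KL term can induce.
        self.register_buffer("proj", torch.randn(d_out, d_proj) / d_proj**0.5)
        self.head = nn.Linear(d_proj, n_classes)  # local auxiliary classifier

    def forward(self, x):
        h = self.layer(x)
        z = h @ self.proj           # low-dimensional random projection
        return h, z, self.head(z)   # local logits for a layer-wise loss

def local_losses(layers, x, y, align_weight=0.1):
    """Per-layer losses; detach() stops gradients from crossing layers."""
    losses, prev_z, h = [], None, x
    for lyr in layers:
        h_next, z, logits = lyr(h.detach())  # no gradient flows between layers
        loss = nn.functional.cross_entropy(logits, y)
        if prev_z is not None and prev_z.shape == z.shape:
            # feature-alignment term for inter-layer consistency (hypothetical form)
            loss = loss + align_weight * nn.functional.mse_loss(z, prev_z.detach())
        losses.append(loss)
        prev_z, h = z, h_next
    return losses

layers = nn.ModuleList([LocalLayer(784, 256, 64, 10), LocalLayer(256, 256, 64, 10)])
opt = torch.optim.Adam(layers.parameters(), lr=1e-3)
x, y = torch.randn(32, 784), torch.randint(0, 10, (32,))
for loss in local_losses(layers, x, y):
    loss.backward()  # each backward stays within one layer's parameters
opt.step()
```

Because every inter-layer input is detached, each `backward()` touches only one layer's parameters, which is what removes the global synchronization that BP requires.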
Problem

Research questions and friction points this paper is trying to address.

Finding a scalable alternative to backpropagation for deep learning
Preventing inter-layer representation collapse in variational training
Reducing memory usage while maintaining competitive accuracy
Innovation

Methods, ideas, or system contributions that make the work stand out.

Hierarchical variational inference for scalable training
Low-dimensional projection prevents representation collapse
Local updates with global coherence via ELBOs