🤖 AI Summary
Low-precision neural network training suffers from a fundamental trade-off between stability and accuracy. Method: Inspired by the multiplicative synaptic noise dynamics observed in biological systems, we propose a Bayesian learning framework grounded in a log-normal posterior assumption. We introduce, for the first time, multiplicative dynamics into artificial neural network training—combining multiplicative noise with implicit regularization in the parameter updates—while requiring only one additional vector of parameters, thus minimizing memory overhead. The method supports fully low-precision forward passes and is compatible with mainstream large architectures, including ViT and GPT-2. Contribution/Results: Experiments demonstrate stable from-scratch training under purely low-precision forward computation, enhanced robustness of learning and inference on energy-efficient hardware, and final accuracies matching those achieved by Adam. This work establishes a novel paradigm for efficient Bayesian learning tailored to edge AI.
📝 Abstract
Studies in neuroscience have shown that the strengths of biological synapses follow a log-normal distribution whose temporal evolution can be explained by noisy multiplicative dynamics. Biological networks can function stably even under dynamically fluctuating conditions arising from unreliable synaptic transmission. Here we ask: Is it possible to design similar multiplicative training dynamics for artificial neural networks? To answer this question, we derive a Bayesian learning rule that assumes log-normal posterior distributions over weights, which gives rise to a new Log-Normal Multiplicative Dynamics (LMD) algorithm. The algorithm uses multiplicative updates in which both the noise and the regularization are applied multiplicatively. The method is as easy to implement as Adam and requires storing only one additional vector. Our results show that LMD achieves stable and accurate training from scratch under low-precision forward operations for Vision Transformer and GPT-2. These results suggest that multiplicative dynamics, a biological feature, may enable stable low-precision inference and learning on future energy-efficient hardware.
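To make the idea of a multiplicative update concrete, here is a minimal sketch of what such a rule could look like. This is an illustration of the general principle only, not the paper's actual LMD derivation: the function name `lmd_step`, the specific log-space parameterization, and all hyperparameter values are assumptions for the example. Weight magnitudes are updated in log-space, so the gradient step, the noise, and the regularization all act multiplicatively, and Gaussian log-space noise keeps the magnitudes log-normally distributed.

```python
import numpy as np

def lmd_step(w, grad, lr=0.01, noise_std=0.01, weight_decay=1e-4, rng=None):
    """One hypothetical multiplicative update step (illustrative, not the
    paper's exact rule).

    The magnitude of each weight is updated in log-space, so the descent
    step, the weight decay, and the injected noise are all multiplicative
    factors on the magnitude; signs are kept fixed and magnitudes stay
    positive (and log-normal under Gaussian log-space noise).
    """
    rng = np.random.default_rng() if rng is None else rng
    sign = np.sign(w)
    log_m = np.log(np.abs(w) + 1e-12)  # work on magnitudes in log-space
    # Chain rule: d(loss)/d(log m) = m * d(loss)/d|w| = m * sign * grad,
    # so a gradient step in log-space scales each magnitude multiplicatively.
    log_m -= lr * (sign * grad * np.exp(log_m) + weight_decay)
    log_m += noise_std * rng.standard_normal(w.shape)  # multiplicative noise
    return sign * np.exp(log_m)
```

Note that, as in the abstract, the only extra state such a scheme would need beyond the weights themselves is one additional per-parameter vector (here, the log-magnitudes could be cached instead of recomputed each step).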