🤖 AI Summary
Backpropagation on edge devices incurs high energy consumption and substantial memory overhead, and low-precision quantized training during backpropagation remains under-explored. To address these issues, this paper proposes an INT8-quantized on-device training framework based on the Forward-Forward (FF) algorithm. We design a layer-wise quantization stabilization strategy and a "look-ahead" mechanism that significantly improve the convergence and accuracy stability of FF under low-precision constraints. We further integrate activation sharing and memory optimization techniques to reduce system resource utilization. Experiments on the Jetson Orin Nano platform demonstrate that, compared to the baseline, our method achieves a 4.6% speedup in training, an 8.3% reduction in energy consumption, and a 27.0% decrease in memory footprint, while maintaining model accuracy comparable to state-of-the-art methods. This work provides a practical pathway to efficient on-device training in resource-constrained environments.
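To make the INT8 setting concrete, here is a minimal sketch of symmetric per-tensor INT8 quantization, the standard scheme such frameworks build on. The function names and the per-tensor granularity are illustrative assumptions, not the paper's actual implementation (which adds layer-wise stabilization on top of this).

```python
import numpy as np

def quantize_int8(x):
    """Symmetric per-tensor INT8 quantization (illustrative sketch).

    Maps float values to the signed range [-127, 127] using a single
    scale derived from the tensor's maximum absolute value.
    """
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale

def dequantize_int8(q, scale):
    """Recover an approximate float tensor from INT8 codes and the scale."""
    return q.astype(np.float32) * scale

x = np.array([-1.5, 0.0, 0.3, 2.54], dtype=np.float32)
q, scale = quantize_int8(x)
x_hat = dequantize_int8(q, scale)
```

The rounding error per element is bounded by half the scale, which is why a stable, well-conditioned scale matters so much when gradients (not just weights) are quantized during training.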
📝 Abstract
Backpropagation has been the cornerstone of neural network training for decades, yet its inefficiencies in time and energy consumption limit its suitability for resource-constrained edge devices. While low-precision quantization of neural networks has been extensively researched to speed up inference, its application to training has been less explored. Recently, the Forward-Forward (FF) algorithm has emerged as a promising alternative to backpropagation, replacing the backward pass with an additional forward pass. Because FF avoids storing intermediate activations for a backward pass, it reduces the memory footprint, making it well suited to embedded devices. This paper presents an INT8 quantized training approach that leverages FF's layer-by-layer strategy to stabilize gradient quantization. Furthermore, we propose a novel "look-ahead" scheme that addresses limitations of FF and improves model accuracy. Experiments conducted on an NVIDIA Jetson Orin Nano board demonstrate 4.6% faster training, 8.3% energy savings, and a 27.0% reduction in memory usage, while maintaining competitive accuracy compared to the state of the art.
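For readers unfamiliar with FF, the sketch below shows why no backward pass through the network is needed: each layer is trained with a purely local objective on its own "goodness" (sum of squared activations, as in Hinton's original formulation), pushing it above a threshold for positive samples and below it for negative ones. This is a generic single-layer FF step in plain NumPy, assuming a ReLU layer and a logistic goodness loss; it is not the paper's quantized variant or its look-ahead scheme.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def ff_layer_step(W, x, is_positive, theta=2.0, lr=0.01):
    """One local Forward-Forward update for a single ReLU layer (sketch).

    Goodness = sum of squared activations. Positive samples are pushed
    above the threshold `theta`, negative samples below it. The gradient
    is computed from this layer's forward pass alone, so no activations
    need to be stored for a network-wide backward pass.
    """
    pre = W @ x
    h = np.maximum(pre, 0.0)            # ReLU activations
    g = float(np.sum(h ** 2))           # goodness of this layer
    sign = 1.0 if is_positive else -1.0
    # Logistic loss on sign * (g - theta); its derivative w.r.t. g:
    dg = -sign * sigmoid(-sign * (g - theta))
    # Local gradient: dg/dW = 2 * h (already ReLU-masked) outer x
    grad_W = dg * 2.0 * np.outer(h, x)
    return W - lr * grad_W, g
```

A few repeated steps on a positive sample drive its goodness upward; in a full network, each layer normalizes its output before passing it on so that the next layer cannot trivially reuse the previous layer's goodness.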