Gradient-Free Neural Network Training on the Edge

📅 2024-10-13
🏛️ arXiv.org
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high energy consumption and reliance on gradient computation and high-precision floating-point arithmetic in neural network training on edge devices. We propose the first purely logic-gate-driven, gradient-free training paradigm. Under 1–2-bit weight constraints, our method identifies erroneous neurons via error attribution analysis and directly flips critical bits using bitwise logical operations (e.g., XOR, NOT) for parameter updates—completely eliminating backpropagation and full-precision intermediate computations. We prove that the underlying quantized optimization problem is NP-hard. Crucially, our approach achieves accuracy comparable to full-precision gradient-based training across multiple standard benchmarks—without introducing any hidden floating-point operations—while reducing computational overhead by over 90%. To our knowledge, this is the first method enabling end-to-end low-bit trainable neural networks on resource-constrained edge devices.
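The XOR/NOT parameter updates the summary mentions can be made concrete with a standard bit-packing trick (this is general background on binary arithmetic, not code from the paper): pack 1-bit weights into a machine word so that an update is a single XOR, and a dot product with a bipolar {-1, +1} input reduces to XNOR plus popcount, with no floating point anywhere.

```python
# Pack one-bit weights into an integer word: bit i encodes weight i,
# with bit 1 -> +1 and bit 0 -> -1. A parameter update is one XOR.
def binary_dot(w_bits: int, x_bits: int, n: int = 64) -> int:
    """Dot product of two {-1,+1} vectors stored as n-bit integers:
    number of agreeing positions minus number of disagreeing ones."""
    mask = (1 << n) - 1
    agree = (~(w_bits ^ x_bits)) & mask    # XNOR: 1 where signs match
    return 2 * bin(agree).count("1") - n   # popcount -> signed sum

w = 0b1011  # weights (w0, w1, w2, w3) = (+1, +1, -1, +1)
x = 0b1110  # input   (x0, x1, x2, x3) = (-1, +1, +1, +1)
print(binary_dot(w, x, n=4))   # agreements minus disagreements

w ^= 1 << 2                    # flip weight w2 with a single XOR
print(binary_dot(w, x, n=4))   # the flipped term now agrees with x2
```

The same XNOR/popcount identity underlies binary-network inference kernels; here it shows why a logic-gate-only update rule avoids full-precision intermediate computation entirely.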

📝 Abstract
Training neural networks is computationally heavy and energy-intensive. Many methodologies have been developed to reduce the computational and energy requirements by lowering the precision of network weights at inference time, using techniques such as rounding, stochastic rounding, and quantization. However, most of these techniques still require full gradient precision at training time, which makes training such models prohibitive on edge devices. This work presents a novel technique for training neural networks without needing gradients. This enables a training process in which all weights are one or two bits, without any hidden full-precision computations. We show that it is possible to train models without gradient-based optimization by identifying the erroneous contribution of each neuron towards the expected classification and flipping the relevant bits using logical operations. We tested our method on several standard datasets and achieved performance comparable to corresponding gradient-based baselines at a fraction of the compute power.
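The abstract's core loop, attributing error to individual parameters and flipping the offending bits with logical operations, can be sketched for a single 1-bit neuron. This is a toy illustration under our own assumptions (a linear threshold unit, count-based error attribution, and keep-if-not-worse flips), not the authors' algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)

def accuracy(bits, X, y):
    w = 2 * bits.astype(np.int32) - 1          # bits {0,1} -> weights {-1,+1}
    return float(np.mean(np.where(X @ w >= 0, 1, -1) == y))

def train_bitflip(X, y, bits=None, epochs=10):
    """Gradient-free training of a single 1-bit neuron.

    Error attribution: over the misclassified samples, count how often
    each weight pushed the pre-activation the wrong way. The worst
    offender is flipped with XOR; the flip is kept only if training
    accuracy does not drop, so accuracy never degrades.
    """
    if bits is None:
        bits = rng.integers(0, 2, size=X.shape[1], dtype=np.uint8)
    for _ in range(epochs):
        w = 2 * bits.astype(np.int32) - 1
        wrong = np.where(X @ w >= 0, 1, -1) != y
        if not wrong.any():
            break                               # all samples classified
        # harmful[j]: how often weight j contributed with the wrong sign
        harmful = np.sum(-y[wrong, None] * (w * X[wrong]) > 0, axis=0)
        j = int(np.argmax(harmful))
        before = accuracy(bits, X, y)
        bits[j] ^= 1                            # XOR bit flip: the only update
        if accuracy(bits, X, y) < before:
            bits[j] ^= 1                        # revert if the flip hurt
    return bits

# Toy realizable data: labels come from a hidden {-1,+1} teacher vector
X = rng.choice([-1, 1], size=(200, 4))
w_hidden = np.array([1, -1, 1, 1])              # hypothetical teacher
y = np.where(X @ w_hidden >= 0, 1, -1)
bits = train_bitflip(X, y, epochs=20)
print(accuracy(bits, X, y))
```

The paper applies this idea per neuron across a whole network; the sketch only covers one threshold unit, but it shows the two ingredients the abstract names: error attribution without gradients, and parameter updates that are pure bit flips.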
Problem

Research questions and friction points this paper is trying to address.

Eliminating gradient computation in neural network training
Coping with NP-hard optimization over quantized weight spaces
Reducing energy consumption while maintaining model performance
Innovation

Methods, ideas, or system contributions that make the work stand out.

Eliminates gradient-based optimization for neural networks
Introduces heuristic optimization avoiding full weight updates
Achieves comparable performance with reduced energy usage
Dotan Di Castro
Research Manager at Bosch-AI, Haifa, Israel
Machine Learning · Reinforcement Learning · Robotics

O. Joglekar
Bosch Centre for Artificial Intelligence

Shir Kozlovsky
Bosch Centre for Artificial Intelligence

Vladimir Tchuiev
Bosch Center for Artificial Intelligence
SLAM · Deep Learning · Robotics

Michal Moshkovitz
Bosch Centre for Artificial Intelligence