Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation

📅 2023-08-23
📈 Citations: 1
Influential: 0
🤖 AI Summary
Training non-differentiable models, such as Heaviside-activated spiking neural networks (SNNs), remains challenging because gradient-based optimization is inapplicable. To address this, the paper proposes Layer-wise Feedback Propagation (LFP), a gradient-free training principle that decomposes a task-level reward onto individual neurons according to their contributions to solving the task, then greedily reinforces helpful parts of the network and weakens harmful ones. Convergence is established both theoretically and empirically, and an implicit weight scaling causes LFP to produce sparse, easily prunable models suited to efficient hardware deployment. Experiments across diverse models and datasets show accuracy comparable to gradient descent, improved pruning efficiency, and, for SNNs, approximation-free training that performs comparably to surrogate gradient descent while easing implementation on neuromorphic hardware.
📝 Abstract
Gradient-based optimization has been a cornerstone of machine learning, enabling the vast advances of AI development over the past decades. However, since this type of optimization requires differentiation, it reduces flexibility in the choice of model and objective. With recent evidence of the benefits of non-differentiable (e.g. neuromorphic) architectures over classical models, such constraints can become limiting in the future. We present Layer-wise Feedback Propagation (LFP), a novel training principle for neural network-like predictors that utilizes methods from the domain of explainability to decompose a reward onto individual neurons based on their respective contributions to solving a given task, without imposing any differentiability requirements. Leveraging these neuron-wise rewards, our method then implements a greedy approach, reinforcing helpful parts of the network and weakening harmful ones. While having computational complexity comparable to gradient descent, LFP offers the advantage that it obtains sparse models due to an implicit weight scaling. We establish the convergence of LFP theoretically and empirically, demonstrating its effectiveness on various models and datasets. We further investigate two applications for LFP: firstly, neural network pruning, and secondly, the optimization of neuromorphic architectures such as Heaviside step function activated Spiking Neural Networks (SNNs). In the first setting, LFP naturally generates sparse models that are easily prunable and thus encode and compute information efficiently. In the second setting, LFP achieves performance comparable to surrogate gradient descent, but provides approximation-free training, which eases the implementation on neuromorphic hardware. Consequently, LFP combines efficiency in terms of computation and representation with flexibility w.r.t. model architecture and objective function. Our code is available.
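The mechanism the abstract describes (decomposing a task reward onto connections in proportion to their contributions, then reinforcing helpful weights and inhibiting harmful ones) can be illustrated with a toy NumPy sketch. Everything below is a hypothetical simplification for illustration: the reward choice, the proportional decomposition rule, and the sign-based update are stand-ins, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network (ReLU hidden layer, linear output).
W1 = rng.normal(scale=0.5, size=(4, 8))   # input -> hidden
W2 = rng.normal(scale=0.5, size=(8, 3))   # hidden -> output

def lfp_step(x, target, W1, W2, lr=0.01, eps=1e-9):
    """One illustrative LFP-style update (hypothetical simplification).

    A scalar task-level reward is decomposed layer by layer, in the
    spirit of relevance propagation, proportionally to each connection's
    contribution; each weight is then reinforced or inhibited by its
    share of the reward. No derivative of the activation is taken.
    """
    # Forward pass (any activation works; nothing is differentiated).
    h = np.maximum(x @ W1, 0.0)
    out = h @ W2

    # Output-level reward: +1 on the correct unit, -1 elsewhere
    # (one of many possible reward choices; purely illustrative).
    r_out = -np.ones_like(out)
    r_out[target] = 1.0

    # Decompose the reward onto hidden->output connections by contribution.
    z2 = h[:, None] * W2                        # contributions z_ij = h_i * w_ij
    r2 = z2 / (z2.sum(axis=0) + eps) * r_out    # each connection's reward share
    r_h = r2.sum(axis=1)                        # reward arriving at hidden neurons

    # Same decomposition one layer further down.
    z1 = x[:, None] * W1
    r1 = z1 / (z1.sum(axis=0) + eps) * r_h

    # Greedy update: strengthen positively rewarded weights, weaken the rest.
    W1 = W1 + lr * r1 * np.sign(W1)
    W2 = W2 + lr * r2 * np.sign(W2)
    return W1, W2, out

x = rng.normal(size=4)
W1, W2, out = lfp_step(x, target=0, W1=W1, W2=W2)
```

Note that the forward pass is treated as a black box: the update only needs activations and weights, which is why the same recipe applies to step-activated (spiking) units where backpropagation breaks down.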
Problem

Research questions and friction points this paper is trying to address.

Gradient-based optimization requires differentiability, excluding models such as step-activated SNNs
Differentiability constraints limit flexibility in the choice of model architecture and objective function
Neuromorphic architectures lack exact, hardware-friendly training methods for efficient computation
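The first friction point is easy to see concretely: a Heaviside step activation, as used in the SNNs the paper targets, has derivative zero almost everywhere, so backpropagation delivers no learning signal through it. A minimal finite-difference check:

```python
import numpy as np

def heaviside(x):
    """Spiking-style step activation: fires (1) iff the input is positive."""
    return (x > 0).astype(float)

# Finite-difference 'gradient' of the step function at a few points:
# zero everywhere away from the single discontinuity at 0, so
# gradient descent through this unit receives no usable signal.
xs = np.array([-1.0, -0.1, 0.1, 1.0])
h = 1e-4
grads = (heaviside(xs + h) - heaviside(xs - h)) / (2 * h)
print(grads)   # [0. 0. 0. 0.]
```

Surrogate gradient methods work around this by substituting a smooth derivative; LFP instead avoids differentiation altogether.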
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-wise Feedback Propagation
No differentiability requirements
Sparse model generation
Leander Weber
Fraunhofer Heinrich Hertz Institute
Machine Learning · Explainability
J. Berend
Technische Universität Berlin, 10587 Berlin, Germany
Alexander Binder
Professor, ScaDS.AI and Faculty of Math and Computer Science, Uni Leipzig, Germany
Explainable Deep Learning · XAI · Aspects of Machine Learning
T. Wiegand
Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany; Technische Universität Berlin, 10587 Berlin, Germany; BIFOLD – Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
W. Samek
Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany; Technische Universität Berlin, 10587 Berlin, Germany; BIFOLD – Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
S. Lapuschkin
Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany