Efficient and Flexible Neural Network Training through Layer-wise Feedback Propagation

📅 2023-08-23
📈 Citations: 1
Influential: 0
🤖 AI Summary
Training non-differentiable models, such as Heaviside-activated spiking neural networks (SNNs), remains challenging because gradient-based optimization is inapplicable. To address this, the paper proposes Layer-wise Feedback Propagation (LFP), a gradient-free training principle that decomposes a task-level reward onto individual neurons according to their contributions to solving the task, then greedily reinforces helpful parts of the network and weakens harmful ones. Convergence is established both theoretically and empirically, and an implicit weight scaling causes LFP to produce sparse, easily prunable models suited to efficient hardware deployment. Experiments across diverse models and datasets show accuracy comparable to gradient descent, improved pruning efficiency, and, for SNNs, approximation-free training that performs comparably to surrogate gradient descent while easing implementation on neuromorphic hardware.
📝 Abstract
Gradient-based optimization has been a cornerstone of machine learning, enabling the vast advances of AI development over the past decades. However, since this type of optimization requires differentiation, it reduces flexibility in the choice of model and objective. With recent evidence of the benefits of non-differentiable (e.g. neuromorphic) architectures over classical models, such constraints can become limiting in the future. We present Layer-wise Feedback Propagation (LFP), a novel training principle for neural network-like predictors that utilizes methods from the domain of explainability to decompose a reward onto individual neurons based on their respective contributions to solving a given task, without imposing any differentiability requirements. Leveraging these neuron-wise rewards, our method then implements a greedy approach, reinforcing helpful parts of the network and weakening harmful ones. While having computational complexity comparable to gradient descent, LFP offers the advantage that it obtains sparse models due to an implicit weight scaling. We establish the convergence of LFP theoretically and empirically, demonstrating its effectiveness on various models and datasets. We further investigate two applications for LFP: firstly, neural network pruning, and secondly, the optimization of neuromorphic architectures such as Heaviside step function activated Spiking Neural Networks (SNNs). In the first setting, LFP naturally generates sparse models that are easily prunable and thus encode and compute information efficiently. In the second setting, LFP achieves performance comparable to surrogate gradient descent, but provides approximation-free training, which eases the implementation on neuromorphic hardware. Consequently, LFP combines efficiency in terms of computation and representation with flexibility w.r.t. model architecture and objective function. Our code is available.
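The mechanism the abstract describes (decomposing a task reward onto connections in proportion to their contributions, then reinforcing helpful weights and inhibiting harmful ones) can be illustrated with a toy NumPy sketch. Everything below is a hypothetical simplification for illustration: the reward choice, the proportional decomposition rule, and the sign-based update are stand-ins, not the paper's exact formulation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer network (ReLU hidden layer, linear output).
W1 = rng.normal(scale=0.5, size=(4, 8))   # input -> hidden
W2 = rng.normal(scale=0.5, size=(8, 3))   # hidden -> output

def lfp_step(x, target, W1, W2, lr=0.01, eps=1e-9):
    """One illustrative LFP-style update (hypothetical simplification).

    A scalar task-level reward is decomposed layer by layer, in the
    spirit of relevance propagation, proportionally to each connection's
    contribution; each weight is then reinforced or inhibited by its
    share of the reward. No derivative of the activation is taken.
    """
    # Forward pass (any activation works; nothing is differentiated).
    h = np.maximum(x @ W1, 0.0)
    out = h @ W2

    # Output-level reward: +1 on the correct unit, -1 elsewhere
    # (one of many possible reward choices; purely illustrative).
    r_out = -np.ones_like(out)
    r_out[target] = 1.0

    # Decompose the reward onto hidden->output connections by contribution.
    z2 = h[:, None] * W2                        # contributions z_ij = h_i * w_ij
    r2 = z2 / (z2.sum(axis=0) + eps) * r_out    # each connection's reward share
    r_h = r2.sum(axis=1)                        # reward arriving at hidden neurons

    # Same decomposition one layer further down.
    z1 = x[:, None] * W1
    r1 = z1 / (z1.sum(axis=0) + eps) * r_h

    # Greedy update: strengthen positively rewarded weights, weaken the rest.
    W1 = W1 + lr * r1 * np.sign(W1)
    W2 = W2 + lr * r2 * np.sign(W2)
    return W1, W2, out

x = rng.normal(size=4)
W1, W2, out = lfp_step(x, target=0, W1=W1, W2=W2)
```

Note that the forward pass is treated as a black box: the update only needs activations and weights, which is why the same recipe applies to step-activated (spiking) units where backpropagation breaks down.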
Problem

Research questions and friction points this paper is trying to address.

Gradient-based optimization requires differentiability, excluding models such as step-activated SNNs
Differentiability constraints limit flexibility in the choice of model architecture and objective function
Neuromorphic architectures lack exact, hardware-friendly training methods for efficient computation
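The first friction point is easy to see concretely: a Heaviside step activation, as used in the SNNs the paper targets, has derivative zero almost everywhere, so backpropagation delivers no learning signal through it. A minimal finite-difference check:

```python
import numpy as np

def heaviside(x):
    """Spiking-style step activation: fires (1) iff the input is positive."""
    return (x > 0).astype(float)

# Finite-difference 'gradient' of the step function at a few points:
# zero everywhere away from the single discontinuity at 0, so
# gradient descent through this unit receives no usable signal.
xs = np.array([-1.0, -0.1, 0.1, 1.0])
h = 1e-4
grads = (heaviside(xs + h) - heaviside(xs - h)) / (2 * h)
print(grads)   # [0. 0. 0. 0.]
```

Surrogate gradient methods work around this by substituting a smooth derivative; LFP instead avoids differentiation altogether.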
Innovation

Methods, ideas, or system contributions that make the work stand out.

Layer-wise Feedback Propagation
No differentiability requirements
Sparse model generation
Leander Weber
Fraunhofer Heinrich Hertz Institute
Machine Learning · Explainability
J. Berend
Technische Universität Berlin, 10587 Berlin, Germany
Alexander Binder
Professor, ScaDS.AI and Faculty of Math and Computer Science, Uni Leipzig, Germany
Explainable Deep Learning · XAI · Aspects of Machine Learning
T. Wiegand
Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany; Technische Universität Berlin, 10587 Berlin, Germany; BIFOLD – Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
W. Samek
Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany; Technische Universität Berlin, 10587 Berlin, Germany; BIFOLD – Berlin Institute for the Foundations of Learning and Data, 10587 Berlin, Germany
S. Lapuschkin
Fraunhofer Heinrich Hertz Institute, 10587 Berlin, Germany