🤖 AI Summary
This work proposes a hybrid learning paradigm for visual representation learning that largely avoids end-to-end backpropagation by integrating structured priors with local plasticity. The approach employs a modular hierarchical architecture combining fixed multi-frequency Gabor filters, intra-stream Hebbian/Oja and anti-Hebbian learning rules, modern Hopfield associative memory, and top-down iterative modulation mechanisms. Gradient-based optimization is confined solely to the linear readout layer and feedback projections, drastically reducing reliance on global backpropagation. Evaluated via linear probing, the model achieves 80.1% ± 0.3% accuracy on CIFAR-10 and 54.8% on CIFAR-100, substantially outperforming purely Hebbian baselines and approaching the performance of fully gradient-trained models.
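The summary mentions a modern Hopfield associative memory module. As an illustrative sketch only (the paper's exact retrieval rule and hyperparameters are not given here), continuous modern Hopfield retrieval can be written as an iterated softmax-weighted recall over stored patterns, where the inverse temperature `beta` sharpens the attraction toward a single memory:

```python
import numpy as np

def hopfield_retrieve(query, memories, beta=4.0, n_iter=3):
    """Continuous (modern) Hopfield retrieval: repeatedly pull the
    query toward a softmax-weighted combination of stored patterns.
    Larger `beta` makes retrieval closer to nearest-pattern recall."""
    xi = query.copy()
    for _ in range(n_iter):
        scores = beta * memories @ xi           # similarity to each stored pattern
        weights = np.exp(scores - scores.max()) # stable softmax over memories
        weights /= weights.sum()
        xi = memories.T @ weights               # convex combination of patterns
    return xi

# Toy usage: recover a stored pattern from a noisy cue.
rng = np.random.default_rng(0)
memories = rng.standard_normal((5, 32))         # 5 stored patterns, dim 32
cue = memories[2] + 0.3 * rng.standard_normal(32)
out = hopfield_retrieve(cue, memories)
```

With well-separated random patterns, the softmax collapses onto the nearest stored pattern within one or two iterations, which is the "one-shot" retrieval property exploited by modern Hopfield networks.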
📝 Abstract
We study how far structured architectural bias can compensate for the absence of end-to-end gradient-based representation learning in visual recognition. Building on the VisNet tradition, we introduce a modular hierarchical framework combining: (i) fixed multi-frequency Gabor decomposition into F=7 parallel streams; (ii) within-stream competitive learning with Hebbian and Oja updates and anti-Hebbian decorrelation; (iii) an associative memory module inspired by modern Hopfield retrieval; and (iv) iterative top-down modulation using local prediction and reconstruction signals.
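Component (ii) relies on purely local plasticity. A minimal NumPy sketch of a per-row Oja update and an anti-Hebbian lateral decorrelation step (illustrative assumptions on our part; learning rates, dimensions, and the exact rules are not specified by the abstract) might look like:

```python
import numpy as np

def oja_step(W, x, lr=0.01):
    """One local Oja update: Hebbian growth plus a decay term that
    drives each row of W toward unit norm. No backpropagated error."""
    y = W @ x                                          # post-synaptic activity
    W += lr * (np.outer(y, x) - (y[:, None] ** 2) * W)
    return W, y

def anti_hebbian_step(L, y, lr=0.005):
    """Anti-Hebbian update on lateral weights L: strengthen mutual
    inhibition between co-active units, decorrelating the outputs.
    (In a full model, L would feed back into the activations.)"""
    L -= lr * np.outer(y, y)
    np.fill_diagonal(L, 0.0)                           # no self-inhibition
    return L

# Toy usage: drive 4 units with random 8-d inputs.
rng = np.random.default_rng(1)
W = 0.1 * rng.standard_normal((4, 8))
L = np.zeros((4, 4))
for _ in range(5000):
    x = rng.standard_normal(8)
    W, y = oja_step(W, x)
    L = anti_hebbian_step(L, y)
```

Oja's decay term keeps each weight vector bounded, so the row norms settle near 1 without any explicit normalization pass.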
Representational layers are trained without end-to-end backpropagation through the full hierarchy; only the final linear readout and top-down projection matrices are optimized by gradient descent. We therefore interpret the model as a hybrid system that is predominantly locally trained but includes a small number of gradient-trained parameters.
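Because only the readout receives gradients, evaluation reduces to fitting a softmax classifier on frozen features. A hedged sketch of such a linear probe in plain NumPy (not the authors' code; names and hyperparameters are our own):

```python
import numpy as np

def train_linear_readout(feats, labels, n_classes, lr=0.5, epochs=1000):
    """Full-batch gradient descent on a softmax readout over FROZEN
    features: the representation never receives these gradients."""
    n, d = feats.shape
    W = np.zeros((n_classes, d))
    b = np.zeros(n_classes)
    onehot = np.eye(n_classes)[labels]
    for _ in range(epochs):
        logits = feats @ W.T + b
        logits -= logits.max(axis=1, keepdims=True)    # numerical stability
        p = np.exp(logits)
        p /= p.sum(axis=1, keepdims=True)
        grad = (p - onehot) / n                        # cross-entropy gradient
        W -= lr * grad.T @ feats
        b -= lr * grad.sum(axis=0)
    return W, b

# Toy usage: labels generated by a hidden linear map, so a linear
# probe should fit them well.
rng = np.random.default_rng(0)
feats = rng.standard_normal((200, 10))
labels = np.argmax(feats @ rng.standard_normal((3, 10)).T, axis=1)
W, b = train_linear_readout(feats, labels, n_classes=3)
acc = (np.argmax(feats @ W.T + b, axis=1) == labels).mean()
```

Probe accuracy on frozen features is the standard proxy for representation quality used throughout the abstract's reported numbers.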
On CIFAR-10, the full model reaches 80.1% ± 0.3% top-1 accuracy (linear probe), compared with 71.0% for a Hebbian-only baseline and 83.4% for a gradient-trained model on the same fixed Gabor basis. On CIFAR-100, the full model reaches 54.8%. Factorial analysis indicates that multi-frequency streams, associative memory, and top-down feedback contribute largely additively, with a significant Streams × TopDown interaction (p = 0.02).
These results suggest that carefully chosen architectural priors can recover a substantial fraction of the performance typically associated with global gradient training, though a measurable residual gap remains. All experiments are limited to CIFAR-10/100.