🤖 AI Summary
This work establishes the first theoretical connection between predictive coding (PC), a biologically plausible local learning rule, and the minimum description length (MDL) principle for joint optimization of empirical risk and model complexity. Methodologically, the authors show that deep PC training corresponds to block-coordinate descent on a two-part code objective function. They derive the first generalization bound for PC, $R(\theta) \leq \hat{R}(\theta) + L(\theta)/N$, where the compression term $L(\theta)$ arises from a prefix-code prior; prove that each PC update strictly reduces the two-part code length; and establish convergence guarantees to block-coordinate stationary points. Compared to unconstrained gradient descent, PC yields a tighter generalization bound and provably balances fidelity and parsimony. These results position PC as a theoretically grounded, biologically plausible alternative to backpropagation.
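The block-coordinate-descent claim can be illustrated numerically. The toy sketch below is not the paper's construction: the two-layer linear network, the quadratic penalty standing in for the codelength term, and the step sizes are all illustrative assumptions. It alternates a local inference step on the latent activity with local weight updates, and tracks a two-part "fit + complexity" objective across sweeps.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy two-layer linear predictive-coding network (illustrative, not the paper's setup).
# Two-part objective: prediction energy (fit) + quadratic penalty (complexity proxy).
x0 = rng.normal(size=(5,))           # input
y = rng.normal(size=(3,))            # target
W1 = rng.normal(size=(4, 5)) * 0.1   # layer-1 weights
W2 = rng.normal(size=(3, 4)) * 0.1   # layer-2 weights
x1 = W1 @ x0                         # latent activity, initialized feedforward
lam, eta = 1e-2, 0.02                # complexity weight, step size (assumed values)

def objective(W1, W2, x1):
    fit = np.sum((x1 - W1 @ x0) ** 2) + np.sum((y - W2 @ x1) ** 2)
    complexity = lam * (np.sum(W1 ** 2) + np.sum(W2 ** 2))
    return fit + complexity

vals = [objective(W1, W2, x1)]
for _ in range(300):
    # Block 1: inference -- gradient step on the latent x1 with weights fixed.
    e1, e2 = x1 - W1 @ x0, y - W2 @ x1
    x1 -= eta * (2 * e1 - 2 * W2.T @ e2)
    # Block 2: learning -- local gradient steps on each weight block,
    # using only that layer's prediction error (the "local" rule).
    e1, e2 = x1 - W1 @ x0, y - W2 @ x1
    W1 -= eta * (-2 * np.outer(e1, x0) + 2 * lam * W1)
    W2 -= eta * (-2 * np.outer(e2, x1) + 2 * lam * W2)
    vals.append(objective(W1, W2, x1))

print(f"two-part objective: {vals[0]:.4f} -> {vals[-1]:.4f}")
```

With a sufficiently small step size, each block update is a descent step on the shared objective, so the recorded values decrease monotonically, mirroring the paper's monotone codelength-reduction claim in miniature.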
📝 Abstract
We present the first theoretical framework that connects predictive coding (PC), a biologically inspired local learning rule, with the minimum description length (MDL) principle in deep networks. We prove that layerwise PC performs block-coordinate descent on the MDL two-part code objective, thereby jointly minimizing empirical risk and model complexity. Using Hoeffding's inequality and a prefix-code prior, we derive a novel generalization bound of the form $R(\theta) \le \hat{R}(\theta) + \frac{L(\theta)}{N}$, capturing the tradeoff between fit and compression. We further prove that each PC sweep monotonically decreases the empirical two-part codelength, yielding tighter high-probability risk bounds than unconstrained gradient descent. Finally, we show that repeated PC updates converge to a block-coordinate stationary point, providing an approximate MDL-optimal solution. To our knowledge, this is the first result offering formal generalization and convergence guarantees for PC-trained deep models, positioning PC as a theoretically grounded and biologically plausible alternative to backpropagation.
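For context, the classical route to bounds of this flavor (a sketch under standard assumptions, losses bounded in $[0,1]$ over a countable hypothesis class; not necessarily the paper's exact argument) combines Hoeffding's inequality with a union bound weighted by the prefix-code prior $p(\theta) = 2^{-L(\theta)}$:

```latex
% Hoeffding for a fixed hypothesis \theta (losses in [0,1]):
%   \Pr[ R(\theta) > \hat{R}(\theta) + \varepsilon_\theta ] \le e^{-2N\varepsilon_\theta^2}.
% Spend confidence 2^{-L(\theta)}\delta on each \theta, i.e. choose
\varepsilon_\theta = \sqrt{\frac{L(\theta)\ln 2 + \ln(1/\delta)}{2N}},
% so that, by the Kraft inequality \sum_\theta 2^{-L(\theta)} \le 1, the union bound gives
\Pr\Big[\exists\,\theta:\; R(\theta) > \hat{R}(\theta) + \varepsilon_\theta\Big]
  \le \sum_\theta 2^{-L(\theta)}\,\delta \le \delta.
% Hence, with probability at least 1-\delta, simultaneously for all \theta:
R(\theta) \le \hat{R}(\theta) + \sqrt{\frac{L(\theta)\ln 2 + \ln(1/\delta)}{2N}}.
```

This standard sketch only shows where the prefix-code compression term $L(\theta)$ enters a high-probability risk bound; the sharper linear $L(\theta)/N$ form stated in the abstract is the paper's own contribution.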