🤖 AI Summary
Current self-supervised learning relies on biologically implausible backpropagation, while predictive coding approaches either require generative modeling of high-dimensional raw inputs or introduce human annotations, both of which are at odds with the core principle of unsupervised learning. To address this, we propose Meta-Representational Predictive Coding (MPC): a framework that abandons pixel-level reconstruction and external supervision, instead predicting high-level representations across multimodal sensory pathways. Grounded in the free-energy principle and active inference, MPC integrates predictive coding, cross-pathway representational interaction, and saccade-driven dynamic sensory sampling within an encoder-only architecture, enabling end-to-end self-supervised learning. Crucially, MPC is neurobiologically plausible, mimicking hierarchical cortical processing, while remaining computationally efficient. Empirically, it achieves markedly improved representation discriminability and cross-task generalization, advancing both the theoretical grounding and the practical performance of self-supervised representation learning.
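As a rough illustration of the cross-pathway idea, the sketch below (our own toy code, not the authors' implementation) pairs two small encoders that each try to predict the other's representation, with learning driven purely by those representation-level errors; all names (`encoder`, `W_a`, `P_ab`, etc.), the architecture sizes, and the surrogate data are hypothetical assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)

def encoder(W, x):
    """One-layer encoder with a tanh nonlinearity (stand-in for a deeper stream)."""
    return np.tanh(W @ x)

dim_x, dim_z, lr = 64, 16, 0.05
W_a = rng.normal(scale=0.1, size=(dim_z, dim_x))   # encoder weights, pathway A
W_b = rng.normal(scale=0.1, size=(dim_z, dim_x))   # encoder weights, pathway B
P_ab = rng.normal(scale=0.1, size=(dim_z, dim_z))  # predicts B's code from A's code
P_ba = rng.normal(scale=0.1, size=(dim_z, dim_z))  # predicts A's code from B's code

for step in range(500):
    # Two noisy "glimpses" of the same underlying cause (toy surrogate data).
    cause = rng.normal(size=dim_x)
    x_a = cause + 0.1 * rng.normal(size=dim_x)
    x_b = cause + 0.1 * rng.normal(size=dim_x)

    z_a, z_b = encoder(W_a, x_a), encoder(W_b, x_b)

    # Representation-level prediction errors: no pixel reconstruction anywhere.
    e_ab = z_b - P_ab @ z_a        # error in A's prediction of B's representation
    e_ba = z_a - P_ba @ z_b        # error in B's prediction of A's representation

    # Gradient descent on the two error energies (0.5 * ||e||^2 each).
    dz_a = e_ba - P_ab.T @ e_ab    # how z_a should move to reduce both errors
    dz_b = e_ab - P_ba.T @ e_ba
    P_ab += lr * np.outer(e_ab, z_a)
    P_ba += lr * np.outer(e_ba, z_b)
    W_a -= lr * np.outer(dz_a * (1.0 - z_a**2), x_a)
    W_b -= lr * np.outer(dz_b * (1.0 - z_b**2), x_b)
```

This toy version does nothing to prevent the trivial collapsed solution (all codes driven toward zero); it is only meant to show how learning can be driven by errors between representations rather than between pixels.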
📝 Abstract
Self-supervised learning has become an increasingly important paradigm in machine intelligence. Furthermore, evidence for self-supervised adaptation, such as contrastive formulations, has emerged in recent computational neuroscience and brain-inspired research. Nevertheless, current work on self-supervised learning relies on biologically implausible credit assignment, in the form of backpropagation of errors, and on feedforward inference, typically a forward-locked pass. Predictive coding, in its mechanistic form, offers a biologically plausible means to sidestep these backprop-specific limitations. However, unsupervised predictive coding rests on learning a generative model of raw pixel input (akin to "generative AI" approaches), which entails predicting a potentially high-dimensional input; on the other hand, supervised predictive coding, which learns a mapping from inputs to target labels, requires human annotation and thus incurs the drawbacks of supervised learning. In this work, we present a scheme for self-supervised learning within a neurobiologically plausible framework that appeals to the free energy principle, constructing a new form of predictive coding that we call meta-representational predictive coding (MPC). MPC sidesteps the need to learn a generative model of sensory input (e.g., pixel-level features) by learning to predict representations of sensory input across parallel streams, resulting in an encoder-only learning and inference scheme. This formulation rests on active inference (in the form of sensory glimpsing) to drive the learning of representations; that is, the representational dynamics are driven by sequences of decisions the model makes to sample informative portions of its sensorium.
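To make the "sensory glimpsing" component concrete, here is a hypothetical sketch of saccade-like sampling: an agent fixates on small patches of a toy scene and chooses its next fixation where its recorded prediction error (surprise) is largest. The patch size, update rule, and greedy selection heuristic are our assumptions for illustration, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
scene = rng.normal(size=(32, 32))          # toy sensorium the agent can sample from
patch = 8                                  # size of one foveal glimpse
grid = 32 // patch                         # 4x4 grid of candidate fixation points

estimate = np.zeros_like(scene)            # the model's running estimate of the scene
surprise = np.full((grid, grid), np.inf)   # recorded error per cell (inf = unvisited)

def glimpse(img, gy, gx):
    """Return the patch at grid cell (gy, gx)."""
    return img[gy * patch:(gy + 1) * patch, gx * patch:(gx + 1) * patch]

fix = (0, 0)                               # initial fixation
for t in range(2 * grid * grid):
    gy, gx = fix
    err = glimpse(scene, gy, gx) - glimpse(estimate, gy, gx)
    surprise[gy, gx] = np.mean(err ** 2)   # record how surprising this glimpse was

    # "Learning": fold the observed patch into the running estimate.
    estimate[gy * patch:(gy + 1) * patch, gx * patch:(gx + 1) * patch] += 0.5 * err

    # Saccade selection: jump to the cell with the largest recorded surprise
    # (unvisited cells count as maximally surprising, so they are explored first).
    fix = np.unravel_index(np.argmax(surprise), surprise.shape)
```

Here the next fixation is chosen greedily by recorded error; the actual model presumably drives its saccades from the free-energy-style objective described above, but the loop structure (sample a glimpse, compute an error, update, pick the next location) follows the same shape as the active-inference sampling the abstract describes.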