🤖 AI Summary
In partially observable meta-reinforcement learning (meta-RL), existing approaches can approximate Bayes-optimal policies but often fail to learn compact, interpretable belief-state representations, which may limit generalization and adaptation. This work brings predictive coding, a principle from neuroscience, into meta-RL for the first time: a self-supervised predictive coding module is trained on auxiliary prediction objectives alongside history compression, so that the agent learns efficient representations of its observation history. Using state machine simulation, the authors show that the method preserves policy performance while yielding representations that are more interpretable and closer to Bayes-optimal belief states. On challenging tasks that require active information seeking, only meta-RL with predictive modules learns both optimal policies and optimal representations. The improved representations also lead to better generalization.
📝 Abstract
Learning a compact representation of history is critical for planning and generalization in partially observable environments. While meta-reinforcement learning (meta-RL) agents can attain near Bayes-optimal policies, they often fail to learn compact, interpretable Bayes-optimal belief states. This representational inefficiency potentially limits the agent's adaptability and generalization capacity. Inspired by predictive coding in neuroscience, which suggests that the brain predicts sensory inputs as a neural implementation of Bayesian inference, and by auxiliary predictive objectives in deep RL, we investigate whether integrating self-supervised predictive coding modules into meta-RL can facilitate learning of Bayes-optimal representations. Through state machine simulation, we show that meta-RL with predictive modules consistently generates more interpretable representations that approximate Bayes-optimal belief states better than conventional meta-RL does across a wide variety of tasks, even when both achieve optimal policies. In challenging tasks requiring active information seeking, only meta-RL with predictive modules successfully learns optimal representations and policies, whereas conventional meta-RL struggles with inadequate representation learning. Finally, we demonstrate that better representation learning leads to improved generalization. Our results strongly suggest that predictive learning serves as a guiding principle for effective representation learning in agents navigating partial observability.
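To make "Bayes-optimal belief state" concrete, here is a minimal sketch on a toy problem that is not taken from the paper. It assumes a hidden binary latent and noisy binary observations that match the latent with a fixed probability; the names `update_belief`, `belief_from_history`, and `accuracy` are hypothetical and chosen only for illustration. The point is that a single number, the posterior probability of the latent, compresses the whole observation history.

```python
def update_belief(belief, obs, accuracy=0.8):
    """One Bayes step for P(latent == 1), given a noisy binary observation.

    Assumption (illustrative only): obs equals the latent with
    probability `accuracy`, independently at every step.
    """
    like_1 = accuracy if obs == 1 else 1.0 - accuracy  # P(obs | latent = 1)
    like_0 = 1.0 - accuracy if obs == 1 else accuracy  # P(obs | latent = 0)
    evidence = belief * like_1 + (1.0 - belief) * like_0
    return belief * like_1 / evidence


def belief_from_history(history, prior=0.5, accuracy=0.8):
    """Fold a full observation history into a single belief value."""
    belief = prior
    for obs in history:
        belief = update_belief(belief, obs, accuracy)
    return belief


if __name__ == "__main__":
    # Two different histories with the same evidence give the same belief,
    # so the belief is a compact summary of the history.
    print(belief_from_history([1, 1, 0]))
    print(belief_from_history([0, 1, 1]))
```

In this toy setting, an agent whose internal state tracks this belief needs no other memory of the past. The abstract's claim concerns whether learned recurrent representations in meta-RL come to resemble such belief states, which this sketch does not attempt to model.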