🤖 AI Summary
In model-based reinforcement learning (MBRL) with visual inputs, task-irrelevant background dynamics induce an information conflict between representation learning and latent state modeling, resulting in redundant representations and poor policy robustness. This work first identifies the root cause of this conflict from an information-theoretic perspective, then proposes a negative-sample-free contrastive learning framework, coupled with a time-adaptive loss reweighting mechanism, to jointly optimize representation invariance and dynamics modeling accuracy. It further integrates an information bottleneck constraint within a unified training paradigm that jointly optimizes observation reconstruction and latent dynamics prediction. Evaluated on robotic control tasks under diverse dynamic background disturbances, the method substantially outperforms state-of-the-art approaches: it learns compact representations that are robust to background noise, improves policy stability by 23.6%, and enhances sample efficiency by 18.4%.
📝 Abstract
Existing visual model-based reinforcement learning (MBRL) algorithms that rely on observation reconstruction often suffer from information conflicts, which make it difficult to learn compact representations and hence result in less robust policies, especially in the presence of task-irrelevant visual distractions. In this paper, we first show, from an information-theoretic perspective, that the information conflicts in current visual MBRL algorithms stem from visual representation learning and latent dynamics modeling. Based on this finding, we present MInCo, a new algorithm for visual MBRL that mitigates information conflicts by leveraging negative-free contrastive learning, which helps it learn invariant representations and robust policies despite noisy observations. To prevent visual representation learning from dominating, we introduce time-varying reweighting that biases learning towards dynamics modeling as training proceeds. We evaluate our method on several robotic control tasks with dynamic background distractions. Our experiments demonstrate that MInCo learns representations that are invariant to background noise and consistently outperforms current state-of-the-art visual MBRL methods. Code is available at https://github.com/ShiguangSun/minco.
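The abstract does not give the exact form of the two loss components, so the following is only a minimal sketch of how a negative-free contrastive term and a time-varying weight might be combined. It assumes a negative cosine-similarity objective (in the style of methods that need no negative samples) and a linear decay schedule. The function names, the schedule shape, and the values `w_start` and `w_end` are all hypothetical and are not taken from MInCo.

```python
import numpy as np


def negative_free_loss(pred, target, eps=1e-8):
    """Negative cosine similarity between paired embeddings, averaged over the batch.

    Assumption: this stands in for a negative-free contrastive objective.
    Only positive pairs (pred[i], target[i]) are compared; no negatives are used.
    In an autodiff framework the target branch would typically be detached.
    """
    pred_n = pred / (np.linalg.norm(pred, axis=1, keepdims=True) + eps)
    target_n = target / (np.linalg.norm(target, axis=1, keepdims=True) + eps)
    return float(-np.mean(np.sum(pred_n * target_n, axis=1)))


def representation_weight(step, total_steps, w_start=1.0, w_end=0.1):
    """Hypothetical linear schedule for the representation-learning weight.

    The weight decays from w_start to w_end, so the dynamics term gains
    relative influence as training proceeds.
    """
    frac = min(max(step / total_steps, 0.0), 1.0)
    return w_start + frac * (w_end - w_start)


def total_loss(repr_loss, dynamics_loss, step, total_steps):
    """Combine the two terms; only the representation term is reweighted."""
    w = representation_weight(step, total_steps)
    return w * repr_loss + dynamics_loss


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    z_pred = rng.normal(size=(4, 8))    # predicted latent embeddings (toy data)
    z_target = rng.normal(size=(4, 8))  # target embeddings for the same states
    l_repr = negative_free_loss(z_pred, z_target)
    for step in (0, 500, 1000):
        print(step, representation_weight(step, 1000), total_loss(l_repr, 0.5, step, 1000))
```

Under these assumptions, the representation term carries full weight at the start of training and is gradually scaled down, which matches the stated intent of biasing learning towards dynamics modeling over time. The linked repository remains the authoritative source for the actual losses and schedule.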