🤖 AI Summary
This work addresses the severe performance degradation of visual reinforcement learning (RL) under dynamic visual perturbations, such as unpredictable switches between image-degradation types. Using information-theoretic analysis, the authors show that reconstruction-based objectives entangle perturbation artifacts with task-relevant representations. To mitigate this, they propose ACO-MoE (Agent-Centric Observations with Mixture-of-Experts), which decouples perception from perturbation. Agent-centric restoration experts, combined through a Mixture-of-Experts (MoE) mechanism, remove corruptions and extract the task-relevant foreground before observations reach the RL agent. On the VDCS benchmark, the method recovers 95.3% of clean performance under Markov-switching corruptions. It also achieves state-of-the-art results on DMControl Generalization tasks with random-color and video-background perturbations.
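The claim that reconstruction objectives entangle perturbations can be illustrated with a standard data-processing argument. This is a sketch under assumed notation, not necessarily the paper's proof. The symbols are introduced here: $S$ is the task state, $N$ is the perturbation, $O = f(S, N)$ is the observation, $Z = h(O)$ is the encoder latent, and $\hat{O} = g(Z)$ is the reconstruction.

```latex
% Assumed notation (not from the source): S task state, N perturbation,
% O = f(S, N) observation, Z = h(O) latent, \hat{O} = g(Z) reconstruction.
N \;\to\; O \;\to\; Z \;\to\; \hat{O}
\qquad\Longrightarrow\qquad
I(N; Z) \;\ge\; I(N; \hat{O}) \quad \text{(data-processing inequality)}

\hat{O} = O \ \text{(exact reconstruction)}
\qquad\Longrightarrow\qquad
I(N; Z) \;\ge\; I(N; O) \;>\; 0 \quad \text{whenever } O \text{ is statistically dependent on } N
```

Under these assumptions, a latent that reconstructs the observation exactly cannot be independent of any perturbation that is visible in that observation. Restoring the observation and extracting the foreground before encoding aims to shrink the right-hand side, $I(N; O)$, rather than asking the encoder to discard $N$ on its own.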
📝 Abstract
Visual reinforcement learning (RL) aims to enable agents to learn policies from visual observations, yet it remains vulnerable to dynamic visual perturbations, such as unpredictable shifts in corruption type. To study this systematically, we introduce the Visual Degraded Control Suite (VDCS), a benchmark that extends the DeepMind Control Suite with Markov-switching degradations to simulate non-stationary real-world perturbations. Experiments on VDCS reveal severe performance degradation in existing methods. Through information-theoretic analysis, we prove that this failure stems from reconstruction-based objectives, which inevitably entangle perturbation artifacts into the latent representations. To mitigate this effect, we propose Agent-Centric Observations with Mixture-of-Experts (ACO-MoE) to make visual RL robust to perturbations. The framework uses dedicated agent-centric restoration experts that both restore observations from corruptions and extract the task-relevant foreground, thereby decoupling perception from perturbation before the observations are processed by the RL agent. Extensive experiments on VDCS show that ACO-MoE outperforms strong baselines, recovering 95.3% of clean performance under challenging Markov-switching corruptions. Moreover, it achieves state-of-the-art results on DMControl Generalization with random-color and video-background perturbations, demonstrating that its robustness extends beyond image corruptions.
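A Markov-switching degradation can be sketched as a hidden regime, the active corruption type, that evolves as a discrete-time Markov chain. Each frame is corrupted according to whichever regime is currently active. The abstract does not specify VDCS's corruption set, transition matrix, or switching granularity. The class `MarkovSwitchingCorruptor`, the three toy corruptions, and the per-step switching below are therefore hypothetical illustrations built on NumPy.

```python
import numpy as np

# Hypothetical toy corruptions; VDCS's actual corruption set is not given in the abstract.
def identity(img, rng):
    """Clean regime: return the frame unchanged."""
    return img


def gaussian_noise(img, rng, sigma=25.0):
    """Additive Gaussian pixel noise, clipped back to the uint8 range."""
    noisy = img.astype(np.float32) + rng.normal(0.0, sigma, img.shape)
    return np.clip(noisy, 0, 255).astype(np.uint8)


def darken(img, rng, factor=0.4):
    """Low-light regime: scale all intensities down."""
    return (img.astype(np.float32) * factor).astype(np.uint8)


CORRUPTIONS = [identity, gaussian_noise, darken]


class MarkovSwitchingCorruptor:
    """Hypothetical: corrupts frames under a regime that follows a Markov chain."""

    def __init__(self, transition, seed=0, start_state=0):
        self.P = np.asarray(transition, dtype=np.float64)
        k = len(CORRUPTIONS)
        if self.P.shape != (k, k) or not np.allclose(self.P.sum(axis=1), 1.0):
            raise ValueError("transition must be a row-stochastic KxK matrix")
        self.rng = np.random.default_rng(seed)
        self.state = start_state

    def step(self, img):
        """Advance the regime chain one step, then corrupt `img` under the new regime."""
        row = self.P[self.state]
        self.state = int(self.rng.choice(len(row), p=row))
        return CORRUPTIONS[self.state](img, self.rng), self.state


# Usage: a "sticky" chain that keeps each corruption for about 10 steps on average.
P = [[0.90, 0.05, 0.05],
     [0.05, 0.90, 0.05],
     [0.05, 0.05, 0.90]]
corruptor = MarkovSwitchingCorruptor(P, seed=0)
frame = np.full((84, 84, 3), 128, dtype=np.uint8)  # stand-in for a rendered frame

regimes = []
for _ in range(200):
    obs, k = corruptor.step(frame)
    regimes.append(k)

switches = sum(a != b for a, b in zip(regimes, regimes[1:]))
```

The sticky diagonal is what makes the process non-stationary in a realistic way. With a stay probability of 0.9, each corruption persists for a geometrically distributed number of steps, with a mean of 1 / (1 - 0.9) = 10. A policy therefore cannot assume a fixed corruption within an episode.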
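The restore-then-extract ordering described above can be sketched as a dense (softmax-gated) Mixture-of-Experts followed by a foreground mask. This is not the ACO-MoE architecture. The linear experts, the gate on raw pixels, the `ToyRestorationMoE` and `softmax` names, and the externally supplied foreground mask (which a real system would have to predict) are all hypothetical assumptions.

```python
import numpy as np


def softmax(x, axis=-1):
    """Numerically stable softmax."""
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


class ToyRestorationMoE:
    """Hypothetical dense MoE: K linear restoration experts mixed by a softmax gate.

    Weights are random (untrained) and purely illustrative.
    """

    def __init__(self, dim, num_experts, seed=0):
        rng = np.random.default_rng(seed)
        # Each expert starts near the identity map, i.e. "no restoration".
        self.experts = [np.eye(dim) + 0.01 * rng.standard_normal((dim, dim))
                        for _ in range(num_experts)]
        self.gate_w = 0.1 * rng.standard_normal((dim, num_experts))

    def __call__(self, obs, fg_mask):
        """obs, fg_mask: shape (batch, dim); fg_mask is 1 on the agent, 0 elsewhere."""
        gate = softmax(obs @ self.gate_w, axis=-1)                 # (B, K) expert weights
        expert_out = np.stack([obs @ W for W in self.experts], 1)  # (B, K, D) per-expert outputs
        restored = np.einsum("bk,bkd->bd", gate, expert_out)       # gate-weighted mixture
        agent_centric = restored * fg_mask                         # keep only the foreground
        return agent_centric, gate


# Usage: 4 flattened 16-pixel "observations" and a binary foreground mask.
B, D, K = 4, 16, 3
rng = np.random.default_rng(1)
obs = rng.random((B, D))
mask = (rng.random((B, D)) > 0.5).astype(np.float64)
moe = ToyRestorationMoE(D, K)
aco, gate = moe(obs, mask)
```

In a full pipeline, the masked, restored output `aco` would be fed to the RL agent's encoder instead of the raw frame. This is the sense in which perception is separated from perturbation before RL. A sparse top-k gate, which routes each input to only a few experts, is a common alternative to the dense gate shown here.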