🤖 AI Summary
Visual reinforcement learning agents suffer severe performance degradation when exposed to visual disturbances in real-world scenarios. To address this, we propose Self-Consistent Model-based Adaptation (SCMA), a plug-and-play, policy-agnostic framework built around a denoising model that maps cluttered observations to clean ones. The denoising model is trained without supervision through a distribution-matching objective whose optimality we analyze theoretically, and the distribution of clean observations is estimated with a pre-trained world model. Evaluated on multiple visual generalization benchmarks and real-robot data, the method improves performance across various distractions and shows better sample efficiency, while leaving the policy unmodified and requiring no hand-crafted data augmentation. This work presents the first approach that robustifies agents against visual disturbances in a purely unsupervised, world-model-based way without fine-tuning the policy.
📝 Abstract
Visual reinforcement learning agents typically face serious performance declines in real-world applications caused by visual distractions. Existing methods rely on fine-tuning the policy's representations with hand-crafted augmentations. In this work, we propose Self-Consistent Model-based Adaptation (SCMA), a novel method that fosters robust adaptation without modifying the policy. By transferring cluttered observations to clean ones with a denoising model, SCMA can mitigate distractions for various policies as a plug-and-play enhancement. To optimize the denoising model in an unsupervised manner, we derive an unsupervised distribution matching objective with a theoretical analysis of its optimality. We further present a practical algorithm to optimize the objective by estimating the distribution of clean observations with a pre-trained world model. Extensive experiments on multiple visual generalization benchmarks and real robot data demonstrate that SCMA effectively boosts performance across various distractions and exhibits better sample efficiency.
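The plug-and-play idea described above can be illustrated with a deliberately tiny sketch. It is not the SCMA algorithm or its objective: the frozen Gaussian "world model", the shift-only denoiser, the vector observations, and every name below (`world_model_log_prob`, `ShiftDenoiser`, `frozen_policy`) are assumptions made for illustration. The sketch only shows the structure the abstract describes: the policy stays frozen, and a denoiser placed in front of it is adapted without labels so that its outputs look likely under a model of clean observations.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumption: the frozen "world model" is a unit-variance Gaussian over
# clean observations. A real world model would be a learned dynamics model.
CLEAN_MEAN = np.zeros(4)


def world_model_log_prob(obs):
    """Log-density (up to a constant) under the frozen clean-observation model."""
    return -0.5 * np.sum((obs - CLEAN_MEAN) ** 2, axis=-1)


def frozen_policy(obs):
    """Hypothetical pre-trained policy; it is never updated during adaptation."""
    return np.tanh(obs.sum(axis=-1))


class ShiftDenoiser:
    """Toy denoiser that removes a learned constant offset from observations."""

    def __init__(self, dim):
        self.offset = np.zeros(dim)

    def __call__(self, obs):
        return obs - self.offset

    def adapt_step(self, obs_batch, lr=0.1):
        # Gradient ascent on the mean world-model log-likelihood of the
        # denoised batch. For this Gaussian model the ascent direction for
        # the offset is mean(denoised - CLEAN_MEAN).
        residual = np.mean(self(obs_batch) - CLEAN_MEAN, axis=0)
        self.offset += lr * residual


# Assumption: the distraction is a fixed additive shift on every observation.
distraction = np.array([2.0, -1.0, 0.5, 3.0])

denoiser = ShiftDenoiser(dim=4)
for _ in range(200):
    cluttered = rng.normal(size=(64, 4)) + distraction  # unlabeled, cluttered
    denoiser.adapt_step(cluttered)


def act(obs):
    """Plug-and-play use: denoise first, then query the unchanged policy."""
    return frozen_policy(denoiser(obs))


test_batch = rng.normal(size=(256, 4)) + distraction
actions = act(test_batch)
```

Restricting the denoiser to a shift keeps the toy well behaved. With a more expressive denoiser, maximizing likelihood alone could collapse every observation onto the most likely clean one, which is one reason a proper distribution-matching objective is needed in practice.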