Self-Consistent Model-based Adaptation for Visual Reinforcement Learning

📅 2025-02-14
📈 Citations: 0
Influential: 0
🤖 AI Summary
Visual reinforcement learning agents suffer severe performance degradation when exposed to visual disturbances in real-world scenarios. To address this, the paper proposes Self-Consistent Model-based Adaptation (SCMA), a plug-and-play, policy-agnostic adaptation framework built around a denoising model that is trained without supervision. The authors derive an unsupervised distribution-matching objective, analyze its optimality theoretically, and optimize it in practice by estimating the distribution of clean observations with a pre-trained world model. Evaluated on multiple visual generalization benchmarks and real robot data, the method improves robustness and sample efficiency across various distractions while leaving the policy unmodified and requiring no hand-crafted data augmentation. It is described as the first approach to counter visual disturbances in a purely unsupervised, world-model-based way without fine-tuning the policy.

📝 Abstract
Visual reinforcement learning agents typically face serious performance declines in real-world applications caused by visual distractions. Existing methods rely on fine-tuning the policy's representations with hand-crafted augmentations. In this work, we propose Self-Consistent Model-based Adaptation (SCMA), a novel method that fosters robust adaptation without modifying the policy. By transferring cluttered observations to clean ones with a denoising model, SCMA can mitigate distractions for various policies as a plug-and-play enhancement. To optimize the denoising model in an unsupervised manner, we derive an unsupervised distribution matching objective with a theoretical analysis of its optimality. We further present a practical algorithm to optimize the objective by estimating the distribution of clean observations with a pre-trained world model. Extensive experiments on multiple visual generalization benchmarks and real robot data demonstrate that SCMA effectively boosts performance across various distractions and exhibits better sample efficiency.
Problem

Research questions and friction points this paper is trying to address.

Enhance visual reinforcement learning robustness
Mitigate visual distractions without policy modification
Optimize denoising model with unsupervised distribution matching
Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Consistent Model-based Adaptation
Unsupervised distribution matching objective
Plug-and-play denoising model enhancement
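
As a rough illustration of unsupervised distribution matching, the sketch below fits a denoiser using only cluttered samples and a frozen model of clean observations. Everything in it is an assumption made for illustration: a fixed unit-variance Gaussian stands in for the pre-trained world model, the denoiser is a single learnable offset `theta`, and `adapt_offset` is a hypothetical helper. It is not the objective derived in the paper. For a shift-only denoiser the entropy of the denoised samples does not depend on `theta`, so maximizing their average log-likelihood under the frozen model is equivalent, up to a constant, to minimizing the KL divergence from the denoised distribution to the clean model.

```python
import random
from typing import Sequence


def adapt_offset(
    cluttered: Sequence[float],
    clean_mean: float,
    steps: int = 200,
    lr: float = 0.1,
) -> float:
    """Fit theta so that (x - theta) is likely under a frozen N(clean_mean, 1) model.

    No clean/cluttered pairs are used: the only training signal is the
    likelihood that the frozen clean-observation model assigns to the
    denoised samples.
    """
    theta = 0.0
    n = len(cluttered)
    for _ in range(steps):
        # log N(x - theta | mu, 1) = -(x - theta - mu)^2 / 2 + const,
        # so d/dtheta of the mean log-likelihood is mean(x - theta - mu).
        grad = sum(x - theta - clean_mean for x in cluttered) / n
        theta += lr * grad  # gradient ascent on the log-likelihood
    return theta


rng = random.Random(0)
true_offset = 3.0   # assumed distraction, unknown to the learner
clean_mean = 0.0    # parameter of the frozen stand-in model
cluttered = [rng.gauss(clean_mean, 1.0) + true_offset for _ in range(2000)]

theta = adapt_offset(cluttered, clean_mean)
print(f"estimated offset: {theta:.3f}")
```

With a more expressive denoiser, likelihood maximization alone could collapse outputs onto the mode of the clean model, which is one reason a carefully derived matching objective matters; the toy offset model sidesteps that issue by construction.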

Authors

Xinning Zhou
Department of Computer Science & Technology, Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University

Chengyang Ying
Tsinghua University
Machine Learning, Reinforcement Learning, Embodied AI

Yao Feng
Department of Computer Science & Technology, Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University

Hang Su
Department of Computer Science & Technology, Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University

Jun Zhu
Department of Computer Science & Technology, Institute for AI, BNRist Center, Tsinghua-Bosch Joint ML Center, THBI Lab, Tsinghua University