Self-Consistent Model-based Adaptation for Visual Reinforcement Learning

📅 2025-02-14

📈 Citations: 0

✨ Influential: 0

career value

234K/year

🤖 AI Summary

Visual reinforcement learning agents suffer from severe degradation in generalization performance when exposed to visual disturbances in real-world scenarios. To address this, we propose a plug-and-play, policy-agnostic adaptive framework centered on an unsupervised denoising adaptation mechanism grounded in self-consistency modeling: we theoretically establish the optimality of its distribution-matching objective and implicitly estimate the clean observation distribution via a pre-trained world model. The method integrates denoising generative models, world-model priors, unsupervised distribution matching optimization, and model-based adaptation policies. Evaluated across multiple visual generalization benchmarks and real-robot datasets, it significantly improves robustness and sample efficiency while remaining compatible with arbitrary policies—without requiring manual data augmentation design. This work presents the first approach achieving policy-free fine-tuning, purely unsupervised, world-model-based robustification against visual disturbances.

Technology Category

Application Category

📝 Abstract

Visual reinforcement learning agents typically face serious performance declines in real-world applications caused by visual distractions. Existing methods rely on fine-tuning the policy's representations with hand-crafted augmentations. In this work, we propose Self-Consistent Model-based Adaptation (SCMA), a novel method that fosters robust adaptation without modifying the policy. By transferring cluttered observations to clean ones with a denoising model, SCMA can mitigate distractions for various policies as a plug-and-play enhancement. To optimize the denoising model in an unsupervised manner, we derive an unsupervised distribution matching objective with a theoretical analysis of its optimality. We further present a practical algorithm to optimize the objective by estimating the distribution of clean observations with a pre-trained world model. Extensive experiments on multiple visual generalization benchmarks and real robot data demonstrate that SCMA effectively boosts performance across various distractions and exhibits better sample efficiency.

Problem

Research questions and friction points this paper is trying to address.

Enhance visual reinforcement learning robustness

Mitigate visual distractions without policy modification

Optimize denoising model with unsupervised distribution matching

Innovation

Methods, ideas, or system contributions that make the work stand out.

Self-Consistent Model-based Adaptation

Unsupervised distribution matching objective

Plug-and-play denoising model enhancement

🔎 Similar Papers

No similar papers found.