🤖 AI Summary
Existing successor representation (SR) methods struggle to model dynamics-relevant information in high-dimensional visual settings under unsupervised reinforcement learning, leading to biased representations, poor generalization, and limited skill controllability. This work proposes the SRCP framework, which, for the first time, integrates saliency-guided disentangled task representations with consistency-based policy learning. By combining saliency-guided dynamics modeling, disentangled SR training, and a classifier-free guidance mechanism, SRCP mitigates the difficulty of modeling multi-modal skill-conditioned policies. Experiments on 16 tasks across four datasets from the ExORL benchmark demonstrate that the proposed method significantly improves zero-shot generalization, is compatible with various SR algorithms, and achieves state-of-the-art results.
📝 Abstract
Zero-shot unsupervised reinforcement learning (URL) offers a promising direction for building generalist agents capable of generalizing to unseen tasks without additional supervision. Among existing approaches, successor representations (SR) have emerged as a prominent paradigm due to their effectiveness in structured, low-dimensional settings. However, SR methods struggle to scale to high-dimensional visual environments. Through empirical analysis, we identify two key limitations of SR in visual URL: (1) SR objectives often lead to suboptimal representations that attend to dynamics-irrelevant regions, resulting in inaccurate successor measures and degraded task generalization; and (2) these flawed representations hinder SR policies from modeling multi-modal skill-conditioned action distributions and ensuring skill controllability. To address these limitations, we propose Saliency-Guided Representation with Consistency Policy Learning (SRCP), a novel framework that improves zero-shot generalization of SR methods in visual URL. SRCP decouples representation learning from successor training by introducing a saliency-guided dynamics task to capture dynamics-relevant representations, thereby improving successor measures and task generalization. Moreover, it integrates a fast-sampling consistency policy with URL-specific classifier-free guidance and tailored training objectives to improve skill-conditioned policy modeling and controllability. Extensive experiments on 16 tasks across 4 datasets from the ExORL benchmark demonstrate that SRCP achieves state-of-the-art zero-shot generalization in visual URL and is compatible with various SR methods.
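To make the classifier-free guidance idea concrete, the sketch below shows the standard guidance rule applied to a skill-conditioned policy: the policy is queried once with the skill embedding and once with a learned "null" skill, and the two predictions are extrapolated by a guidance weight. All names here (`policy`, `null_skill`, `guidance_w`) are illustrative assumptions; the paper's URL-specific variant and its consistency-model sampler may differ in detail.

```python
import numpy as np

def cfg_action(policy, obs, skill, null_skill, guidance_w):
    """Classifier-free guidance for a skill-conditioned policy (sketch).

    `policy(obs, skill)` returns an action prediction; `null_skill` is a
    learned "no-skill" embedding used for the unconditional branch.
    """
    a_cond = policy(obs, skill)          # skill-conditioned prediction
    a_uncond = policy(obs, null_skill)   # unconditional prediction
    # Extrapolate toward the conditioned output: w=0 ignores the skill,
    # w=1 recovers plain conditioning, w>1 strengthens skill control.
    return a_uncond + guidance_w * (a_cond - a_uncond)

# Toy linear "policy" for illustration only: action = obs + skill.
toy_policy = lambda obs, skill: obs + skill
obs = np.zeros(2)
skill = np.array([1.0, -1.0])
null_skill = np.zeros(2)
print(cfg_action(toy_policy, obs, skill, null_skill, guidance_w=2.0))
# With w=2 the skill direction is doubled: [2., -2.]
```

In training, the skill condition is typically dropped at random (replaced by `null_skill`) so a single network learns both branches; at sampling time the weight trades off action diversity against skill controllability.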