Debiased Model-based Representations for Sample-efficient Continuous Control

📅 2026-05-12

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses two weaknesses of existing model-based representation methods: they can fail to capture sufficient information about relevant variables, and they can overfit to early experiences in the replay buffer. Both bias the learned representation and degrade actor-critic learning. To mitigate this, the paper proposes DR.Q, an algorithm that combines mutual information maximization with bias correction in model-based representation learning. Specifically, DR.Q explicitly maximizes the mutual information between the representation of the current state-action pair and the representation of the next state, in addition to minimizing the deviation between them, and it samples transitions with a faded prioritized experience replay mechanism to alleviate overfitting to early data. Evaluated on continuous control benchmarks with a single set of hyperparameters, DR.Q matches or surpasses recent strong baselines, sometimes outperforming them by a large margin.
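To make the representation objective above concrete, the following is a minimal sketch, not the paper's implementation. It assumes an InfoNCE-style contrastive estimator as the mutual information term, which is a common choice but is not confirmed by the summary. The function names, the `temperature` and `mi_weight` parameters, and the use of a mean-squared error for the "deviation" term are all illustrative assumptions.

```python
import numpy as np

def info_nce_loss(zsa, zs_next, temperature=0.1):
    """InfoNCE loss between state-action embeddings and next-state embeddings.

    Assumption: row i of `zsa` and row i of `zs_next` come from the same
    transition (positive pair); all other rows in the batch act as negatives.
    Minimizing this loss maximizes a lower bound on the mutual information.
    """
    # L2-normalize so the logits are scaled cosine similarities.
    zsa = zsa / np.linalg.norm(zsa, axis=1, keepdims=True)
    zs_next = zs_next / np.linalg.norm(zs_next, axis=1, keepdims=True)
    logits = zsa @ zs_next.T / temperature            # shape (B, B)
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Positive pairs sit on the diagonal.
    return -np.mean(np.diag(log_probs))

def representation_loss(zsa, zs_next, mi_weight=1.0):
    """Hypothetical combined objective: deviation term plus weighted MI term."""
    deviation = np.mean((zsa - zs_next) ** 2)  # assumed MSE deviation
    return deviation + mi_weight * info_nce_loss(zsa, zs_next)
```

In this sketch, matched pairs yield a lower contrastive loss than mismatched ones, which is the property the mutual information term is meant to reward. How the actual algorithm weights the two terms is not stated in the summary.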

📝 Abstract

Model-based representations have recently emerged as a promising framework that embeds latent dynamics information into representations for downstream off-policy actor-critic learning. The framework implicitly combines the advantages of model-free and model-based approaches while avoiding the training costs associated with model-based methods. Nevertheless, existing model-based representation methods can fail to capture sufficient information about relevant variables and can overfit to early experiences in the replay buffer. These issues bias both representation learning and actor-critic learning, leading to inferior performance. To address this, we propose Debiased model-based Representations for Q-learning, dubbed the DR.Q algorithm. In addition to minimizing the deviation between the representations of the current state-action pair and the next state, DR.Q explicitly maximizes the mutual information between them, and it samples transitions with faded prioritized experience replay. We evaluate DR.Q on numerous continuous control benchmarks with a single set of hyperparameters, and the results demonstrate that DR.Q can match or surpass recent strong baselines, sometimes outperforming them by a large margin. Our code is available at https://github.com/dmksjfl/DR.Q.
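The abstract does not spell out how faded prioritized experience replay works. As one possible reading, the sketch below scales a standard TD-error priority by an exponential factor in the transition's age, so that early experiences are sampled less often over time. The fading rule, the function names, and the `alpha`, `fade`, and `min_priority` parameters are all assumptions made for illustration and should not be taken as the paper's definition.

```python
import numpy as np

def faded_priorities(td_errors, ages, alpha=0.4, fade=1e-3, min_priority=1.0):
    """Hypothetical faded priority: TD-error priority damped by transition age.

    td_errors: absolute TD errors, one per stored transition.
    ages: number of environment steps since each transition was collected.
    """
    # Standard prioritized-replay term, clipped from below.
    base = np.maximum(np.abs(td_errors), min_priority) ** alpha
    # Assumed fading rule: older transitions get exponentially smaller weight.
    return base * np.exp(-fade * ages)

def sample_indices(priorities, batch_size, rng):
    """Draw buffer indices with probability proportional to priority."""
    probs = priorities / priorities.sum()
    return rng.choice(len(priorities), size=batch_size, p=probs)
```

With equal TD errors, the oldest transition in this sketch always receives a lower sampling probability than the newest, which is the qualitative behavior needed to counter overfitting to early data.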

Problem

Research questions and friction points this paper is trying to address.

model-based representations

representation bias

overfitting

continuous control

off-policy learning

Innovation

Methods, ideas, or system contributions that make the work stand out.

debiased representation

mutual information maximization

model-based representation