🤖 AI Summary
This work addresses the tendency of existing model-based representation methods to overfit to early experiences in the replay buffer, which biases the learned representations and degrades policy learning. To mitigate this issue, the paper proposes DR.Q, an algorithm that, for the first time, integrates mutual information maximization with bias correction into model-based representation learning. Specifically, DR.Q explicitly maximizes the mutual information between the representations of the current state-action pair and the next state, and it samples transitions with faded prioritized experience replay to reduce overfitting to early data. Evaluated on standard continuous control benchmarks with a single hyperparameter configuration, DR.Q matches or surpasses state-of-the-art baselines and outperforms them by a large margin on some tasks.
📝 Abstract
Model-based representations have recently emerged as a promising framework that embeds latent dynamics information into the representations used for downstream off-policy actor-critic learning. The framework implicitly combines the advantages of model-free and model-based approaches while avoiding the training costs associated with model-based methods. Nevertheless, existing model-based representation methods can fail to capture sufficient information about relevant variables and can overfit to early experiences in the replay buffer. These issues introduce biases into representation and actor-critic learning, leading to inferior performance. To address them, we propose Debiased model-based Representations for Q-learning, dubbed the DR.Q algorithm. In addition to minimizing the deviation between the representations of the current state-action pair and the next state, DR.Q explicitly maximizes the mutual information between them, and it samples transitions with faded prioritized experience replay. We evaluate DR.Q on numerous continuous control benchmarks with a single set of hyperparameters; the results demonstrate that DR.Q matches or surpasses recent strong baselines, sometimes outperforming them by a large margin. Our code is available at https://github.com/dmksjfl/DR.Q.
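To make the representation objective more concrete, the following is a minimal sketch of how a deviation term could be combined with a mutual information term. It is not taken from the DR.Q code. It assumes the mutual information is maximized through the InfoNCE contrastive bound, for which I(X; Y) ≥ log B − L_InfoNCE with batch size B; the abstract does not state which estimator DR.Q uses. The names `info_nce_loss`, `representation_loss`, `temperature`, and `mi_weight` are hypothetical.

```python
import numpy as np


def info_nce_loss(pred_next, target_next, temperature=0.1):
    """InfoNCE loss over a batch of (predicted, actual) next-state embeddings.

    Assumption: InfoNCE stands in for the mutual information objective;
    the estimator actually used by DR.Q may differ.
    Minimizing this loss maximizes a lower bound on mutual information.
    """
    # L2-normalize rows so that dot products are cosine similarities.
    p = pred_next / np.linalg.norm(pred_next, axis=1, keepdims=True)
    t = target_next / np.linalg.norm(target_next, axis=1, keepdims=True)
    # (B, B) similarity matrix; diagonal entries are the positive pairs.
    logits = p @ t.T / temperature
    # Subtract the row-wise max for numerical stability of the softmax.
    logits = logits - logits.max(axis=1, keepdims=True)
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    # Cross-entropy with the matching index as the correct class.
    return -np.mean(np.diag(log_probs))


def representation_loss(pred_next, target_next, mi_weight=1.0):
    """Deviation (MSE) term plus a weighted InfoNCE term.

    mi_weight is a hypothetical trade-off coefficient.
    """
    mse = np.mean((pred_next - target_next) ** 2)
    return mse + mi_weight * info_nce_loss(pred_next, target_next)


# Usage: pred_next would come from a latent dynamics head applied to the
# state-action embedding, target_next from the next-state encoder.
rng = np.random.default_rng(0)
target = rng.normal(size=(8, 16))
pred = target + 0.01 * rng.normal(size=(8, 16))
loss = representation_loss(pred, target)
```

In this sketch the other transitions in the batch serve as negatives, so the contrastive term rewards predictions that identify their own next state among the alternatives, which a pure deviation loss does not enforce.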