Distributional Reinforcement Learning with Dual Expectile-Quantile Regression

📅 2023-05-26
🏛️ arXiv.org
📈 Citations: 1
Influential: 1
🤖 AI Summary
In distributional reinforcement learning, quantile regression is often made more efficient with a hybrid asymmetric L1/L2 Huber loss, but this voids distributional estimation guarantees, and the estimated return distribution is empirically observed to collapse rapidly to its mean. This paper proposes a dual-regression framework that jointly learns expectiles and quantiles of the return distribution, combining asymmetric L1 (quantile) and asymmetric L2 (expectile) losses within an extended Implicit Quantile Network (IQN) architecture, making L2-based learning usable for distributional temporal-difference updates. The authors prove that the approach approximately learns the correct return distribution. Empirically, on the Atari benchmark it matches the performance of the Huber-based IQN-1 baseline after 200M training frames while avoiding distributional collapse and retaining an estimate of the full return distribution; a toy experiment further confirms improved distributional fidelity. The core contribution is integrating expectile regression into distributional RL, combined with quantile regression to obtain both efficient L2-based learning and faithful distribution estimates.
📝 Abstract
Distributional reinforcement learning (RL) has proven useful in multiple benchmarks as it enables approximating the full distribution of returns and makes better use of environment samples. The commonly used quantile regression approach to distributional RL -- based on asymmetric $L_1$ losses -- provides a flexible and effective way of learning arbitrary return distributions. In practice, it is often improved by using a more efficient, hybrid asymmetric $L_1$-$L_2$ Huber loss for quantile regression. However, by doing so, distributional estimation guarantees vanish, and we empirically observe that the estimated distribution rapidly collapses to its mean. Indeed, asymmetric $L_2$ losses, corresponding to expectile regression, cannot be readily used for distributional temporal difference learning. Motivated by the efficiency of $L_2$-based learning, we propose to jointly learn expectiles and quantiles of the return distribution in a way that allows efficient learning while keeping an estimate of the full distribution of returns. We prove that our approach approximately learns the correct return distribution, and we benchmark a practical implementation on a toy example and at scale. On the Atari benchmark, our approach matches the performance of the Huber-based IQN-1 baseline after $200$M training frames but avoids distributional collapse and keeps estimates of the full distribution of returns.
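The two losses contrasted in the abstract have simple closed forms: the $\tau$-quantile minimizes an asymmetric $L_1$ (pinball) loss, while the $\tau$-expectile minimizes an asymmetric $L_2$ loss. A minimal numpy sketch (illustrative only, not the paper's implementation) shows that at $\tau = 0.5$ the quantile loss recovers the median and the expectile loss recovers the mean:

```python
import numpy as np

def quantile_loss(u, tau):
    # Asymmetric L1 (pinball) loss on the error u = sample - estimate:
    # overshoots are weighted by tau, undershoots by (1 - tau).
    return np.where(u >= 0, tau * u, (tau - 1.0) * u)

def expectile_loss(u, tau):
    # Asymmetric L2 loss: same weighting scheme applied to squared error.
    return np.where(u >= 0, tau, 1.0 - tau) * u ** 2

rng = np.random.default_rng(0)
samples = rng.normal(loc=2.0, scale=1.0, size=10_000)
grid = np.linspace(-2.0, 6.0, 801)

def minimizer(loss, tau):
    # Grid point with the lowest average loss over the samples.
    avg = [loss(samples - g, tau).mean() for g in grid]
    return grid[int(np.argmin(avg))]

q_hat = minimizer(quantile_loss, 0.5)   # approximately the sample median
e_hat = minimizer(expectile_loss, 0.5)  # approximately the sample mean
```

The quadratic loss gives gradients proportional to the error, which is what makes $L_2$-based learning efficient; the pinball loss gives bounded gradients, which is what makes quantile regression robust but slower.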
Problem

Research questions and friction points this paper is trying to address.

Improving distributional RL with dual expectile-quantile regression
Preventing the distributional collapse induced by Huber-loss quantile regression
Maintaining full return-distribution estimates while keeping L2-level learning efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

Jointly learns expectiles and quantiles of the return distribution
Combines asymmetric L1 and L2 losses for efficient, robust learning
Avoids distributional collapse throughout training
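The joint-learning idea in the bullets above can be sketched as two "heads" fitted on the same sample stream, one with the quantile (asymmetric L1) loss and one with the expectile (asymmetric L2) loss. This is a toy stochastic-gradient version for a fixed distribution, not the paper's network-based TD implementation; the learning-rate schedule and the choice of levels `taus` are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
samples = rng.normal(0.0, 1.0, size=50_000)
taus = np.array([0.1, 0.5, 0.9])  # statistic levels tracked by each head

# One vector of quantile estimates and one of expectile estimates.
q = np.zeros_like(taus)
e = np.zeros_like(taus)
for t, x in enumerate(samples):
    lr = 0.5 / np.sqrt(t + 1.0)
    # Stochastic subgradient step on the pinball (asymmetric L1) loss:
    # move up by tau on overshoots, down by (1 - tau) on undershoots.
    q += lr * np.where(x >= q, taus, taus - 1.0)
    # Stochastic gradient step on the asymmetric L2 (expectile) loss:
    # step size scales with the error, as in ordinary L2 regression.
    u = x - e
    e += lr * 2.0 * np.where(u >= 0, taus, 1.0 - taus) * u
```

After the pass, `q` approximates the 0.1/0.5/0.9 quantiles of the standard normal and `e` the corresponding expectiles; both heads keep a spread of statistics rather than collapsing to the mean.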