AI Summary
This work addresses the distorted or degenerate distributional estimates that bootstrapped targets can cause in quantile-based reinforcement learning. It reinterprets a snapshot of the IQN loss as a collection of local empirical quantile estimation problems, then applies Wasserstein distributionally robust optimization to each quantile slot to derive a closed-form, quantile-level-dependent correction to the Bellman target. The result is a lightweight regularization of quantile geometry that neither changes the underlying value objective nor requires reconstructing additional sample sets. The correction is antisymmetric about the median, which preserves the risk-neutral quantile average, and monotone in the quantile level, which widens the gap between upper and lower quantiles and counteracts distributional collapse. The paper reports that the method outperforms existing quantile-based distributional reinforcement learning algorithms on risk-sensitive navigation tasks and Atari games.
Abstract
Quantile-based distributional reinforcement learning methods learn return distributions through sampled quantile regression, but their bootstrapped target quantiles may induce distorted or degenerate distribution estimates. We propose Robust Quantile-based Implicit Quantile Networks (RQIQN), a lightweight Wasserstein distributionally robust enhancement derived from a quantile estimation perspective. We first reinterpret a snapshot of the IQN loss as a collection of local empirical quantile estimation problems over sampled current fractions. We then robustify each local slot with a Wasserstein distributionally robust quantile estimation formulation, yielding a closed-form, fraction-dependent correction to the Bellman target. This correction directly addresses distributional degeneration: its antisymmetry about the median preserves the risk-neutral quantile average, while its monotonicity enlarges the gaps between upper and lower quantiles and counteracts collapsed distributional spread. RQIQN thus regularizes quantile geometry without changing the underlying value objective or requiring additional sample set reconstruction. Finally, we empirically show that RQIQN outperforms existing quantile-based distributional reinforcement learning algorithms in risk-sensitive navigation and Atari games.
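The two properties claimed for the correction can be illustrated with a small numerical sketch. The abstract does not state the closed form, so the code below substitutes a hypothetical linear shift, `toy_correction`, chosen only because it is antisymmetric about the median and increasing in the fraction. It is not RQIQN's formula. The helper names, the radius `eps`, and the Gaussian stand-in for bootstrapped target samples are likewise assumptions. The plain pinball loss is used for the local quantile estimation step, whereas IQN typically uses a Huber-smoothed variant.

```python
import numpy as np

def pinball_loss(theta, samples, tau):
    """Mean quantile-regression (pinball) loss of candidate theta at level tau."""
    u = samples - theta
    return np.mean(np.maximum(tau * u, (tau - 1.0) * u))

def empirical_quantile_by_loss(samples, tau):
    """Local quantile estimate: the sample value that minimizes the pinball loss."""
    losses = [pinball_loss(s, samples, tau) for s in samples]
    return samples[int(np.argmin(losses))]

def toy_correction(taus, eps):
    """HYPOTHETICAL correction (not the paper's closed form):
    antisymmetric about tau = 0.5 and increasing in tau."""
    return eps * (2.0 * taus - 1.0)

rng = np.random.default_rng(0)
targets = rng.normal(size=201)        # assumed stand-in for target samples
taus = np.linspace(0.05, 0.95, 19)    # fractions placed symmetrically about 0.5

# One local quantile estimation problem per fraction.
base = np.array([empirical_quantile_by_loss(targets, t) for t in taus])

# Shift each slot by the toy correction.
shifted = base + toy_correction(taus, eps=0.3)

print("quantile average before/after:", base.mean(), shifted.mean())
print("upper-lower gap before/after:",
      base[-1] - base[0], shifted[-1] - shifted[0])
```

Because the toy shifts cancel across fractions placed symmetrically about 0.5, the quantile average is unchanged, while the gap between the highest and lowest quantile grows by `eps * 1.8` for this grid. This mirrors the mean-preserving, spread-widening behavior the abstract describes, under the stated assumptions.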