AI Summary
This work addresses unstable reinforcement learning in chaotic dynamical systems, where exponential sensitivity to initial conditions produces high-variance bootstrapped targets and ill-conditioned gradient updates. The authors argue for distributional reinforcement learning, which models the full return distribution rather than a scalar expected return. Under mild statistical stability assumptions, they show that the return distribution evolves more regularly than individual trajectories when measured in the 1-Wasserstein metric, which yields a smoother, better-conditioned distributional Bellman objective. The analysis takes the perspective of measure evolution: even in chaotic regimes, return distributions have more regular dynamical structure than the trajectories that generate them. This clarifies the geometry of the resulting objective and gives a principled justification for the improved learning stability of distributional methods in chaotic environments.
Abstract
Chaotic dynamical systems pose a fundamental challenge for Reinforcement Learning (RL): exponential sensitivity to initial conditions induces high-variance bootstrap targets and poorly conditioned gradient updates. Chaotic dynamics arise across scientific and engineering domains, from fluid flows and climate systems to multi-agent systems, where reliable learning is highly desirable. Standard RL methods optimise expected returns through scalar value functions, implicitly averaging over diverging trajectories and entangling trajectory-level instability with the learning objective. We show that, under mild statistical stability assumptions, the return distribution evolves more regularly than individual trajectories when measured under the $1$-Wasserstein metric, yielding a smoother distributional Bellman objective. By aligning optimisation with this measure-level structure, distributional RL provides better-conditioned learning. We offer a principled explanation for the advantages of distributional methods in chaotic systems and for the geometries of RL objectives under chaos.
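The contrast between trajectory-level and measure-level sensitivity can be seen in a small numerical experiment. The sketch below is not the authors' method or experimental setup. It is a toy illustration under assumptions chosen here: the logistic map as the chaotic system, the state itself as the reward, and hypothetical helper names (`discounted_return`, `wasserstein1`, `compare`). It perturbs an ensemble of initial states by a small `eps` and compares the average change in per-trajectory returns with the $1$-Wasserstein distance between the two empirical return distributions.

```python
import random


def logistic_step(x):
    # Fully chaotic logistic map on [0, 1]; an assumed toy system.
    return 4.0 * x * (1.0 - x)


def discounted_return(x0, gamma=0.9, horizon=60):
    # Discounted return with reward r_t = x_t (an illustrative assumption).
    total, discount, x = 0.0, 1.0, x0
    for _ in range(horizon):
        total += discount * x
        discount *= gamma
        x = logistic_step(x)
    return total


def wasserstein1(samples_a, samples_b):
    # For two equal-size 1-D samples, the 1-Wasserstein distance is the
    # mean absolute difference between the sorted values.
    if len(samples_a) != len(samples_b):
        raise ValueError("samples must have the same length")
    a, b = sorted(samples_a), sorted(samples_b)
    return sum(abs(u - v) for u, v in zip(a, b)) / len(a)


def compare(eps=1e-6, n=2000, seed=0):
    # Ensemble of initial states, and the same ensemble shifted by eps.
    rng = random.Random(seed)
    starts = [rng.uniform(0.2, 0.3) for _ in range(n)]
    base = [discounted_return(x) for x in starts]
    shifted = [discounted_return(x + eps) for x in starts]
    # Trajectory-level sensitivity: average change in each individual return.
    mean_gap = sum(abs(u - v) for u, v in zip(base, shifted)) / n
    # Measure-level sensitivity: distance between the return distributions.
    return mean_gap, wasserstein1(base, shifted)


if __name__ == "__main__":
    mean_gap, w1 = compare()
    print(f"mean per-trajectory return gap: {mean_gap:.6f}")
    print(f"W1 between return distributions: {w1:.6f}")
```

Pairing each return with its perturbed counterpart is one valid coupling of the two empirical distributions, so the $1$-Wasserstein distance can never exceed the mean per-trajectory gap. How much smaller it is depends on the ensemble, horizon, and discount factor chosen here, so the printed numbers should be read as an illustration rather than as evidence for the paper's result.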