Flow Models for Unbounded and Geometry-Aware Distributional Reinforcement Learning

📅 2025-05-07
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
This work addresses three key limitations of existing distributional reinforcement learning (DistRL) methods: bounded support sets, weak modeling capacity, and low parameter efficiency. To overcome these, we propose an unbounded probability density function (PDF) modeling paradigm based on normalizing flows (RealNVP/MAF), enabling flexible representation of multimodal, skewed, and heavy-tailed return distributions. We further introduce a geometry-aware surrogate loss derived from the Cramér distance that operates directly on PDFs, avoiding numerical integration over cumulative distribution functions (CDFs), while supporting unbounded supports and ensuring parameter efficiency. Evaluated on the ATARI-5 benchmark, our approach significantly outperforms existing PDF-based DistRL methods, matches the performance of state-of-the-art quantile-based methods (e.g., QR-DQN, IQN), and improves both training stability and the synergy between representational expressivity and optimization dynamics.
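The summary describes modeling return densities with coupling-based normalizing flows (RealNVP/MAF). As an illustrative sketch only, not the paper's actual architecture, the snippet below implements a single affine coupling layer in NumPy and checks that the exact change-of-variables formula recovers the log-density; the linear scale/shift networks, weights, and shapes are toy assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def coupling_forward(z, s_w, t_w):
    # RealNVP-style affine coupling: the first coordinate passes through,
    # the second is scaled/shifted by functions of the first
    # (here simple linear maps standing in for small neural networks).
    z1, z2 = z[:, :1], z[:, 1:]
    s = np.tanh(z1 @ s_w)          # log-scale, bounded for stability
    t = z1 @ t_w                   # shift
    x2 = z2 * np.exp(s) + t
    log_det = s.sum(axis=1)        # log |det Jacobian| of the coupling
    return np.concatenate([z1, x2], axis=1), log_det

def log_prob(x, s_w, t_w):
    # Invert the coupling, then apply the change-of-variables formula
    # against a standard-normal base density (unbounded support for free).
    x1, x2 = x[:, :1], x[:, 1:]
    s = np.tanh(x1 @ s_w)
    t = x1 @ t_w
    z2 = (x2 - t) * np.exp(-s)
    z = np.concatenate([x1, z2], axis=1)
    base = -0.5 * (z ** 2).sum(axis=1) - z.shape[1] / 2 * np.log(2 * np.pi)
    return base - s.sum(axis=1)

s_w = rng.normal(size=(1, 1)) * 0.1
t_w = rng.normal(size=(1, 1)) * 0.1
z = rng.normal(size=(5, 2))
x, log_det = coupling_forward(z, s_w, t_w)
# Consistency check: density of the pushed-forward samples equals the
# base log-density minus the accumulated log-determinant.
base_lp = -0.5 * (z ** 2).sum(axis=1) - np.log(2 * np.pi)
assert np.allclose(log_prob(x, s_w, t_w), base_lp - log_det)
```

Stacking several such layers (with the pass-through coordinate alternating) yields the expressive, unbounded densities the summary refers to; the log-determinants simply add across layers.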

📝 Abstract
We introduce a new architecture for Distributional Reinforcement Learning (DistRL) that models return distributions using normalizing flows. This approach enables flexible, unbounded support for return distributions, in contrast to categorical approaches like C51 that rely on fixed or bounded representations. It also offers richer modeling capacity than quantile-based approaches to capture multi-modality, skewness, and tail behavior. Our method is significantly more parameter-efficient than categorical approaches. Standard metrics used to train existing models, such as KL divergence or Wasserstein distance, are either scale-insensitive or have biased sample gradients, especially when return supports do not overlap. To address this, we propose a novel surrogate for the Cramér distance that is geometry-aware and computable directly from the return distribution's PDF, avoiding the costly CDF computation. We test our model on the ATARI-5 sub-benchmark and show that our approach outperforms PDF-based models while remaining competitive with quantile-based methods.
Problem

Research questions and friction points this paper is trying to address.

Modeling unbounded return distributions in Distributional Reinforcement Learning
Addressing scale insensitivity and biased gradients in existing metrics
Improving parameter efficiency and distributional representation flexibility
Innovation

Methods, ideas, or system contributions that make the work stand out.

Normalizing flows model unbounded return distributions
Geometry-aware surrogate for Cramér distance
Parameter-efficient architecture for DistRL
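The innovation list mentions a geometry-aware surrogate for the Cramér distance that avoids CDF computation. The paper's actual surrogate is not reproduced here; as a hedged illustration of what such a surrogate sidesteps, the snippet below compares the textbook Cramér distance (squared L2 norm between CDFs, requiring a numerical integral) with the known 1-D energy-distance identity, which gives the same quantity directly from samples with no CDFs. Sample sizes and the grid are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def cramer_via_cdf(x, y, grid):
    # Textbook definition: integral of (F_X - F_Y)^2 over the real line,
    # approximated with empirical CDFs on a fine uniform grid.
    Fx = (x[None, :] <= grid[:, None]).mean(axis=1)
    Fy = (y[None, :] <= grid[:, None]).mean(axis=1)
    return ((Fx - Fy) ** 2).sum() * (grid[1] - grid[0])

def cramer_via_samples(x, y):
    # Equivalent 1-D identity: E|X-Y| - 0.5 E|X-X'| - 0.5 E|Y-Y'|,
    # computable from samples with no CDF and no numerical integration.
    def mean_abs(a, b):
        return np.abs(a[:, None] - b[None, :]).mean()
    return mean_abs(x, y) - 0.5 * mean_abs(x, x) - 0.5 * mean_abs(y, y)

x = rng.normal(0.0, 1.0, size=400)   # toy "predicted return" samples
y = rng.normal(1.0, 1.0, size=400)   # toy "target return" samples
grid = np.linspace(-10.0, 11.0, 20001)
a = cramer_via_cdf(x, y, grid)
b = cramer_via_samples(x, y)
assert np.isclose(a, b, atol=1e-2)
```

Unlike the KL divergence, this distance stays finite and informative when the two supports barely overlap, which is the geometry-awareness the problem list refers to.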
C. SimoAlami
LIX, Ecole Polytechnique/CNRS, IP Paris
Rim Kaddah
IRT SystemX
Jesse Read
École Polytechnique
Marie-Paule Cani
LIX, Ecole Polytechnique/CNRS, IP Paris