Quantile-Coupled Flow Matching for Distributional Reinforcement Learning

📅 2026-05-08
📈 Citations: 0
✨ Influential: 0

🤖 AI Summary
This work addresses a metric mismatch in existing conditional flow matching (CFM) critics for distributional reinforcement learning: arbitrary source-target pairings yield flow-matching losses that are not aligned with the Wasserstein distance, the metric in which the distributional Bellman operator is contractive. To resolve this, the authors propose FlowIQN, which sorts source samples and Bellman targets within each minibatch to approximate the monotone optimal transport coupling, so that flow paths are quantile-aligned. The authors prove that the resulting loss yields a Wasserstein-aligned approximate projection, which they describe as, to their knowledge, the first such explicit guarantee for a flow-matching distributional critic, and they extend FlowIQN with shortcut models for efficient inference. Empirically, FlowIQN improves the Wasserstein accuracy of return distributions over other CFM critics and achieves competitive performance on offline reinforcement learning benchmarks across multiple policy extraction methods.
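For context, the link between sorting and the Wasserstein distance rests on a standard fact about one-dimensional distributions. It is stated here as background and is not quoted from the paper, and the symbols $\mu$, $\nu$, $F^{-1}$, $x_{(i)}$ and $y_{(i)}$ are notation assumed for this note. For $p \ge 1$ and distributions $\mu, \nu$ on the real line with quantile functions $F_\mu^{-1}, F_\nu^{-1}$:

$$
W_p(\mu, \nu)^p = \int_0^1 \left| F_\mu^{-1}(u) - F_\nu^{-1}(u) \right|^p \, du .
$$

For two empirical distributions with $n$ samples each, this reduces to pairing order statistics:

$$
W_p(\hat{\mu}_n, \hat{\nu}_n)^p = \frac{1}{n} \sum_{i=1}^{n} \left| x_{(i)} - y_{(i)} \right|^p ,
$$

where $x_{(1)} \le \dots \le x_{(n)}$ and $y_{(1)} \le \dots \le y_{(n)}$ are the sorted samples. Pairing the $i$-th smallest source sample with the $i$-th smallest target sample is therefore the monotone optimal transport coupling between the two minibatch empirical distributions.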
📝 Abstract
Unlike standard expected-return Reinforcement Learning (RL), Distributional RL (DRL) models the full return distribution, making it better-suited for uncertainty-aware and risk-sensitive decision-making. Conditional Flow Matching (CFM) critics have recently attracted attention for modelling continuous, multi-modal return distributions. Despite this interest, there remains a substantial metric mismatch: DRL theory relies on the distributional Bellman operator being contractive in the $p$-Wasserstein distance, yet existing CFM critics are trained with arbitrary source-target couplings, so their flow-matching losses are not Wasserstein-aligned surrogates for matching Bellman target return distributions. In this work, we address this mismatch by proposing FlowIQN, a CFM critic that sorts source and Bellman target samples within each mini-batch to approximate the monotone optimal transport coupling, replacing arbitrary pairings with quantile-aligned flow paths. We prove that the loss of our quantile-coupled CFM critic yields a Wasserstein-aligned approximate projection compatible with the foundations of DRL. To our knowledge, FlowIQN is the first flow-matching distributional critic with an explicit Wasserstein-aligned projection guarantee. We further extend FlowIQN with shortcut models for efficient inference. Empirical results show that FlowIQN improves Wasserstein return-distribution accuracy over other CFM critics. It also yields competitive performance on offline RL benchmarks across multiple policy extraction methods, providing a theoretically grounded CFM critic that is readily compatible with DRL pipelines. Code: https://github.com/ori-goals/flowIQN.
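The sorting step described in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation (see the linked repository for that). The function names, the linear interpolation path, the Gaussian source, and the bimodal stand-in for Bellman targets are all assumptions made for illustration only.

```python
import numpy as np


def quantile_coupled_pairs(source, target):
    """Pair the i-th smallest source sample with the i-th smallest target.

    For 1-D samples of equal size this is the monotone optimal transport
    coupling between the two empirical distributions.
    """
    return np.sort(source), np.sort(target)


def cfm_regression_targets(x0, x1, t):
    """Assumed linear path x_t = (1 - t) * x0 + t * x1 with velocity x1 - x0."""
    x_t = (1.0 - t) * x0 + t * x1
    v_target = x1 - x0
    return x_t, v_target


def empirical_wasserstein_p(a, b, p=1):
    """Empirical p-Wasserstein distance between equal-size 1-D samples."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)) ** p) ** (1.0 / p)


rng = np.random.default_rng(0)

# Hypothetical minibatch: base noise and a bimodal stand-in for Bellman
# target returns (in a critic these would be r + gamma * z').
source = rng.standard_normal(256)
target = np.concatenate([rng.normal(-2.0, 0.3, 128), rng.normal(3.0, 0.5, 128)])
rng.shuffle(target)  # arbitrary order, as in an unsorted minibatch

# Quantile-aligned coupling versus the arbitrary (as-drawn) pairing.
x0, x1 = quantile_coupled_pairs(source, target)
sorted_cost = np.mean((x1 - x0) ** 2)
random_cost = np.mean((target - source) ** 2)

# Regression inputs and targets that a velocity network would be trained on.
t = rng.uniform(size=x0.shape)
x_t, v_target = cfm_regression_targets(x0, x1, t)

print("transport cost, sorted pairing:   ", sorted_cost)
print("transport cost, arbitrary pairing:", random_cost)
print("empirical W_2 squared:            ", empirical_wasserstein_p(source, target, p=2) ** 2)
```

Under these assumptions, the sorted pairing never costs more than the arbitrary pairing, and its mean squared displacement equals the squared empirical 2-Wasserstein distance between the two minibatches. Because both sample sets are sorted, the interpolated paths at any fixed time do not cross.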
Problem

Research questions and friction points this paper is trying to address.

Distributional Reinforcement Learning
Conditional Flow Matching
Wasserstein distance
metric mismatch
return distribution
Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantile-Coupled Flow Matching
Wasserstein Alignment
Distributional Reinforcement Learning
Conditional Flow Matching
Optimal Transport Coupling