Quantile-Coupled Flow Matching for Distributional Reinforcement Learning

📅 2026-05-08

📈 Citations: 0

✨ Influential: 0

🤖 AI Summary

This work addresses a metric mismatch in existing conditional flow matching (CFM) critics for distributional reinforcement learning: they are trained with arbitrary source-target pairings, so their flow-matching losses are not Wasserstein-aligned surrogates, even though distributional RL theory relies on the distributional Bellman operator being a contraction in the $p$-Wasserstein distance. To resolve this, the authors propose FlowIQN, which sorts source samples and Bellman target samples within each minibatch to approximate the monotone optimal transport coupling, yielding quantile-aligned flow paths. The authors prove that the resulting loss gives a Wasserstein-aligned approximate projection, which they describe as, to their knowledge, the first such explicit guarantee for a flow-matching distributional critic, and they extend FlowIQN with shortcut models for efficient inference. Empirically, FlowIQN improves the Wasserstein accuracy of return distributions over other CFM critics and achieves competitive performance on offline reinforcement learning benchmarks across multiple policy extraction methods.
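
As background for why sorting is the natural pairing here, the following is a standard fact about one-dimensional distributions, not a result taken from the paper. The symbols $\mu$, $\nu$, $F^{-1}$, $x_{(i)}$, and $y_{(i)}$ are notation introduced for this note. For $p \ge 1$, the $p$-Wasserstein distance between two distributions on the real line is attained by the monotone (quantile-to-quantile) coupling:

$$
W_p(\mu, \nu)^p = \int_0^1 \left| F_\mu^{-1}(u) - F_\nu^{-1}(u) \right|^p \, du ,
$$

where $F_\mu^{-1}$ and $F_\nu^{-1}$ are the quantile functions. For two empirical distributions with $n$ samples each, this reduces to pairing the sorted samples:

$$
W_p(\hat{\mu}_n, \hat{\nu}_n)^p = \frac{1}{n} \sum_{i=1}^{n} \left| x_{(i)} - y_{(i)} \right|^p ,
$$

where $x_{(1)} \le \dots \le x_{(n)}$ and $y_{(1)} \le \dots \le y_{(n)}$ are the order statistics. Sorting source and target samples within a minibatch therefore gives the optimal coupling between the two empirical batches, which is presumably the sense in which the minibatch pairing approximates the monotone coupling between the underlying distributions.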

📝 Abstract

Unlike standard expected-return Reinforcement Learning (RL), Distributional RL (DRL) models the full return distribution, making it better suited for uncertainty-aware and risk-sensitive decision-making. Conditional Flow Matching (CFM) critics have recently attracted attention for modelling continuous, multi-modal return distributions. Despite this interest, there remains a substantial metric mismatch: DRL theory relies on the distributional Bellman operator being contractive in the $p$-Wasserstein distance, yet existing CFM critics are trained with arbitrary source-target couplings, so their flow-matching losses are not Wasserstein-aligned surrogates for matching Bellman target return distributions. In this work, we address this mismatch by proposing FlowIQN, a CFM critic that sorts source and Bellman target samples within each mini-batch to approximate the monotone optimal transport coupling, replacing arbitrary pairings with quantile-aligned flow paths. We prove that the loss of our quantile-coupled CFM critic yields a Wasserstein-aligned approximate projection compatible with the foundations of DRL. To our knowledge, FlowIQN is the first flow-matching distributional critic with an explicit Wasserstein-aligned projection guarantee. We further extend FlowIQN with shortcut models for efficient inference. Empirical results show that FlowIQN improves Wasserstein return-distribution accuracy over other CFM critics. It also yields competitive performance on offline RL benchmarks across multiple policy extraction methods, providing a theoretically grounded CFM critic that is readily compatible with DRL pipelines. Code: https://github.com/ori-goals/flowIQN.
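
The sort-and-pair idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the toy sample values, and the straight-line interpolant are assumptions made for illustration, and the sketch omits the neural network, the state-action conditioning, and the shortcut models entirely.

```python
import random


def quantile_couple(source, targets):
    """Pair the i-th smallest source sample with the i-th smallest target.

    Hypothetical helper: for scalar samples this is the monotone coupling
    between the two empirical batches.
    """
    if len(source) != len(targets):
        raise ValueError("source and targets must have the same length")
    return list(zip(sorted(source), sorted(targets)))


def mean_transport_cost(pairs, p=2):
    """Average |z0 - z1|^p over a list of (z0, z1) pairs."""
    return sum(abs(z0 - z1) ** p for z0, z1 in pairs) / len(pairs)


def cfm_regression_triples(pairs, rng):
    """Build (t, z_t, velocity) regression targets on straight-line paths.

    Assumes the common linear interpolant z_t = (1 - t) * z0 + t * z1,
    whose velocity is the constant z1 - z0.
    """
    triples = []
    for z0, z1 in pairs:
        t = rng.random()                   # flow time in [0, 1)
        z_t = (1.0 - t) * z0 + t * z1      # point on the path at time t
        triples.append((t, z_t, z1 - z0))  # target velocity for the critic
    return triples


rng = random.Random(0)

# Toy minibatch (made-up numbers): source noise samples and Bellman target returns.
source = [0.3, -1.2, 0.8, -0.1]
bellman_targets = [2.0, 0.5, 1.1, 3.4]

arbitrary_pairs = list(zip(source, bellman_targets))     # pairing in batch order
sorted_pairs = quantile_couple(source, bellman_targets)  # quantile-aligned pairing
triples = cfm_regression_triples(sorted_pairs, rng)

print("arbitrary cost:", mean_transport_cost(arbitrary_pairs))
print("sorted cost:   ", mean_transport_cost(sorted_pairs))
```

For scalar samples and $p \ge 1$, the sorted pairing never has a higher mean transport cost than any other pairing of the same two batches, so the paths it defines do not cross. A trained critic would regress its predicted velocity at `(t, z_t)` onto the third element of each triple.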

Problem

Research questions and friction points this paper is trying to address.

Distributional Reinforcement Learning

Conditional Flow Matching

Wasserstein distance

metric mismatch

return distribution

Innovation

Methods, ideas, or system contributions that make the work stand out.

Quantile-Coupled Flow Matching

Wasserstein Alignment

Distributional Reinforcement Learning