🤖 AI Summary
Diffusion Q-Learning (DQL) suffers from low training and inference efficiency and poor stability because it relies on multi-step denoising to generate actions. To address this, we propose One-Step Flow Q-Learning (OFQL), the first method to integrate Flow Matching into the diffusion Q-learning framework. OFQL models the average velocity field over the state-action space, enabling direct, single-step action generation and eliminating the need for iterative sampling, auxiliary models, or staged training. Crucially, the velocity field is optimized end-to-end under the Q-learning objective, jointly improving policy quality and generation efficiency. On the D4RL benchmark, OFQL significantly outperforms DQL and other diffusion-based baselines: it achieves several-fold speedups in both training and inference, superior final performance, and markedly improved convergence stability. Our core contribution is a principled, concise, fully differentiable, single-step diffusion-based paradigm for policy learning.
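The "optimized end-to-end under the Q-learning objective" claim can be made concrete with a toy loss computation. The sketch below is illustrative only and is not the paper's implementation: the critic is a fixed quadratic stand-in (not a learned network), and the loss follows the familiar DQL-style structure of a behavior-cloning/flow-matching regression term minus an `alpha`-weighted Q term, here applied to a predicted average velocity.

```python
import numpy as np

rng = np.random.default_rng(1)

def q_value(state, action):
    """Stand-in critic: a fixed quadratic bowl (illustrative, not a learned Q-network)."""
    return -float(np.sum((action - 0.5) ** 2))

def joint_loss(pred_avg_velocity, target_velocity, state, generated_action, alpha=1.0):
    """Behavior-regularized objective: flow-matching regression on the velocity field
    plus Q-maximization on the generated action (the 'BC term minus alpha * Q' shape)."""
    fm_loss = float(np.mean((pred_avg_velocity - target_velocity) ** 2))
    return fm_loss - alpha * q_value(state, generated_action)

# Toy usage with random vectors standing in for network outputs.
s = rng.standard_normal(4)       # state
a = rng.standard_normal(2)       # action produced by the one-step policy
u_pred = rng.standard_normal(2)  # predicted average velocity
u_tgt = rng.standard_normal(2)   # flow-matching regression target
loss = joint_loss(u_pred, u_tgt, s, a)
print(np.isfinite(loss))  # True
```

Because the single-step sampler is just one differentiable network call, the Q term backpropagates into the velocity field directly, with no recursive gradients through a denoising chain.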
📝 Abstract
The generative power of diffusion models (DMs) has recently enabled high-performing decision-making algorithms in offline reinforcement learning (RL), achieving state-of-the-art results across standard benchmarks. Among them, Diffusion Q-Learning (DQL) stands out as a leading method for its consistently strong performance. Nevertheless, DQL remains limited in practice due to its reliance on multi-step denoising for action generation during both training and inference. Although one-step denoising is desirable, simply applying it to DQL leads to a drastic performance drop. In this work, we revisit DQL and identify its core limitations. We then propose One-Step Flow Q-Learning (OFQL), a novel framework that enables efficient one-step action generation during both training and inference, without requiring auxiliary models, distillation, or multi-phase training. Specifically, OFQL reformulates DQL within the sample-efficient Flow Matching (FM) framework. While conventional FM induces curved generative trajectories that impede one-step generation, OFQL instead learns an average velocity field that facilitates direct, accurate action generation. Collectively, OFQL eliminates the need for multi-step sampling and recursive gradient updates in DQL, resulting in faster and more robust training and inference. Extensive experiments on the D4RL benchmark demonstrate that OFQL outperforms DQL and other diffusion-based baselines, while substantially reducing both training and inference time compared to DQL.
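The contrast between multi-step denoising and average-velocity one-step sampling can be sketched in a few lines. This is a minimal numpy illustration, not the paper's architecture: the "networks" are a shared random linear map, and `avg_velocity`'s parameterization is a placeholder. The point is structural: a conventional flow policy integrates an instantaneous velocity field with K Euler steps, while a learned average velocity over [0, 1] yields the action in a single call via a_1 = a_0 + u_avg(a_0, s).

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM, ACTION_DIM = 4, 2

# Toy linear map standing in for a learned velocity network (illustrative only).
W = rng.normal(scale=0.1, size=(STATE_DIM + ACTION_DIM + 1, ACTION_DIM))

def velocity(state, a_t, t):
    """Instantaneous velocity v(a_t, t | s), as in conventional flow matching."""
    return np.concatenate([state, a_t, [t]]) @ W

def sample_multistep(state, k=10):
    """K-step Euler integration from noise (t=0) to action (t=1): K network calls."""
    a = rng.standard_normal(ACTION_DIM)
    t, dt = 0.0, 1.0 / k
    for _ in range(k):
        a = a + dt * velocity(state, a, t)
        t += dt
    return a

def avg_velocity(state, a0):
    """Average velocity over [0, 1]; learning this field directly makes one call suffice."""
    return np.concatenate([state, a0, [0.5]]) @ W  # placeholder parameterization

def sample_one_step(state):
    """One-step generation: a_1 = a_0 + u_avg(a_0, s), a single network call."""
    a0 = rng.standard_normal(ACTION_DIM)
    return a0 + avg_velocity(state, a0)

s = rng.standard_normal(STATE_DIM)
print(sample_one_step(s).shape)  # (2,)
```

Naively setting K=1 in the multi-step sampler amounts to one Euler step along a curved trajectory, which is why it degrades sharply; the average-velocity field absorbs the full displacement from noise to action, so the single step is exact by construction.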