DeFlow: Decoupling Manifold Modeling and Value Maximization for Offline Policy Extraction

📅 2026-01-15
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high computational cost and instability of existing generative policy optimization methods in offline reinforcement learning, which struggle to efficiently model complex behavioral manifolds. To overcome these limitations, the paper proposes DeFlow, a framework that decouples manifold modeling from value maximization. DeFlow uses flow matching to capture the behavior manifold, then trains a lightweight refinement module within a data-driven trust region, eliminating backpropagation through ODE solvers while preserving the iterative expressive power of flow models. This design also removes the need for multi-objective loss balancing, substantially improving training efficiency. Empirical results on the OGBench benchmark show that DeFlow achieves state-of-the-art performance and transfers policies effectively from offline to online settings.
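To make the manifold-modeling half concrete, here is a minimal conditional flow-matching sketch using the standard linear interpolant. All names and shapes are illustrative assumptions; the paper's actual architecture and training details are not specified on this page.

```python
import numpy as np

rng = np.random.default_rng(0)

def flow_matching_targets(actions, rng):
    """Linear interpolant x_t = (1 - t) * noise + t * action.
    The regression target for the velocity field is (action - noise)."""
    noise = rng.standard_normal(actions.shape)
    t = rng.uniform(size=(actions.shape[0], 1))
    x_t = (1.0 - t) * noise + t * actions
    target_v = actions - noise
    return x_t, t, target_v

actions = rng.standard_normal((32, 2))     # toy batch of dataset actions
x_t, t, target_v = flow_matching_targets(actions, rng)

# A velocity network v_theta(s, x_t, t) would then be regressed onto
# target_v:  loss = mean || v_theta(s, x_t, t) - target_v ||^2.
# As a sanity check, the loss of a network that always outputs zero:
loss_at_zero_net = float(np.mean(target_v ** 2))
print(x_t.shape, loss_at_zero_net > 0.0)
```

This supervised regression is what lets the flow model fit the behavior manifold without any value signal; value maximization is handled separately, which is the decoupling the summary describes.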

📝 Abstract
We present DeFlow, a decoupled offline RL framework that leverages flow matching to faithfully capture complex behavior manifolds. Optimizing generative policies directly is computationally prohibitive, typically necessitating backpropagation through ODE solvers. We address this by learning a lightweight refinement module within an explicit, data-derived trust region of the flow manifold, rather than sacrificing the iterative generation capability via single-step distillation. In this way, we bypass solver differentiation and eliminate the need for balancing loss terms, ensuring stable improvement while fully preserving the flow's iterative expressivity. Empirically, DeFlow achieves superior performance on the challenging OGBench benchmark and demonstrates efficient offline-to-online adaptation.
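The extraction idea in the abstract can be sketched as follows: sample an action by integrating the frozen flow (no gradients flow through the solver), then apply a small refinement constrained to stay near the flow's output. The Euler solver, the epsilon-ball trust region, and all names here are illustrative assumptions, not the paper's exact design.

```python
import numpy as np

def euler_sample(velocity_fn, state, steps=8, rng=None):
    """Integrate dx/dt = v(s, x, t) from t=0 (noise) to t=1 (action).
    Treated as a black-box sampler: no gradients pass through it."""
    rng = rng or np.random.default_rng(0)
    x = rng.standard_normal(2)             # start from Gaussian noise
    dt = 1.0 / steps
    for k in range(steps):
        x = x + dt * velocity_fn(state, x, k * dt)
    return x

def refine(base_action, residual, epsilon=0.1):
    """Trust-region refinement: the final action stays within an
    epsilon-ball of the flow's sample, so the refinement module can be
    trained for value without leaving the behavior manifold."""
    return base_action + np.clip(residual, -epsilon, epsilon)

# Toy velocity field pushing samples toward a fixed target action.
target = np.array([0.5, -0.3])
v = lambda s, x, t: target - x

a_flow = euler_sample(v, state=None, steps=8)
a_final = refine(a_flow, residual=np.array([1.0, -1.0]), epsilon=0.1)
print(bool(np.all(np.abs(a_final - a_flow) <= 0.1)))
```

In a full method, `residual` would come from the learned refinement module and be optimized against a critic; because the flow sample is a constant with respect to that optimization, no ODE-solver differentiation is needed.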
Problem

Research questions and friction points this paper is trying to address.

offline reinforcement learning
flow matching
behavior manifold
policy extraction
computational efficiency
Innovation

Methods, ideas, or system contributions that make the work stand out.

flow matching
offline reinforcement learning
decoupled policy extraction
behavior manifold
trust region