Fisher Decorator: Refining Flow Policy via A Local Transport Map

📅 2026-04-20
📈 Citations: 0
Influential: 0

🤖 AI Summary
This work addresses a limitation of existing flow-based offline reinforcement learning methods: their isotropic $L_2$ regularization does not match the anisotropic geometry of the behavioral policy manifold, which biases the optimization direction. The authors propose a local transport map framework that formulates policy refinement as a residual displacement applied to an initial flow-based policy. Using the Fisher information matrix, they derive a local quadratic approximation of the Kullback–Leibler divergence, which enables efficient optimization under an anisotropic constraint. The approach combines flow matching, Wasserstein geometry, and score functions. The authors report state-of-the-art performance across multiple offline reinforcement learning benchmarks and an approximation error that is controllable within a provable neighborhood of the optimal solution.
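
The quadratic approximation mentioned in the summary can be made concrete in the simplest case. The following is a sketch under assumptions not taken from the paper: a constant displacement $\delta$ applied to a sufficiently smooth density $p$ over actions. The paper's displacement is a residual on top of a flow policy, and its exact objective may differ.

$$
\mathrm{KL}\big(p \,\|\, p(\cdot - \delta)\big) = \tfrac{1}{2}\,\delta^\top F\,\delta + O(\|\delta\|^3),
\qquad
F = \mathbb{E}_{a \sim p}\big[\nabla_a \log p(a)\,\nabla_a \log p(a)^\top\big].
$$

Because $F$ is built from the score $\nabla_a \log p$, the constraint $\tfrac{1}{2}\,\delta^\top F\,\delta \le \epsilon$ penalizes displacement more strongly along directions where the density is sharply concentrated. An $L_2$ penalty corresponds to replacing $F$ with a multiple of the identity, which ignores that structure.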

📝 Abstract
Recent advances in flow-based offline reinforcement learning (RL) have achieved strong performance by parameterizing policies via flow matching. However, they still face critical trade-offs among expressiveness, optimality, and efficiency. In particular, existing flow policies interpret the $L_2$ regularization as an upper bound of the 2-Wasserstein distance ($W_2$), which can be problematic in offline settings. This issue stems from a fundamental geometric mismatch: the behavioral policy manifold is inherently anisotropic, whereas the $L_2$ (or upper bound of $W_2$) regularization is isotropic and density-insensitive, leading to systematically misaligned optimization directions. To address this, we revisit offline RL from a geometric perspective and show that policy refinement can be formulated as a local transport map: an initial flow policy augmented by a residual displacement. By analyzing the induced density transformation, we derive a local quadratic approximation of the KL-constrained objective governed by the Fisher information matrix, enabling a tractable anisotropic optimization formulation. By leveraging the score function embedded in the flow velocity, we obtain a corresponding quadratic constraint for efficient optimization. Our results reveal that the optimality gap in prior methods arises from their isotropic approximation. In contrast, our framework achieves a controllable approximation error within a provable neighborhood of the optimal solution. Extensive experiments demonstrate state-of-the-art performance across diverse offline RL benchmarks. See project page: https://github.com/ARC0127/Fisher-Decorator.
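
To illustrate the geometric mismatch the abstract describes, here is a minimal NumPy sketch. It is not the authors' implementation: it assumes a toy 2-D Gaussian behavior policy at a single state, a hypothetical value gradient `g`, and a constant displacement `delta`, so the KL divergence has a closed form and the Fisher matrix equals the inverse covariance. The names `fisher_step` and `isotropic_step` are illustrative only.

```python
import numpy as np

# Assumed toy behavior policy at one state: wide along axis 0, narrow along axis 1.
mu = np.zeros(2)
Sigma = np.array([[1.0, 0.0], [0.0, 0.01]])

# Score of a Gaussian: grad_a log p(a) = -Sigma^{-1} (a - mu).
# The Fisher matrix is the expected outer product of the score.
rng = np.random.default_rng(0)
actions = rng.multivariate_normal(mu, Sigma, size=200_000)
scores = -(actions - mu) @ np.linalg.inv(Sigma)
fisher_mc = scores.T @ scores / len(actions)  # Monte Carlo estimate
fisher = np.linalg.inv(Sigma)                 # closed form for this toy case


def kl_shift(delta, Sigma):
    """Exact KL(N(mu, Sigma) || N(mu + delta, Sigma)) = 0.5 * delta^T Sigma^{-1} delta."""
    return 0.5 * delta @ np.linalg.solve(Sigma, delta)


def fisher_step(g, F, eps):
    """Maximizer of g^T delta subject to 0.5 * delta^T F delta <= eps."""
    nat = np.linalg.solve(F, g)  # F^{-1} g
    return np.sqrt(2.0 * eps / (g @ nat)) * nat


def isotropic_step(g, F, eps):
    """Step along the raw gradient, rescaled to spend the same KL budget eps."""
    return np.sqrt(2.0 * eps / (g @ F @ g)) * g


g = np.array([1.0, 1.0])  # hypothetical gradient of the value w.r.t. the action
eps = 0.05                # KL budget

d_fisher = fisher_step(g, fisher, eps)
d_iso = isotropic_step(g, fisher, eps)

gain_fisher = g @ d_fisher  # first-order improvement of the anisotropic step
gain_iso = g @ d_iso        # first-order improvement of the isotropic direction

print("KL used:", kl_shift(d_fisher, Sigma), kl_shift(d_iso, Sigma))
print("first-order gain:", gain_fisher, gain_iso)
```

Both steps spend the same KL budget, but the Fisher-scaled step moves mostly along the wide axis of the behavior distribution and obtains a larger first-order gain, while the isotropic direction wastes budget on the narrow axis. In this toy setting the gap grows as the covariance becomes more anisotropic.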
Problem

Research questions and friction points this paper is trying to address.

offline reinforcement learning
flow-based policy
anisotropy
Wasserstein distance
optimality gap
Innovation

Methods, ideas, or system contributions that make the work stand out.

Fisher information
flow matching
offline reinforcement learning
anisotropic optimization
local transport map