Drifting Field Policy: A One-Step Generative Policy via Wasserstein Gradient Flow

📅 2026-05-08

🤖 AI Summary

This work addresses the inefficiency of existing ODE-based generative policies for robotic manipulation, which typically require multiple inference steps to generate an action. The authors propose a one-step, non-ODE policy generation framework that brings Wasserstein-2 gradient flows into policy learning. By modeling each policy update as a reverse-KL gradient flow toward a soft target policy, the approach makes every update a gradient step in probability space, while generating an action requires only a single inference step. The method integrates value improvement, anchor-policy score matching, and critic-guided action selection, and it replaces the otherwise intractable update loss with a computationally tractable surrogate. On several manipulation tasks from Robomimic and OGBench, the approach achieves state-of-the-art performance with one-step inference and outperforms ODE-based policies.
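The split into value improvement and anchor-policy score matching can be made concrete under one common assumption that the summary does not state: that the soft target has the form π*(a|s) ∝ π_ref(a|s) exp(Q(s,a)/α), where π_ref is the anchor policy and α > 0 is a temperature. Under that assumption, the standard Wasserstein-2 gradient flow of the reverse KL gives the velocity field below for actions at a fixed state s. This is a sketch of the textbook derivation, not the paper's own.

```latex
% Assumed soft target (not given in the summary): anchor policy tilted by the critic
\pi^{*}(a \mid s) = \frac{1}{Z(s)}\, \pi_{\mathrm{ref}}(a \mid s)\, \exp\!\big(Q(s,a)/\alpha\big)

% Reverse-KL objective and its first variation with respect to \pi
\mathcal{F}[\pi] = \mathrm{KL}\big(\pi \,\|\, \pi^{*}\big)
  = \int \pi(a \mid s) \log \frac{\pi(a \mid s)}{\pi^{*}(a \mid s)}\, da,
\qquad
\frac{\delta \mathcal{F}}{\delta \pi}(a) = \log \pi(a \mid s) - \log \pi^{*}(a \mid s) + 1

% Wasserstein-2 gradient flow: continuity equation with v_t = -\nabla_a \, \delta\mathcal{F}/\delta\pi
\partial_t \pi_t = -\nabla_a \cdot \big(\pi_t\, v_t\big),
\qquad
v_t(a) = \nabla_a \log \pi^{*}(a \mid s) - \nabla_a \log \pi_t(a \mid s)

% Z(s) does not depend on a, so the velocity splits into two terms
v_t(a) = \underbrace{\tfrac{1}{\alpha}\, \nabla_a Q(s,a)}_{\text{value ascent}}
  \;+\; \underbrace{\nabla_a \log \pi_{\mathrm{ref}}(a \mid s) - \nabla_a \log \pi_t(a \mid s)}_{\text{score matching to the anchor}}
```

A discrete step moves each action sample as a ← a + η v_t(a) for a step size η. The second term vanishes when π_t = π_ref and otherwise pulls the policy back toward the anchor, which is the trust-region role described in the abstract. For an implicit one-step generator the score ∇_a log π_t is generally not available in closed form, which is one reason a tractable surrogate is needed.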

📝 Abstract

We propose Drifting Field Policy (DFP), a non-ODE one-step generative policy built on the drifting model paradigm. We frame the policy update as a reverse-KL Wasserstein-2 gradient flow toward a soft target policy, so that each DFP update corresponds to a gradient step in probability space. By construction, this gradient decomposes into an ascent toward higher action-value regions and a score-matching term toward the anchor policy that acts as a trust region. We further derive a simple, tractable surrogate of the otherwise intractable update loss, akin to behavior cloning on top-K critic-selected actions. We find empirically that this mechanism uniquely benefits the drifting backbone owing to its non-ODE parameterization. With one-step inference, DFP achieves state-of-the-art performance on several manipulation tasks across Robomimic and OGBench, outperforming ODE-based policies.
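The surrogate described in the abstract (behavior cloning on top-K critic-selected actions), together with critic-guided action selection at inference, can be sketched in a toy one-dimensional setting. Everything here is an illustrative assumption rather than the paper's implementation: the quadratic `critic`, the Gaussian anchor, and the helpers `one_step_sample`, `top_k_targets`, `bc_fit_gaussian`, `surrogate_update`, and `select_action` are hypothetical. The one-step generator is a reparameterized Gaussian a = μ + σz, chosen only because behavior cloning for that family reduces to a closed-form maximum-likelihood fit; a neural drifting backbone would instead be fit with its own training loss, which the abstract does not specify.

```python
from __future__ import annotations

import random
from statistics import fmean, pstdev


def critic(state: float, action: float) -> float:
    """Hypothetical toy critic: Q(s, a) peaks at a = 0.5 * s."""
    return -(action - 0.5 * state) ** 2


def one_step_sample(mean: float, std: float, rng: random.Random) -> float:
    """One-step generator: a single map from noise z ~ N(0, 1) to an action."""
    z = rng.gauss(0.0, 1.0)
    return mean + std * z


def top_k_targets(state: float, candidates: list[float], k: int) -> list[float]:
    """Keep the k candidate actions with the highest critic value."""
    ranked = sorted(candidates, key=lambda a: critic(state, a), reverse=True)
    return ranked[:k]


def bc_fit_gaussian(targets: list[float]) -> tuple[float, float]:
    """Behavior cloning for a 1-D Gaussian policy: the maximum-likelihood
    fit is the sample mean and population standard deviation."""
    return fmean(targets), max(pstdev(targets), 1e-3)  # floor avoids a degenerate std


def surrogate_update(state: float, anchor_mean: float, anchor_std: float,
                     num_candidates: int, k: int,
                     rng: random.Random) -> tuple[float, float]:
    """Top-K surrogate: sample candidates from the anchor policy, keep the
    critic's top k, and clone them. Sampling only from the anchor keeps the
    new policy inside the anchor's support, loosely acting as a trust region."""
    candidates = [one_step_sample(anchor_mean, anchor_std, rng)
                  for _ in range(num_candidates)]
    return bc_fit_gaussian(top_k_targets(state, candidates, k))


def select_action(state: float, mean: float, std: float,
                  num_samples: int, rng: random.Random) -> float:
    """Critic-guided selection at inference: draw several one-step samples
    and return the one the critic scores highest."""
    samples = [one_step_sample(mean, std, rng) for _ in range(num_samples)]
    return max(samples, key=lambda a: critic(state, a))


if __name__ == "__main__":
    rng = random.Random(0)
    state = 2.0  # the toy critic prefers a = 1.0 at this state
    mean, std = surrogate_update(state, anchor_mean=0.0, anchor_std=1.0,
                                 num_candidates=512, k=16, rng=rng)
    print(f"cloned policy: mean={mean:.3f}, std={std:.3f}")
    print(f"selected action: {select_action(state, mean, std, 32, rng):.3f}")
```

With `state = 2.0` the toy critic prefers a = 1.0, so the cloned policy concentrates near 1.0 even though the anchor is centered at 0.0. Keeping the top K of N anchor samples behaves like a hard-threshold version of reweighting those samples by exp(Q/α) for some temperature α; loosely, a smaller K corresponds to a lower temperature.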

Problem

generative policy

Wasserstein gradient flow

non-ODE

policy update

reinforcement learning

Innovation

Drifting Field Policy

Wasserstein Gradient Flow

One-Step Generative Policy