Real-Time Generative Policy via Langevin-Guided Flow Matching for Autonomous Driving

📅 2026-03-03
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the high inference latency of generative policies, which hinders their use for real-time decision-making in autonomous driving. The authors propose a novel online reinforcement learning method that, for the first time, integrates flow matching with Langevin dynamics: gradients of the Q-function guide Langevin sampling to dynamically optimize actions, and a flow-based policy is trained to efficiently map a simple prior to a target distribution that balances high reward with effective exploration. Evaluated in multi-lane and intersection driving simulations, the approach significantly outperforms DACER and DSAC at extremely low inference latency. It further achieves a score of 775.8 on the humanoid-stand task in the DeepMind Control Suite, surpassing existing methods and showing that low latency, high performance, and strong exploration can be achieved together.
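The summary describes refining replay-buffer actions with Langevin dynamics driven by Q-function gradients. The following is a minimal sketch of that idea, not the authors' implementation: the function and class names (`langevin_refine`, `ToyQ`), the step sizes, and the toy quadratic critic are all hypothetical choices for illustration.

```python
import torch


def langevin_refine(q_net, state, action, n_steps=20, step_size=0.05, noise_scale=0.01):
    """Refine a batch of actions by Langevin dynamics on the Q-landscape.

    Each step ascends the Q-gradient (exploitation) while injected Gaussian
    noise spreads the samples out (exploration), so the refined batch moves
    toward a distribution balancing high Q-values with exploratory behavior.
    Hyperparameters here are illustrative, not from the paper.
    """
    a = action.clone()
    for _ in range(n_steps):
        a = a.detach().requires_grad_(True)
        q = q_net(state, a).sum()  # sum -> autograd gives per-sample gradients
        (grad,) = torch.autograd.grad(q, a)
        with torch.no_grad():
            a = a + step_size * grad + noise_scale * torch.randn_like(a)
            a = a.clamp(-1.0, 1.0)  # keep actions inside the valid box
    return a.detach()


class ToyQ(torch.nn.Module):
    """Hypothetical critic with a known optimum a*(s) = tanh(W s), for testing."""

    def __init__(self, s_dim=4, a_dim=2):
        super().__init__()
        self.w = torch.nn.Linear(s_dim, a_dim)

    def forward(self, s, a):
        target = torch.tanh(self.w(s))
        return -((a - target) ** 2).sum(dim=-1)
```

On this toy critic, refined actions should climb toward the known optimum, i.e. the mean Q-value of the batch increases after refinement.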

📝 Abstract
Reinforcement learning (RL) is a fundamental methodology in autonomous driving systems, where generative policies exhibit considerable potential by leveraging their ability to model complex distributions to enhance exploration. However, their inherent high inference latency severely impedes their deployment in real-time decision-making and control. To address this issue, we propose diffusion actor-critic with entropy regulator via flow matching (DACER-F) by introducing flow matching into online RL, enabling the generation of competitive actions in a single inference step. By leveraging Langevin dynamics and gradients of the Q-function, DACER-F dynamically optimizes actions from experience replay toward a target distribution that balances high Q-value information with exploratory behavior. The flow policy is then trained to efficiently learn a mapping from a simple prior distribution to this dynamic target. In complex multi-lane and intersection simulations, DACER-F outperforms baselines diffusion actor-critic with entropy regulator (DACER) and distributional soft actor-critic (DSAC), while maintaining an ultra-low inference latency. DACER-F further demonstrates its scalability on standard RL benchmark DeepMind Control Suite (DMC), achieving a score of 775.8 in the humanoid-stand task and surpassing prior methods. Collectively, these results establish DACER-F as a high-performance and computationally efficient RL algorithm.
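The abstract's key efficiency claim is that the flow policy learns a map from a simple prior to the target action distribution that can be evaluated in a single inference step. Below is a minimal conditional flow-matching sketch of that mechanism under standard straight-path assumptions; the names (`flow_matching_loss`, `VelNet`, `one_step_action`) and the network architecture are illustrative, not the paper's.

```python
import torch


def flow_matching_loss(vel_net, state, target_action):
    """Conditional flow-matching regression.

    The network learns the constant velocity of the straight path
    x_t = (1 - t) * z + t * a from prior noise z to a target action a,
    so a single Euler step z + v(s, z, 0) approximately recovers a.
    """
    z = torch.randn_like(target_action)        # sample from the simple prior
    t = torch.rand(target_action.shape[0], 1)  # random time in [0, 1]
    x_t = (1 - t) * z + t * target_action      # point on the straight path
    v_target = target_action - z               # its constant velocity
    v_pred = vel_net(state, x_t, t)
    return ((v_pred - v_target) ** 2).mean()


def one_step_action(vel_net, state, a_dim):
    """Single-step inference: one Euler step of the learned ODE from t=0 to t=1."""
    z = torch.randn(state.shape[0], a_dim)
    t0 = torch.zeros(state.shape[0], 1)
    return z + vel_net(state, z, t0)


class VelNet(torch.nn.Module):
    """Small state-conditioned velocity field (hypothetical architecture)."""

    def __init__(self, s_dim=4, a_dim=2, hidden=64):
        super().__init__()
        self.net = torch.nn.Sequential(
            torch.nn.Linear(s_dim + a_dim + 1, hidden),
            torch.nn.ReLU(),
            torch.nn.Linear(hidden, a_dim),
        )

    def forward(self, s, x, t):
        return self.net(torch.cat([s, x, t], dim=-1))
```

In DACER-F's loop, `target_action` would be the Langevin-refined actions, so the flow policy distills the refined target distribution into a one-step sampler; here a fixed per-state target stands in for that distribution.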
Problem

Research questions and friction points this paper is trying to address.

real-time inference
generative policy
autonomous driving
reinforcement learning
inference latency
Innovation

Methods, ideas, or system contributions that make the work stand out.

flow matching
Langevin dynamics
real-time generative policy
diffusion actor-critic
online reinforcement learning
Tianze Zhu
School of Vehicle and Mobility, Tsinghua University, Beijing, China
Yinuo Wang
Tsinghua University
LLM, Reinforcement Learning, Autonomous Driving, Diffusion Model
Wenjun Zou
School of Vehicle and Mobility, Tsinghua University, Beijing, China
Tianyi Zhang
School of Vehicle and Mobility, Tsinghua University, Beijing, China
Likun Wang
School of Vehicle and Mobility, Tsinghua University, Beijing, China
Letian Tao
School of Vehicle and Mobility, Tsinghua University, Beijing, China
Feihong Zhang
School of Vehicle and Mobility, Tsinghua University, Beijing, China
Yao Lyu
Postdoctoral Researcher, Tsinghua University
Autonomous Driving, Embodied AI, Reinforcement Learning
Shengbo Eben Li
College of Artificial Intelligence, Tsinghua University, Beijing, China