Finetuning Generative Trajectory Model with Reinforcement Learning from Human Feedback

📅 2025-03-13
📈 Citations: 0
Influential: 0
📄 PDF
🤖 AI Summary
Generating human-like, adaptive trajectories for autonomous driving in dynamic environments remains challenging: existing generative models suffer from dataset bias and distribution shift, which limits their ability to capture diverse driving styles. Method: This paper proposes TrajHF, a framework that finetunes a multi-conditional diffusion trajectory model with preference-based Reinforcement Learning from Human Feedback (RLHF), aligning multi-modal trajectory generation with human driving preferences under safety and feasibility constraints. Contribution/Results: TrajHF moves beyond conventional imitation learning by enabling fine-grained stylistic control while preserving trajectory feasibility. On the NavSim benchmark it achieves a Predictive Driver Model Score (PDMS) of 93.95, significantly exceeding prior methods, demonstrating the effectiveness and generalizability of human-preference-driven trajectory generation.

📝 Abstract
Generating human-like and adaptive trajectories is essential for autonomous driving in dynamic environments. While generative models have shown promise in synthesizing feasible trajectories, they often fail to capture the nuanced variability of human driving styles due to dataset biases and distributional shifts. To address this, we introduce TrajHF, a human feedback-driven finetuning framework for generative trajectory models, designed to align motion planning with diverse driving preferences. TrajHF incorporates a multi-conditional denoiser and reinforcement learning from human feedback to refine multi-modal trajectory generation beyond conventional imitation learning. This enables better alignment with human driving preferences while maintaining safety and feasibility constraints. TrajHF achieves a PDMS of 93.95 on the NavSim benchmark, significantly exceeding other methods, and sets a new paradigm for personalized and adaptable trajectory generation in autonomous driving.
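The loop the abstract describes (sample conditioned trajectories from a generative model, score them against human preferences, update the generator toward preferred samples) can be caricatured as follows. This is a minimal sketch, not TrajHF itself: the linear "denoiser", the hand-written smoothness/speed reward standing in for a learned human-preference reward, and the reward-weighted update standing in for the paper's RLHF optimization are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy "denoiser": maps noise plus a style condition to a T-step 2-D trajectory.
# W is the finetunable parameter (a stand-in for the diffusion model's weights).
T, D, H = 8, 2, 16
W = rng.normal(scale=0.1, size=(H, T * D))

def generate(cond, n_samples):
    """Sample n trajectories conditioned on a style vector (hypothetical interface)."""
    z = rng.normal(size=(n_samples, H)) + cond  # noise shifted by the condition
    return (z @ W).reshape(n_samples, T, D), z

def reward(traj, target_speed=1.0):
    """Stand-in preference reward: favor smooth motion near a target speed."""
    steps = np.diff(traj, axis=1)                    # per-step displacements
    speed = np.linalg.norm(steps, axis=-1)           # per-step speeds
    smoothness = -np.var(speed, axis=-1)             # penalize jerky speed profiles
    return smoothness - np.abs(speed.mean(axis=-1) - target_speed)

def rlhf_step(cond, n_samples=64, lr=0.05):
    """One reward-weighted update: nudge W toward high-reward samples."""
    global W
    traj, z = generate(cond, n_samples)
    r = reward(traj)
    adv = (r - r.mean()) / (r.std() + 1e-8)          # normalized advantage
    weights = np.exp(adv)
    weights /= weights.sum()                          # softmax over samples
    target = (weights[:, None] * traj.reshape(n_samples, -1)).sum(axis=0)
    z_mean = z.mean(axis=0, keepdims=True)
    # Push the mean generation toward the advantage-weighted trajectory.
    W += lr * z_mean.T @ (target[None, :] - z_mean @ W)
    return r.mean()

cond = np.full(H, 0.2)  # hypothetical "driving style" condition vector
for _ in range(50):
    mean_reward = rlhf_step(cond)
```

The real method replaces every piece of this toy: a diffusion denoiser with multiple conditions in place of the linear map, annotated human preferences in place of the hand-written reward, and a proper RL objective with safety constraints in place of the weighted regression step.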
Problem

Research questions and friction points this paper is trying to address.

Generating human-like trajectories for autonomous driving
Addressing dataset biases in trajectory generation models
Aligning motion planning with diverse driving preferences
Innovation

Methods, ideas, or system contributions that make the work stand out.

Finetuning generative models with human feedback
Multi-conditional denoiser for trajectory refinement
Reinforcement learning for personalized trajectory generation
👥 Authors
Derun Li (Shanghai Jiao Tong University)
Jianwei Ren (Shanghai Qi Zhi Institute)
Yue Wang (LiAuto)
Xin Wen (LiAuto)
Pengxiang Li (Beijing Institute of Technology)
Leimeng Xu (LiAuto)
Kun Zhan (LiAuto)
Zhongpu Xia (LiAuto)
Peng Jia (LiAuto)
Xianpeng Lang (LiAuto)
Ningyi Xu (Shanghai Jiao Tong University)
Hang Zhao (Shanghai Qi Zhi Institute, Tsinghua University)