🤖 AI Summary
Generating human-like, adaptive trajectories for autonomous driving in dynamic environments remains challenging: existing generative models suffer from dataset bias and distribution shift, which limit their ability to capture diverse driving styles.
Method: This paper proposes TrajHF, a framework that integrates multi-condition diffusion modeling with preference-based Reinforcement Learning from Human Feedback (RLHF), optimized via Proximal Policy Optimization (PPO). A human-style annotation interface enables safety-constrained, personalized, multi-modal trajectory alignment (a simplified sketch of the fine-tuning loop follows this summary).
Contribution/Results: TrajHF overcomes fundamental limitations of conventional imitation learning by enabling fine-grained stylistic control while ensuring trajectory feasibility. Evaluated on the NavSim benchmark, it achieves a Predictive Driver Model Score (PDMS) of 93.95, significantly surpassing state-of-the-art methods, and demonstrates the effectiveness and generalizability of human-preference-driven trajectory generation.
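To make the method description concrete, here is a minimal, self-contained sketch of the kind of preference-based fine-tuning loop summarized above. It is not the authors' implementation: `ToyTrajectoryPolicy`, the speed-matching `preference_reward`, the conditioning dimensions, and the simplified REINFORCE-style update with a KL penalty toward a frozen reference model are all stand-ins for TrajHF's multi-condition diffusion denoiser, learned preference signal, and full PPO machinery.

```python
# Illustrative sketch only (NOT the TrajHF code): preference-based fine-tuning
# of a trajectory generator with a KL anchor to the imitation-learned model.
import torch
import torch.nn as nn

HORIZON, STATE_DIM, COND_DIM = 8, 2, 16  # hypothetical sizes


class ToyTrajectoryPolicy(nn.Module):
    """Stand-in for a multi-conditional trajectory generator.

    Outputs a mean trajectory given scene/style conditioning; Gaussian sampling
    is a rough proxy for a diffusion sampler.
    """

    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(COND_DIM, 64), nn.ReLU(),
            nn.Linear(64, HORIZON * STATE_DIM),
        )
        self.log_std = nn.Parameter(torch.zeros(HORIZON * STATE_DIM))

    def dist(self, cond):
        mean = self.net(cond)
        return torch.distributions.Normal(mean, self.log_std.exp())


def preference_reward(traj, style):
    """Hypothetical reward: prefer trajectories whose mean step length matches a
    style-dependent target (e.g. 'aggressive' vs. 'conservative')."""
    steps = traj.view(-1, HORIZON, STATE_DIM).diff(dim=1).norm(dim=-1).mean(dim=1)
    target = torch.where(style > 0.5, torch.tensor(1.0), torch.tensor(0.3))
    return -(steps - target).pow(2)


policy = ToyTrajectoryPolicy()
reference = ToyTrajectoryPolicy()           # frozen pre-trained model (KL anchor)
reference.load_state_dict(policy.state_dict())
reference.requires_grad_(False)
optim = torch.optim.Adam(policy.parameters(), lr=1e-3)
beta = 0.1                                  # KL penalty weight (assumed)

for step in range(200):
    cond = torch.randn(32, COND_DIM)        # stand-in scene + style conditioning
    style = torch.rand(32)                  # 1 = "aggressive", 0 = "conservative"
    dist = policy.dist(cond)
    traj = dist.sample()
    logp = dist.log_prob(traj).sum(dim=-1)
    with torch.no_grad():
        ref_logp = reference.dist(cond).log_prob(traj).sum(dim=-1)
        reward = preference_reward(traj, style)
    # Fold the KL penalty into the reward (standard RLHF trick), then take a
    # policy-gradient step: align with preferences while staying close to the
    # imitation-learned reference model.
    shaped = reward - beta * (logp.detach() - ref_logp)
    advantage = shaped - shaped.mean()
    loss = -(advantage * logp).mean()
    optim.zero_grad()
    loss.backward()
    optim.step()
```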
📝 Abstract
Generating human-like and adaptive trajectories is essential for autonomous driving in dynamic environments. While generative models have shown promise in synthesizing feasible trajectories, they often fail to capture the nuanced variability of human driving styles due to dataset biases and distributional shifts. To address this, we introduce TrajHF, a human feedback-driven fine-tuning framework for generative trajectory models, designed to align motion planning with diverse driving preferences. TrajHF incorporates a multi-conditional denoiser and reinforcement learning with human feedback to refine multi-modal trajectory generation beyond conventional imitation learning. This enables better alignment with human driving preferences while maintaining safety and feasibility constraints. TrajHF achieves a PDMS of 93.95 on the NavSim benchmark, significantly exceeding other methods, and sets a new paradigm for personalized and adaptable trajectory generation in autonomous driving.
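The abstract's point about maintaining safety and feasibility while optimizing for preference can be illustrated with a small toy example: gate the preference reward on hard checks so that constraint violations dominate any stylistic score. Everything below (function names, thresholds, the clearance and acceleration checks) is a hypothetical NumPy sketch, not the paper's formulation.

```python
# Hypothetical illustration (not from the paper): combine a preference score
# with hard safety / feasibility checks when scoring candidate trajectories.
import numpy as np


def feasible(traj, max_accel=3.0, dt=0.5):
    """Reject trajectories whose implied acceleration is dynamically infeasible."""
    vel = np.diff(traj, axis=0) / dt
    acc = np.diff(vel, axis=0) / dt
    return np.all(np.linalg.norm(acc, axis=-1) <= max_accel)


def collision_free(traj, obstacles, radius=2.0):
    """Simple clearance check against static obstacle points."""
    dists = np.linalg.norm(traj[:, None, :] - obstacles[None, :, :], axis=-1)
    return np.all(dists > radius)


def shaped_reward(traj, obstacles, preference_score):
    """The preference score only counts if the trajectory is safe and feasible."""
    if not (feasible(traj) and collision_free(traj, obstacles)):
        return -1.0  # a constraint violation dominates any stylistic preference
    return preference_score


# Example: an 8-step 2D trajectory with constant velocity and two obstacle points.
traj = np.cumsum(np.full((8, 2), 0.4), axis=0)
obstacles = np.array([[10.0, 10.0], [3.0, -4.0]])
print(shaped_reward(traj, obstacles, preference_score=0.8))  # prints 0.8
```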