🤖 AI Summary
This paper addresses the challenge of jointly optimizing user-intent understanding, interpretable decision-making, and traffic-rule compliance in personalized autonomous driving. To this end, the authors propose PADriver, a closed-loop framework that integrates a multimodal large language model (MLLM) with streaming video perception and introduces, for the first time, an explicit danger-level prediction step that gives each action decision a human-interpretable grounding. They further construct PAD-Highway, the first closed-loop evaluation benchmark tailored to personalized driving, comprising 250 hours of annotated highway video. Experiments show that PADriver outperforms existing methods on PAD-Highway and supports text-prompt-driven scene understanding, danger assessment, and the generation of diverse driving policies adapted to each user, improving traffic-rule compliance, safety, and alignment with individual user preferences.
📝 Abstract
In this paper, we propose PADriver, a novel closed-loop framework for personalized autonomous driving (PAD). Built upon a multimodal large language model (MLLM), PADriver takes streaming frames and a personalized textual prompt as inputs. It autoregressively performs scene understanding, danger-level estimation, and action decision. The predicted danger level reflects the risk of each candidate action and provides an explicit reference for the final action, which follows the preset personalized prompt. Moreover, we construct a closed-loop benchmark named PAD-Highway, based on the Highway-Env simulator, to comprehensively evaluate decision performance under traffic rules. The dataset contains 250 hours of video with high-quality annotations to facilitate the development of PAD behavior analysis. Experimental results on the constructed benchmark show that PADriver outperforms state-of-the-art approaches across multiple evaluation metrics and supports various driving modes.
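The three-stage ordering described in the abstract (scene understanding, then danger level, then action) can be illustrated with a minimal Python sketch. It is not the authors' implementation. In PADriver all three outputs are generated by the MLLM. Here, the two perception stages are hypothetical stub callables (`describe`, `estimate_danger`), and the final stage is a hand-written rule (`decide`) that stands in for the model so that the role of the danger level as an "explicit reference" is visible. The action set is assumed to mirror Highway-Env's five discrete meta-actions. The 0 to 3 danger scale and the `RISK_TOLERANCE` and `PREFERENCE` tables are invented for illustration.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Dict, List


class Action(IntEnum):
    # Assumed action space, mirroring Highway-Env's discrete meta-actions
    LANE_LEFT = 0
    IDLE = 1
    LANE_RIGHT = 2
    FASTER = 3
    SLOWER = 4


@dataclass
class StepOutput:
    scene: str                 # stage 1: textual scene understanding
    danger: Dict[Action, int]  # stage 2: danger level per candidate action (hypothetical 0-3 scale)
    action: Action             # stage 3: final action decision


# Hypothetical: highest danger level each personalized prompt tolerates
RISK_TOLERANCE = {"conservative": 0, "normal": 1, "aggressive": 2}

# Hypothetical: order in which each driving style prefers actions
PREFERENCE = {
    "conservative": [Action.IDLE, Action.SLOWER, Action.LANE_RIGHT, Action.LANE_LEFT, Action.FASTER],
    "normal": [Action.FASTER, Action.IDLE, Action.LANE_LEFT, Action.LANE_RIGHT, Action.SLOWER],
    "aggressive": [Action.FASTER, Action.LANE_LEFT, Action.LANE_RIGHT, Action.IDLE, Action.SLOWER],
}


def decide(style: str, danger: Dict[Action, int]) -> Action:
    """Rule-based stand-in for the MLLM's action generation (hypothetical)."""
    limit = RISK_TOLERANCE[style]
    for action in PREFERENCE[style]:
        if danger[action] <= limit:
            return action  # most preferred action within the style's risk tolerance
    # No action is within tolerance: fall back to the least dangerous one
    return min(danger, key=danger.get)


def pad_step(
    frames: List[str],
    style: str,
    describe: Callable[[List[str]], str],
    estimate_danger: Callable[[str], Dict[Action, int]],
) -> StepOutput:
    """One closed-loop step; each stage consumes the previous stage's output."""
    scene = describe(frames)
    danger = estimate_danger(scene)
    return StepOutput(scene, danger, decide(style, danger))


# Hypothetical stubs in place of the MLLM's perception outputs
def fake_describe(frames: List[str]) -> str:
    return "ego in middle lane; slow truck ahead; left lane clear"


def fake_danger(scene: str) -> Dict[Action, int]:
    return {Action.LANE_LEFT: 1, Action.IDLE: 1, Action.LANE_RIGHT: 2,
            Action.FASTER: 3, Action.SLOWER: 0}


for style in ("conservative", "normal", "aggressive"):
    out = pad_step(["frame_0"], style, fake_describe, fake_danger)
    print(style, out.action.name)
```

With identical danger estimates, the three prompts yield `SLOWER`, `IDLE`, and `LANE_LEFT` respectively, which shows how one shared danger reference can support several driving modes. Falling back to the least dangerous action when nothing is within tolerance is just one possible design choice for this sketch.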