An interactive enhanced driving dataset for autonomous driving

📅 2026-02-24
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the scarcity of rich interactive scenarios and the lack of precise multimodal alignment in existing autonomous driving datasets, both of which hinder the development of vision-language-action (VLA) models. To overcome these limitations, the authors introduce the Interactive Enhanced Driving Dataset (IEDD), which uses a scalable pipeline to mine millions of interaction-rich clips from naturalistic driving videos and proposes a trajectory-based method to quantitatively characterize the interactions. They further generate synthetic bird's-eye-view videos (IEDD-VQA) in which structured language descriptions are strictly aligned with semantic actions. The dataset supports both training and evaluation of VLA models, and a benchmark across ten prominent vision-language models demonstrates its reusability for model fine-tuning and for assessing reasoning capability.

📝 Abstract
The evolution of autonomous driving towards full automation demands robust interactive capabilities; however, the development of Vision-Language-Action (VLA) models is constrained by the sparsity of interactive scenarios and inadequate multimodal alignment in existing data. To address this, this paper proposes the Interactive Enhanced Driving Dataset (IEDD). We develop a scalable pipeline to mine millions of interactive segments from naturalistic driving data based on interactive trajectories, and design metrics to quantify the interaction processes. Furthermore, the IEDD-VQA dataset is constructed by generating synthetic Bird's Eye View (BEV) videos in which semantic actions are strictly aligned with structured language. Benchmark results on ten mainstream Vision Language Models (VLMs) demonstrate the dataset's reuse value for assessing and fine-tuning the reasoning capabilities of autonomous driving models.
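The paper does not spell out its interaction metrics here, but the idea of mining interactive segments from trajectories can be sketched with a simple, entirely hypothetical score: two agents are flagged as interacting when their gap becomes small while it is also closing quickly. The function names, thresholds, and the score itself are illustrative assumptions, not the authors' method.

```python
import math

def interaction_score(ego, other, dt=0.1):
    """Score how strongly two 2-D trajectories interact.

    ego, other: lists of (x, y) positions sampled at the same timestamps,
    dt seconds apart. Returns (min_gap, max_approach_rate): the closest
    gap in metres and the fastest rate at which the gap shrinks (m/s).
    """
    gaps = [math.dist(p, q) for p, q in zip(ego, other)]
    min_gap = min(gaps)
    # Approach rate: how fast the gap closes between consecutive samples.
    rates = [(gaps[i] - gaps[i + 1]) / dt for i in range(len(gaps) - 1)]
    max_approach = max(rates) if rates else 0.0
    return min_gap, max_approach

def is_interactive(ego, other, dt=0.1, dist_thresh=10.0, rate_thresh=2.0):
    """Flag a clip as interactive: agents get close while converging."""
    min_gap, max_approach = interaction_score(ego, other, dt)
    return min_gap < dist_thresh and max_approach > rate_thresh

# Ego drives straight; the other agent cuts in from the side.
ego = [(i * 1.0, 0.0) for i in range(20)]
other = [(i * 1.0, 8.0 - i * 0.4) for i in range(20)]
print(is_interactive(ego, other))  # True
```

A real mining pipeline would add kinematic filters (speeds, headings, lane context) and run over all agent pairs in each clip, but the core step of scoring pairwise trajectory convergence looks like the above.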
Problem

Research questions and friction points this paper is trying to address.

interactive scenarios
multimodal alignment
Vision-Language-Action models
autonomous driving dataset
Innovation

Methods, ideas, or system contributions that make the work stand out.

Interactive Enhanced Driving Dataset
Vision-Language-Action (VLA)
multimodal alignment
Bird's Eye View (BEV)
interactive trajectory mining
👥 Authors

Haojie Feng
School of Automotive Studies, Tongji University, Shanghai, 201804, China

Peizhi Zhang
School of Automotive Studies, Tongji University, Shanghai, 201804, China

Mengjie Tian
School of Automotive Studies, Tongji University, Shanghai, 201804, China

Xinrui Zhang
School of Automotive Studies, Tongji University, Shanghai, 201804, China

Zhuoren Li
Ph.D. Candidate; research interests: autonomous vehicles, intelligent transportation, motion planning, reinforcement learning

Junpeng Huang
School of Automotive Studies, Tongji University, Shanghai, 201804, China

Xiurong Wang
School of Automotive Studies, Tongji University, Shanghai, 201804, China

Junfan Zhu
University of Chicago, Chicago, IL 60637, USA

Jianzhou Wang
Faculty of Computer Science, University of New Brunswick, Fredericton, NB E3B 5A3, Canada

Dongxiao Yin
Tongji Automotive Design & Research Institute Co., Ltd., Shanghai, 201804, China

Lu Xiong
School of Automotive Studies, Tongji University, Shanghai, 201804, China