🤖 AI Summary
Autonomous driving systems (ADSs) face significant reliability-testing challenges in complex real-world environments: insufficient coverage of corner cases, a substantial simulation-to-reality gap, practical difficulties in V2X deployment, high testing costs for foundation models (e.g., VLMs/LLMs), and a lack of systematic testing criteria. To address these challenges, this study conducts a large-scale mixed-method investigation that integrates industrial and academic perspectives, combining a survey of 100 participants, a systematic review of 105 papers, and expert consensus workshops. It identifies four unmet needs and five critical research limitations, and proposes novel directions including cross-modality adaptation and cross-model collaborative verification. The work outlines four key future pathways: (1) establishing comprehensive ADS testing criteria; (2) developing V2X-enabled cross-model collaboration frameworks; (3) designing cross-modality evaluation methodologies for foundation models; and (4) building scalable validation infrastructure for large-scale ADS evaluation. Together, these contributions provide both theoretical foundations and practical guidance for next-generation ADS testing paradigms.
📝 Abstract
Autonomous driving systems (ADSs) promise improved transportation efficiency and safety, yet ensuring their reliability in complex real-world environments remains a critical challenge. Effective testing is essential to validate ADS performance and reduce deployment risks. This study investigates current ADS testing practices for both modular and end-to-end systems, identifies key demands from industry practitioners and academic researchers, and analyzes the gaps between existing research and real-world requirements. We review major testing techniques and further consider emerging factors, such as Vehicle-to-Everything (V2X) communication and foundation models (including large language models and vision foundation models), to understand their roles in enhancing ADS testing. We conducted a large-scale survey of 100 participants from both industry and academia; the survey questions were refined through expert discussions, and the responses were analyzed quantitatively and qualitatively to reveal key trends, challenges, and unmet needs. Our results show that existing ADS testing techniques struggle to comprehensively evaluate real-world performance, particularly regarding corner-case diversity, the simulation-to-reality gap, the lack of systematic testing criteria, exposure to potential attacks, practical challenges in V2X deployment, and the high computational cost of foundation-model-based testing. By further analyzing participant responses together with 105 representative studies, we summarize the current research landscape and highlight its major limitations. This study consolidates critical research gaps in ADS testing and outlines key future research directions, including comprehensive testing criteria, cross-model collaboration in V2X systems, cross-modality adaptation for foundation-model-based testing, and scalable validation frameworks for large-scale ADS evaluation.