A Comprehensive Evaluation of Four End-to-End AI Autopilots Using CCTest and the Carla Leaderboard

📅 2025-01-21
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing safety verification for end-to-end autonomous driving suffers from insufficient objectivity, risk sensitivity, and coverage completeness in test scenario generation and evaluation. Method: We propose Critical Configuration Testing (CCTest), a principled approach that constructs high-risk, low-redundancy test suites grounded in realistic, feasible safety policies—adhering strictly to the criterion of “containing only potentially safety-critical scenarios.” Contribution/Results: Evaluated on CARLA against Transfuser, InterFuser, MILE, and LMDriver—and benchmarked against the CARLA Leaderboard—we (i) uncover fundamental differences in critical failure modes between end-to-end and modular architectures; (ii) quantify failure distributions across four model families, demonstrating that CCTest reliably triggers unambiguous safety failures; and (iii) reveal that mainstream leaderboards, by prioritizing coverage metrics, substantially undermine risk sensitivity—thereby underscoring the necessity of safety-oriented testing.

📝 Abstract
Scenario-based testing is currently the dominant simulation-based validation approach for autonomous driving systems (ADS). Its effective application raises two interrelated issues. The first is the choice of the method used to generate scenarios, based on various criteria such as risk, degree of autonomy, degree of coverage and representativeness, and complexity. The other is the choice of the evaluation method for estimating the safety and performance of the system under test. This work extends a study of the critical configuration testing (CCTest) approach we have already applied to four open modular autopilots. This approach differs from general scenario-based approaches in that it uses only realistic, potentially safe critical scenarios. It enables an accurate assessment of the ability to drive safely in critical situations for which feasible safety policies exist. Any incident observed in the simulation involves the failure of a tested autopilot. The contribution of this paper is twofold. First, we apply the critical configuration testing approach to four end-to-end open autopilots, Transfuser, InterFuser, MILE and LMDriver, and compare their test results with those of the four modular open autopilots previously tested with the same approach implemented in the Carla simulation environment. This comparison identifies both differences and similarities in the failures of the two autopilot types in critical situations. Secondly, we compare the evaluations of the four autopilots carried out in the Carla Leaderboard with our results obtained by testing critical configurations. This comparison reveals significant discrepancies, reflecting differences in test case generation criteria and risk assessment methods. It underlines the need to work towards the development of objective assessment methods combining qualitative and quantitative criteria.
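The core idea of the abstract — keep only critical configurations for which a feasible safety policy provably exists, so that any simulated incident is unambiguously an autopilot failure — can be sketched as follows. This is a minimal illustration, not the paper's method: the `Scenario` fields, the single-obstacle braking model, and the `max_decel` bound are all hypothetical assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Scenario:
    """Hypothetical critical configuration: ego speed (m/s) and
    distance (m) to an obstacle appearing ahead."""
    ego_speed: float
    obstacle_distance: float

def safe_policy_exists(s: Scenario, max_decel: float = 8.0) -> bool:
    """Feasibility check: a braking policy avoids collision iff the
    stopping distance v^2 / (2a) fits within the available gap."""
    stopping_distance = s.ego_speed ** 2 / (2 * max_decel)
    return stopping_distance <= s.obstacle_distance

def build_cctest_suite(candidates: List[Scenario]) -> List[Scenario]:
    """Retain only critical-but-feasible configurations: since a safe
    policy exists for each, any incident observed when replaying the
    scenario is attributable to the autopilot under test."""
    return [s for s in candidates if safe_policy_exists(s)]

candidates = [
    Scenario(ego_speed=20.0, obstacle_distance=30.0),  # 25 m needed -> feasible
    Scenario(ego_speed=20.0, obstacle_distance=20.0),  # 25 m needed -> excluded
]
suite = build_cctest_suite(candidates)
print(len(suite))  # prints 1: only the feasible configuration is kept
```

The filter is what distinguishes this style of testing from generic scenario sampling: infeasible configurations, where even an ideal driver would crash, are excluded up front, so the test oracle reduces to "did any incident occur?".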
Problem

Research questions and friction points this paper is trying to address.

Autonomous Driving Systems
Safety Assessment
Test Scenario Generation
Innovation

Methods, ideas, or system contributions that make the work stand out.

End-to-End Autonomous Driving Systems
Critical Configuration Testing
Performance Evaluation Comparison
Changwen Li
State Key Laboratory of Computer Science, Institute of Software Chinese Academy of Sciences
software engineering, formal methods, autonomous systems
Joseph Sifakis
Researcher at Verimag laboratory, Grenoble
software engineering, formal methods, web services, middleware, networks
Rongjie Yan
Key Laboratory of System Software, Institute of Software, Chinese Academy of Sciences
Jian Zhang
Key Laboratory of System Software, Institute of Software, Chinese Academy of Sciences