🤖 AI Summary
Current autonomous driving research predominantly emphasizes benchmark performance gains while neglecting deeper issues such as failure mechanisms, systemic biases, and shortcut learning. To address this, we propose a data-centric, systematic diagnostic paradigm: we construct a closed-loop evaluation framework based on CARLA and design a lightweight, object-centric planning Transformer that enables object-level representation learning, precise input perturbation, and fine-grained attribution analysis. Our approach is the first to uncover structural deficiencies—including expert behavior rigidity and insufficient obstacle diversity—that induce shortcut learning. Evaluated on Longest6 v2, Bench2Drive, and the CARLA validation routes, our method achieves state-of-the-art performance. We publicly release all code and models to foster a community-wide transition from "performance-driven" to "robustness- and bias-aware" research paradigms.
📝 Abstract
Most recent work in autonomous driving has prioritized benchmark performance and methodological innovation over in-depth analysis of model failures, biases, and shortcut learning. This has led to incremental improvements without a deep understanding of the current failures. While it is straightforward to look at situations where the model fails, it is hard to understand the underlying reason. This motivates us to conduct a systematic study, where inputs to the model are perturbed and the predictions observed. We introduce PlanT 2.0, a lightweight, object-centric planning transformer designed for autonomous driving research in CARLA. The object-level representation enables controlled analysis, as the input can be easily perturbed (e.g., by changing the location or adding or removing certain objects), in contrast to sensor-based models. To tackle the scenarios newly introduced by the challenging CARLA Leaderboard 2.0, we introduce multiple upgrades to PlanT, achieving state-of-the-art performance on Longest6 v2, Bench2Drive, and the CARLA validation routes. Our analysis exposes insightful failures, such as a lack of scene understanding caused by low obstacle diversity, rigid expert behaviors leading to exploitable shortcuts, and overfitting to a fixed set of expert trajectories. Based on these findings, we argue for a shift toward data-centric development, with a focus on richer, more robust, and less biased datasets. We open-source our code and model at https://github.com/autonomousvision/plant2.
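The controlled perturbations described above rely on the scene being represented as a set of discrete objects rather than raw sensor data. A minimal sketch of that idea is below; the `SceneObject` fields and helper functions are illustrative assumptions, not the actual PlanT 2.0 data structures.

```python
from dataclasses import dataclass, replace
from typing import List

@dataclass(frozen=True)
class SceneObject:
    """A hypothetical object-level scene element (not the PlanT schema)."""
    kind: str      # e.g. "vehicle", "pedestrian", "obstacle"
    x: float       # longitudinal position (m), ego-centric
    y: float       # lateral position (m), ego-centric
    speed: float   # speed (m/s)

def shift_object(scene: List[SceneObject], index: int,
                 dx: float = 0.0, dy: float = 0.0) -> List[SceneObject]:
    """Return a copy of the scene with one object's position translated."""
    out = list(scene)
    out[index] = replace(out[index], x=out[index].x + dx, y=out[index].y + dy)
    return out

def remove_objects(scene: List[SceneObject], kind: str) -> List[SceneObject]:
    """Return a copy of the scene with all objects of a given kind dropped."""
    return [o for o in scene if o.kind != kind]

# Perturb a toy scene, then feed both versions to the planner and
# compare predicted trajectories to attribute behavior to specific inputs.
scene = [
    SceneObject("vehicle", 10.0, 0.0, 5.0),
    SceneObject("obstacle", 20.0, 1.0, 0.0),
]
perturbed = shift_object(scene, 1, dx=-5.0)   # move the obstacle 5 m closer
no_obstacles = remove_objects(scene, "obstacle")
```

With sensor-based models, an equivalent intervention would require photorealistically editing camera images or LiDAR sweeps; at the object level it is a one-line change, which is what makes this kind of fine-grained attribution analysis tractable.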