🤖 AI Summary
End-to-end autonomous driving models are commonly trained against teachers derived from the NAVSIM metrics, and those teachers fail to capture some unsafe behaviors. Method: This paper proposes Hydra-MDP++, a teacher-student knowledge distillation framework that combines human demonstrations with rule-based experts to supervise a lightweight student model (a ResNet-34 backbone with a multi-head decoder). The student learns a driving policy directly from raw images, without relying on privileged perception signals. Contribution/Results: The framework adds three evaluation metrics, traffic light compliance (TL), lane-keeping ability (LK), and extended comfort (EC), and is validated on NAVSIM. With the image encoder scaled to V2-99, it achieves a state-of-the-art drive score of 91.0% while maintaining computational efficiency.
📝 Abstract
Hydra-MDP++ introduces a novel teacher-student knowledge distillation framework with a multi-head decoder that learns from human demonstrations and rule-based experts. Built on a lightweight ResNet-34 network without complex components, the framework incorporates expanded evaluation metrics, including traffic light compliance (TL), lane-keeping ability (LK), and extended comfort (EC), to address unsafe behaviors not captured by traditional NAVSIM-derived teachers. Like other end-to-end autonomous driving approaches, Hydra-MDP++ processes raw images directly without relying on privileged perception signals. By integrating these components and scaling to a V2-99 image encoder, Hydra-MDP++ achieves state-of-the-art performance with a 91.0% drive score on NAVSIM, demonstrating its effectiveness in handling diverse driving scenarios while maintaining computational efficiency.
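The multi-head distillation idea in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the feature size, the linear heads, the use of binary cross-entropy for every head (including the human-demonstration head), the equal loss weights, and the function names are all assumptions made for illustration. Only the head labels (human, TL, LK, EC) come from the abstract.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def bce(logits, targets, eps=1e-7):
    """Mean binary cross-entropy between head logits and soft teacher scores in [0, 1]."""
    p = np.clip(sigmoid(logits), eps, 1.0 - eps)
    return float(np.mean(-(targets * np.log(p) + (1.0 - targets) * np.log(1.0 - p))))

def multi_head_distillation_loss(head_logits, teacher_scores, weights=None):
    """Weighted sum of per-head losses; one head per teacher (hypothetical helper)."""
    weights = weights or {name: 1.0 for name in head_logits}
    per_head = {name: bce(head_logits[name], teacher_scores[name]) for name in head_logits}
    total = sum(weights[name] * loss for name, loss in per_head.items())
    return total, per_head

# Toy data: 4 samples with 16-dim features standing in for image-encoder output (assumed sizes).
rng = np.random.default_rng(0)
features = rng.normal(size=(4, 16))

# One linear head per teacher: human demonstrations plus the TL, LK, and EC experts.
heads = {name: rng.normal(scale=0.1, size=16) for name in ("human", "TL", "LK", "EC")}
head_logits = {name: features @ w for name, w in heads.items()}

# Random stand-ins for the scores each teacher would assign.
teacher_scores = {name: rng.uniform(size=4) for name in heads}

total, per_head = multi_head_distillation_loss(head_logits, teacher_scores)
print(f"total loss: {total:.4f}")
for name, loss in per_head.items():
    print(f"  {name}: {loss:.4f}")
```

Keeping one head per teacher means each rule-based signal is supervised separately, so an added expert only requires an added head and loss term under this sketch's assumptions.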