🤖 AI Summary
To address the insufficient robustness of depth estimation under adverse conditions in autonomous driving, this paper proposes a lightweight radar-camera fusion depth estimation framework. Methodologically, it introduces two novel distillation strategies—interpretable alignment distillation and depth-distribution-aware distillation—alongside radar-image feature fusion and a discrete-bin-based soft classification-regression scheme. These components jointly ensure geometric consistency and decision transparency while enabling model compression. The method reduces parameter count by 29.7% relative to the baseline, and the distillation strategies cut mean absolute error (MAE) by 7.97% relative to direct training on the nuScenes and ZJU-4DRadarCam benchmarks. The resulting model delivers high accuracy, real-time inference, and intrinsic interpretability, offering an efficient and reliable solution for onboard deployment in safety-critical autonomous driving systems.
📝 Abstract
Depth estimation remains central to autonomous driving, and radar-camera fusion offers robustness in adverse conditions by providing complementary geometric cues. In this paper, we present XD-RCDepth, a lightweight architecture that reduces parameters by 29.7% relative to the state-of-the-art lightweight baseline while maintaining comparable accuracy. To preserve performance under compression and enhance interpretability, we introduce two knowledge-distillation strategies: an explainability-aligned distillation that transfers the teacher's saliency structure to the student, and a depth-distribution distillation that recasts depth regression as soft classification over discretized bins. Together, these components reduce the MAE by 7.97% compared with direct training and deliver competitive accuracy with real-time efficiency on the nuScenes and ZJU-4DRadarCam datasets.
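The abstract's discrete-bin formulation and distribution distillation can be illustrated with a minimal sketch. This is not the paper's implementation: the uniform bin layout, the temperature value, and the KL(teacher‖student) form of the distillation term are illustrative assumptions; the paper may use log-spaced bins or a different divergence.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def bin_centers(d_min, d_max, num_bins):
    """Centers of uniformly spaced depth bins on [d_min, d_max].
    (Assumed layout; log-spaced bins are also common in depth estimation.)"""
    step = (d_max - d_min) / num_bins
    return [d_min + (i + 0.5) * step for i in range(num_bins)]

def expected_depth(logits, centers):
    """Soft classification-regression: depth is the expectation of the
    bin centers under the predicted per-pixel bin distribution."""
    return sum(p * c for p, c in zip(softmax(logits), centers))

def distribution_distill_loss(student_logits, teacher_logits, temperature=2.0):
    """Depth-distribution distillation sketch: KL divergence between the
    temperature-softened teacher and student bin distributions."""
    t = softmax([x / temperature for x in teacher_logits])
    s = softmax([x / temperature for x in student_logits])
    return sum(ti * math.log(ti / si) for ti, si in zip(t, s))
```

A sharply peaked logit vector recovers the corresponding bin center, while a spread-out distribution interpolates between bins, which is what makes the formulation differentiable end to end; the KL term vanishes when the student matches the teacher's distribution.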