🤖 AI Summary
Existing end-to-end autonomous driving methods suffer from significant limitations in adverse weather conditions, occlusions, and accurate velocity estimation, hindering motion understanding and long-horizon trajectory prediction in safety-critical scenarios. To address these challenges, we propose the first planning-driven, query-based radar-camera fusion framework. Our method enhances cross-modal spatial consistency and temporal coherence through sparse 3D feature alignment, Doppler-assisted velocity estimation, agent-centric anchor optimization, map polyline modeling, and joint motion prediction. Evaluated on nuScenes, T-nuScenes, and Bench2Drive, our approach achieves a 4.8% improvement in 3D detection mAP, an 8.3% gain in multi-object tracking AMOTA, and a 9% reduction in trajectory planning error (TPC) over vision-only baselines.
📝 Abstract
End-to-end autonomous driving systems promise stronger performance through unified optimization of perception, motion forecasting, and planning. However, vision-based approaches face fundamental limitations in adverse weather conditions, partial occlusions, and precise velocity estimation: critical challenges in safety-sensitive scenarios, where accurate motion understanding and long-horizon trajectory prediction are essential for collision avoidance. To address these limitations, we propose SpaRC-AD, a query-based end-to-end camera-radar fusion framework for planning-oriented autonomous driving. Through sparse 3D feature alignment and Doppler-based velocity estimation, we achieve strong 3D scene representations for the refinement of agent anchors, map polylines, and motion modeling. Our method achieves strong improvements over state-of-the-art vision-only baselines across multiple autonomous driving tasks, including 3D detection (+4.8% mAP), multi-object tracking (+8.3% AMOTA), online mapping (+1.8% mAP), motion prediction (-4.0% mADE), and trajectory planning (-0.1m L2 and -9% TPC). We achieve both spatial coherence and temporal consistency on multiple challenging benchmarks, including the real-world open-loop nuScenes, the long-horizon T-nuScenes, and the closed-loop Bench2Drive simulator. We demonstrate the effectiveness of radar-based fusion in safety-critical scenarios where accurate motion understanding and long-horizon trajectory prediction are essential for collision avoidance. The source code of all experiments is available at https://phi-wol.github.io/sparcad/
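To give intuition for why radar helps velocity estimation, the sketch below shows the basic geometry behind Doppler-based velocity recovery: a radar return measures only the radial (line-of-sight) speed, which must be ego-motion compensated before it can seed an agent's velocity estimate. This is a minimal, hypothetical illustration of the general principle (2D, single return, tangential motion assumed zero), not the paper's actual implementation; the function name and conventions are our own.

```python
import numpy as np

def doppler_to_velocity(radial_speed, azimuth, ego_velocity):
    """Estimate an agent's velocity in the ego frame from one Doppler return.

    radial_speed : measured Doppler speed along the line of sight (m/s),
                   positive when the relative motion is away from the sensor
    azimuth      : bearing of the detection in the ego frame (rad)
    ego_velocity : ego vehicle planar velocity, shape (2,), in m/s
    """
    # Unit line-of-sight vector from ego to the detection.
    los = np.array([np.cos(azimuth), np.sin(azimuth)])
    # The sensor observes relative radial speed; add back the ego-motion
    # component along the line of sight to get the agent's own radial speed.
    compensated = radial_speed + ego_velocity @ los
    # With a single return, only the radial component is observable, so we
    # assume the tangential component is zero (a common simplification).
    return compensated * los
```

For example, a stationary obstacle directly ahead of an ego vehicle driving at 10 m/s produces a -10 m/s Doppler reading, and compensation correctly recovers a zero velocity for the obstacle. In practice, aggregating multiple returns per agent over time relaxes the zero-tangential-motion assumption.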