TacoDepth: Towards Efficient Radar-Camera Depth Estimation with One-stage Fusion

📅 2025-04-16
🤖 AI Summary
Radar-Camera depth estimation needs to run in real time on autonomous vehicles and robotic platforms, but because Radar returns are sparse, prevailing methods rely on multi-stage pipelines with intermediate quasi-dense depth, which are slow and not robust. This paper proposes TacoDepth, a one-stage fusion model that removes the intermediate quasi-dense depth step. Its two main components are (1) a graph-based Radar structure extractor that captures the graph structure of the Radar point cloud, and (2) a pyramid-based Radar fusion module that integrates that structure into the depth prediction. TacoDepth also supports different inference modes to trade off speed against accuracy. Compared with the previous state-of-the-art approach, it improves depth accuracy by 12.8% and processing speed by 91.8%.

📝 Abstract
Radar-Camera depth estimation aims to predict dense and accurate metric depth by fusing input images and Radar data. Model efficiency is crucial for this task in pursuit of real-time processing on autonomous vehicles and robotic platforms. However, due to the sparsity of Radar returns, the prevailing methods adopt multi-stage frameworks with intermediate quasi-dense depth, which are time-consuming and not robust. To address these challenges, we propose TacoDepth, an efficient and accurate Radar-Camera depth estimation model with one-stage fusion. Specifically, the graph-based Radar structure extractor and the pyramid-based Radar fusion module are designed to capture and integrate the graph structures of Radar point clouds, delivering superior model efficiency and robustness without relying on the intermediate depth results. Moreover, TacoDepth is flexible across different inference modes, providing a better balance of speed and accuracy. Extensive experiments are conducted to demonstrate the efficacy of our method. Compared with the previous state-of-the-art approach, TacoDepth improves depth accuracy and processing speed by 12.8% and 91.8%, respectively. Our work provides a new perspective on efficient Radar-Camera depth estimation.
Problem

Research questions and friction points this paper is trying to address.

Efficient fusion of Radar and camera for depth estimation
Handling sparse Radar returns without slow, fragile multi-stage pipelines
Balancing real-time speed and accuracy in autonomous systems
Innovation

Methods, ideas, or system contributions that make the work stand out.

One-stage Radar-Camera fusion for depth estimation
Graph-based Radar structure extractor that captures the graph structure of Radar point clouds
Pyramid-based Radar fusion module that integrates that structure without intermediate depth results
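
To make the idea of a "graph structure" over a sparse Radar point cloud concrete, here is a minimal sketch in plain Python. It is not the TacoDepth implementation: the abstract only says the extractor is graph-based, so the k-nearest-neighbour graph, the mean aggregation step, the function names `knn_graph` and `aggregate`, and the sample points are all assumptions chosen for illustration.

```python
import math


def knn_graph(points, k):
    """For each point, return the indices of its k nearest neighbours (Euclidean)."""
    neighbours = []
    for i, p in enumerate(points):
        # Distance from point i to every other point, paired with that point's index.
        dists = [(math.dist(p, q), j) for j, q in enumerate(points) if j != i]
        dists.sort()
        neighbours.append([j for _, j in dists[:k]])
    return neighbours


def aggregate(features, neighbours):
    """One round of mean aggregation: average each node's feature with its neighbours'."""
    out = []
    for i, nbrs in enumerate(neighbours):
        group = [features[i]] + [features[j] for j in nbrs]
        dim = len(features[i])
        out.append([sum(f[d] for f in group) / len(group) for d in range(dim)])
    return out


# Hypothetical Radar returns as (x, y, z) coordinates in metres.
radar_points = [
    (0.0, 0.0, 10.0),
    (1.0, 0.0, 10.0),
    (0.0, 1.0, 11.0),
    (20.0, 0.0, 40.0),  # an isolated return far from the others
]

# Assumed per-point feature: just the depth (z) value.
features = [[p[2]] for p in radar_points]

neighbours = knn_graph(radar_points, k=2)
smoothed = aggregate(features, neighbours)

for idx, (nbrs, feat) in enumerate(zip(neighbours, smoothed)):
    print(idx, nbrs, round(feat[0], 3))
```

Each point ends up with a feature that reflects its local neighbourhood rather than a single isolated return, which is the general motivation for treating sparse points as a graph. A learned extractor would replace the plain mean with trainable layers.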