🤖 AI Summary
This work addresses the limitations of lightweight time-of-flight (ToF) cameras, which suffer from restricted sensing ranges and thus struggle to support large-scale scene applications, compounded by the lack of dedicated datasets and poor generalization in existing depth completion methods. To bridge this gap, the authors construct a multi-sensor platform and introduce LASER-ToF, the first large-scale real-world ToF depth completion dataset. They further propose a lightweight depth completion network tailored to ToF characteristics, featuring a 3D-2D joint propagation pooling module and a multimodal cross-covariance attention mechanism to effectively model long-range dependencies and fuse non-uniformly sparse ToF measurements with sparse point clouds from visual SLAM. Experiments demonstrate that the proposed method reduces the mean absolute error by 8.6% over the next-best approach and achieves real-time 10 Hz operation on a quadrotor platform, enabling large-scale mapping and long-range planning.
📝 Abstract
Time-of-Flight (ToF) cameras combine a compact design with high measurement precision, making them well suited to a variety of robot tasks. However, their limited sensing range restricts deployment in large-scale scenarios. Depth completion has emerged as a potential solution for extending the sensing range of ToF cameras, but existing research lacks dedicated datasets and struggles to generalize to ToF measurements. In this paper, we propose a full-stack framework that enables large-scale depth completion for short-range ToF cameras. First, we construct a multi-sensor platform with a reconstruction-based pipeline to collect real-world ToF samples with dense large-scale ground truth, yielding LASER-ToF, the first LArge-ScalE scenaRio ToF depth completion dataset. Second, we propose a sensor-aware depth completion network that incorporates a novel 3D branch with a 3D-2D Joint Propagation Pooling (JPP) module and Multimodal Cross-Covariance Attention (MXCA), enabling effective modeling of long-range relationships and efficient 3D-2D fusion under non-uniform ToF depth sparsity. Moreover, our network can use the sparse point cloud from visual SLAM as a supplement to ToF depth, further improving prediction accuracy. Experiments show that our method achieves an 8.6% lower mean absolute error than the second-best method while remaining lightweight enough for onboard deployment. Finally, to verify the system's applicability on real robots, we deploy the proposed method on a quadrotor at 10 Hz, enabling reliable large-scale mapping and long-range planning with short-range ToF cameras in challenging environments.
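The abstract does not give the exact formulation of MXCA, but the name suggests a cross-covariance (channel-wise) attention in the style of XCA: attention is computed over the feature dimension rather than over tokens, so the attention map is d×d and the cost stays linear in the number of tokens, which is what makes long-range modeling affordable on a lightweight onboard network. The sketch below is an illustrative, generic NumPy version under those assumptions; the function name, the 2D/3D feature split, and the temperature `tau` are hypothetical and not taken from the paper.

```python
import numpy as np

def _l2_normalize(x, axis, eps=1e-8):
    """L2-normalize x along the given axis (avoids divide-by-zero)."""
    return x / (np.linalg.norm(x, axis=axis, keepdims=True) + eps)

def _softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_covariance_attention(feat_2d, feat_3d, tau=1.0):
    """Illustrative channel-wise (cross-covariance) fusion of two modalities.

    feat_2d: (N, d) image-branch features, used as queries.
    feat_3d: (N, d) ToF / point-cloud-branch features, used as keys and values.

    The attention map is (d, d), so cost is linear in the token count N --
    the property that makes XCA-style attention cheap for long-range
    modeling. This is a generic sketch, NOT the paper's MXCA module.
    """
    q = _l2_normalize(feat_2d, axis=0)           # normalize along tokens
    k = _l2_normalize(feat_3d, axis=0)
    attn = _softmax((q.T @ k) / tau, axis=-1)    # (d, d) channel attention
    return feat_3d @ attn.T                      # (N, d) fused features
```

In a real multimodal block, each branch would first pass through learned projections, and the fused output would typically be added back to the image features via a residual connection.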