🤖 AI Summary
To address the challenge of achieving real-time, robust, and cross-scene generalizable 360° omnidirectional depth estimation on edge devices for autonomous driving and robotics, this paper proposes a spherical-geometry-driven lightweight solution. Methodologically, we introduce: (1) a novel semi-supervised framework combining Curvilinear Spherical Scanning (CSS) with teacher–student collaborative learning, leveraging high-accuracy stereo models to generate reliable pseudo-labels for unlabeled real-world data; (2) a lightweight Rt-OmniMVS network jointly optimized with a HexaMODE six-camera fisheye hardware system for end-to-end spherical multi-view depth estimation; and (3) integrated data- and model-level augmentation with edge-specific optimization strategies. Our approach achieves 15 FPS real-time inference on edge hardware, matches state-of-the-art accuracy, and significantly reduces parameter count and computational cost. Extensive evaluation on large-scale indoor–outdoor datasets and real-world complex scenarios demonstrates superior generalization and robustness.
📝 Abstract
Omnidirectional depth estimation enables efficient 3D perception over a full 360-degree range. However, in real-world applications such as autonomous driving and robotics, achieving real-time performance and robust cross-scene generalization remains a significant challenge for existing algorithms. In this paper, we propose Rt-OmniMVS, a real-time omnidirectional depth estimation method for edge computing platforms, which introduces the Combined Spherical Sweeping method and adopts a lightweight network structure to achieve real-time performance on edge hardware. To achieve high accuracy, robustness, and generalization in real-world environments, we introduce a teacher-student learning strategy: a high-precision stereo matching method serves as the teacher model to predict pseudo labels for unlabeled real-world data, and data- and model-augmentation techniques are applied during training to enhance the performance of the student model, Rt-OmniMVS. We also propose HexaMODE, an omnidirectional depth sensing system built on multi-view fisheye cameras and an edge computing device. A large-scale hybrid dataset containing both unlabeled real-world data and synthetic data is collected for model training. Experiments on public datasets demonstrate that the proposed method achieves results comparable to state-of-the-art approaches while consuming significantly fewer resources. The proposed system and algorithm also demonstrate high accuracy in various complex real-world scenarios, both indoors and outdoors, achieving an inference speed of 15 frames per second on edge computing platforms.
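The teacher-student pseudo-labeling strategy described above can be sketched in a minimal toy form. Everything here is an illustrative stand-in, not the paper's actual pipeline: `teacher_model` plays the role of the high-precision stereo teacher, additive noise stands in for the data-level augmentation, and the "student" is a tiny linear model rather than Rt-OmniMVS.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the high-precision stereo teacher:
# maps a per-pixel feature vector to a depth value.
TEACHER_W = np.array([0.5, -0.2, 0.1])

def teacher_model(x):
    return x @ TEACHER_W

# Toy "unlabeled real-world" samples (feature vectors).
unlabeled = rng.normal(size=(256, 3))

# Step 1: the frozen teacher predicts pseudo depth labels offline.
pseudo_labels = teacher_model(unlabeled)

# Step 2: data augmentation (here, small additive noise) applied to
# the inputs the student sees during training.
augmented = unlabeled + 0.01 * rng.normal(size=unlabeled.shape)

# Step 3: train a lightweight "student" against the pseudo labels
# by gradient descent on a mean-squared-error loss.
w = np.zeros(3)
lr = 0.1
for _ in range(500):
    pred = augmented @ w
    grad = (pred - pseudo_labels) @ augmented / len(augmented)
    w -= lr * grad
```

After training, the student's weights closely match the teacher's mapping even though it never saw ground-truth depth, which is the core idea behind distilling a heavy stereo model into a lightweight real-time network.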