FastViDAR: Real-Time Omnidirectional Depth Estimation via Alternative Hierarchical Attention

📅 2025-09-28
📈 Citations: 0
Influential: 0
🤖 AI Summary
This work addresses the challenge of real-time 360° depth estimation from quad-fisheye camera inputs on embedded platforms. Methodologically, it proposes an Alternative Hierarchical Attention (AHA) mechanism that enables low-overhead cross-view feature fusion via intra- and inter-frame windowed self-attention, and unifies multi-view depth prediction and fusion within the Equirectangular Projection (ERP) coordinate system. A differentiable fisheye-to-ERP projection layer further enables end-to-end training. Evaluated on the HM3D and 2D3D-S datasets, the method achieves 20 FPS real-time inference on an NVIDIA Orin NX platform while demonstrating strong zero-shot generalization across unseen scenes and fisheye configurations. The approach delivers an efficient, robust omnidirectional depth perception solution suitable for autonomous driving and mobile robotics applications.

📝 Abstract
In this paper we propose FastViDAR, a novel framework that takes four fisheye camera inputs and produces a full $360^\circ$ depth map along with per-camera depth, fusion depth, and confidence estimates. Our main contributions are: (1) We introduce the Alternative Hierarchical Attention (AHA) mechanism, which efficiently fuses features across views through separate intra-frame and inter-frame windowed self-attention, achieving cross-view feature mixing with reduced overhead. (2) We propose a novel ERP fusion approach that projects multi-view depth estimates to a shared equirectangular coordinate system to obtain the final fusion depth. (3) We generate ERP image-depth pairs from the HM3D and 2D3D-S datasets for comprehensive evaluation, demonstrating competitive zero-shot performance on real datasets while achieving up to 20 FPS on NVIDIA Orin NX embedded hardware. Project page: https://3f7dfc.github.io/FastVidar/
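The intra-/inter-frame windowed self-attention in contribution (1) can be illustrated with a minimal NumPy sketch. This is a hedged illustration of the general idea, not the paper's implementation: the shapes, window partitioning, and function names (`windowed_self_attention`, `alternating_attention`) are assumptions, and real AHA uses learned projections and hierarchical stages.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def windowed_self_attention(x):
    # x: (windows, tokens, dim) -- plain scaled dot-product
    # self-attention applied independently inside each window.
    d = x.shape[-1]
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(d)
    return softmax(scores) @ x

def alternating_attention(feats, num_windows):
    # feats: (views, tokens, dim) -- flattened per-view feature maps.
    views, tokens, dim = feats.shape
    w = tokens // num_windows
    # Intra-frame step: windows are taken within each view, so tokens
    # only attend to spatial neighbors from the same camera.
    intra = feats.reshape(views * num_windows, w, dim)
    intra = windowed_self_attention(intra).reshape(views, tokens, dim)
    # Inter-frame step: regroup so that the same window index across
    # all views shares one attention window, mixing features between
    # cameras at low cost (no full cross-view attention needed).
    inter = intra.reshape(views, num_windows, w, dim).transpose(1, 0, 2, 3)
    inter = inter.reshape(num_windows, views * w, dim)
    inter = windowed_self_attention(inter)
    inter = inter.reshape(num_windows, views, w, dim).transpose(1, 0, 2, 3)
    return inter.reshape(views, tokens, dim)
```

Alternating the two steps lets information propagate across all four fisheye views while each attention call stays quadratic only in the (small) window size, which is what keeps the overhead low.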
Problem

Research questions and friction points this paper is trying to address.

Real-time omnidirectional depth estimation from fisheye cameras
Efficient cross-view fusion via hierarchical attention mechanism
Generating 360° depth maps on embedded hardware platforms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Alternative Hierarchical Attention fuses multi-view features efficiently
ERP fusion projects depth estimates to equirectangular coordinates
Achieves real-time 20 FPS performance on embedded hardware
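The ERP fusion step can be sketched as a confidence-weighted average over per-camera depth maps that have already been resampled onto the shared equirectangular grid. This is a minimal sketch under stated assumptions: the `erp_fuse` name, the NaN-as-invalid convention, and the weighted-average rule are illustrative, not the paper's exact fusion operator.

```python
import numpy as np

def erp_fuse(depths, confidences):
    # depths, confidences: (views, H, W) per-camera estimates already
    # projected onto the shared equirectangular (ERP) grid.
    # NaN in `depths` marks pixels a camera does not observe.
    valid = ~np.isnan(depths)
    w = np.where(valid, confidences, 0.0)
    d = np.where(valid, depths, 0.0)
    wsum = w.sum(axis=0)
    # Confidence-weighted average where at least one view is valid;
    # NaN where no camera covers the pixel.
    fused = np.where(wsum > 0,
                     (w * d).sum(axis=0) / np.maximum(wsum, 1e-8),
                     np.nan)
    return fused
```

Because all views share one ERP coordinate system, overlap regions between adjacent fisheye cameras are resolved per pixel by the confidence weights rather than by hand-tuned stitching seams.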
Hangtian Zhao
University of Science and Technology of China
Xiang Chen
East China Normal University
Yizhe Li
Xidian University
Qianhao Wang
PhD, Zhejiang University
Haibo Lu
FAST Lab, Zhejiang University
Fei Gao
FAST Lab, Zhejiang University