FishBEV: Distortion-Resilient Bird's Eye View Segmentation with Surround-View Fisheye Cameras

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing bird's-eye-view (BEV) segmentation methods, designed for pinhole cameras, degrade on fisheye imagery due to severe geometric distortion, ambiguous cross-view correspondences, and unstable temporal dynamics. To address these challenges, this paper proposes FishBEV, the first fisheye-specific BEV segmentation framework. Methodologically: (1) a Distortion-Resilient Multi-scale Extraction (DRME) backbone mitigates radial distortion effects while preserving scale consistency; (2) an Uncertainty-aware Spatial Cross-Attention (U-SCA) mechanism leverages uncertainty estimation for robust cross-view alignment; and (3) a Distance-aware Temporal Self-Attention (D-TSA) module adaptively balances near-field details and far-field context to stabilize temporal modeling. Evaluated on the SynWoodScape dataset, the approach significantly outperforms existing state-of-the-art methods in surround-view fisheye BEV segmentation, achieving superior accuracy and temporal stability.

📝 Abstract
As a cornerstone technique for autonomous driving, Bird's Eye View (BEV) segmentation has recently achieved remarkable progress with pinhole cameras. However, it is non-trivial to extend existing methods to fisheye cameras, which suffer from severe geometric distortion, ambiguous multi-view correspondences, and unstable temporal dynamics, all of which significantly degrade BEV performance. To address these challenges, we propose FishBEV, a novel BEV segmentation framework specifically tailored for fisheye cameras. The framework introduces three complementary innovations: a Distortion-Resilient Multi-scale Extraction (DRME) backbone that learns robust features under distortion while preserving scale consistency; an Uncertainty-aware Spatial Cross-Attention (U-SCA) mechanism that leverages uncertainty estimation for reliable cross-view alignment; and a Distance-aware Temporal Self-Attention (D-TSA) module that adaptively balances near-field details and far-field context to ensure temporal coherence. Extensive experiments on the SynWoodScape dataset demonstrate that FishBEV consistently outperforms state-of-the-art baselines on surround-view fisheye BEV segmentation.
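The abstract gives no equations, but the core U-SCA idea — down-weighting unreliable cross-view correspondences by a predicted uncertainty — can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: the function name, the per-key uncertainty `sigma`, and the log-sigma logit penalty are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def uncertainty_weighted_attention(q, k, v, sigma):
    """Cross-attention where each source (key) token's logit is penalized
    by its predicted uncertainty sigma, so unreliable cross-view
    correspondences receive less attention mass. Hypothetical sketch."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)             # (n_q, n_k) similarity scores
    logits = logits - np.log(sigma)[None, :]  # down-weight uncertain keys
    w = softmax(logits, axis=-1)              # rows sum to 1
    return w @ v, w
```

With equal query–key similarities, a key whose `sigma` is ten times larger receives proportionally less attention weight, which is the qualitative behavior the abstract describes for unreliable cross-view matches.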
Problem

Research questions and friction points this paper is trying to address.

Addresses fisheye camera distortion in BEV segmentation
Solves ambiguous multi-view correspondences in autonomous driving
Ensures temporal coherence with adaptive attention mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distortion-Resilient Multi-scale Extraction backbone
Uncertainty-aware Spatial Cross-Attention mechanism
Distance-aware Temporal Self-Attention module
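The D-TSA module above is described only as balancing near-field details against far-field context over time. One plausible reading — purely a sketch under assumed parameters, not the paper's method — is a per-BEV-cell temporal weighting whose decay rate depends on distance: near cells (fast-changing details) favor recent frames, far cells (slow-changing context) pool over a longer history.

```python
import numpy as np

def distance_aware_temporal_weights(distances, n_frames,
                                    tau_near=1.0, tau_far=4.0):
    """Per-cell weights over past BEV frames (age 0 = current frame).
    Near cells get a small time constant (fast decay, recent frames
    dominate); far cells get a large one (longer temporal context).
    tau_near/tau_far are illustrative assumptions."""
    d = distances / distances.max()            # normalize distances to [0, 1]
    tau = tau_near + d * (tau_far - tau_near)  # per-cell time constant
    ages = np.arange(n_frames)                 # frame ages: 0, 1, 2, ...
    logits = -ages[None, :] / tau[:, None]     # exponential decay in age
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)    # rows sum to 1
```

For example, a cell 1 m away places most of its weight on the current frame, while a cell 50 m away spreads weight more evenly across the history — the adaptive near/far balance the bullet describes.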
Hang Li
College of Computer Science, Nankai University, Tianjin, 300350, China
Dianmo Sheng
School of Cyber Security, University of Science and Technology of China, Hefei, Anhui, 230026, P.R.China
Qiankun Dong
College of Computer Science, Nankai University, Tianjin, 300350, China
Zichun Wang
Student, West Virginia State University, U.S.A.
Zhiwei Xu
Haihe Lab of ITAI, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Tao Li
College of Computer Science, Nankai University, Tianjin, 300350, China