FishBEV: Distortion-Resilient Bird's Eye View Segmentation with Surround-View Fisheye Cameras

📅 2025-09-17
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing bird's-eye-view (BEV) segmentation methods, designed for pinhole cameras, degrade on fisheye imagery due to severe geometric distortion, ambiguous cross-view correspondences, and unstable temporal dynamics. To address these challenges, this paper proposes FishBEV, the first fisheye-specific BEV segmentation framework. Methodologically: (1) a Distortion-Resilient Multi-scale Extraction (DRME) backbone mitigates radial distortion effects while preserving scale consistency; (2) an Uncertainty-aware Spatial Cross-Attention (U-SCA) mechanism leverages uncertainty estimation for robust cross-view alignment; and (3) a Distance-aware Temporal Self-Attention (D-TSA) module adaptively balances near-field details and far-field context to stabilize temporal modeling. Evaluated on the SynWoodScape dataset, the approach significantly outperforms existing state-of-the-art methods in surround-view fisheye BEV segmentation, achieving superior accuracy and temporal stability.

📝 Abstract
As a cornerstone technique for autonomous driving, Bird's Eye View (BEV) segmentation has recently achieved remarkable progress with pinhole cameras. However, it is non-trivial to extend existing methods to fisheye cameras, which suffer from severe geometric distortion, ambiguous multi-view correspondences, and unstable temporal dynamics, all of which significantly degrade BEV performance. To address these challenges, we propose FishBEV, a novel BEV segmentation framework specifically tailored for fisheye cameras. The framework introduces three complementary innovations: a Distortion-Resilient Multi-scale Extraction (DRME) backbone that learns robust features under distortion while preserving scale consistency; an Uncertainty-aware Spatial Cross-Attention (U-SCA) mechanism that leverages uncertainty estimation for reliable cross-view alignment; and a Distance-aware Temporal Self-Attention (D-TSA) module that adaptively balances near-field details and far-field context to ensure temporal coherence. Extensive experiments on the SynWoodScape dataset demonstrate that FishBEV consistently outperforms state-of-the-art baselines on surround-view fisheye BEV segmentation.
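The abstract gives no equations, but the core U-SCA idea — down-weighting unreliable cross-view correspondences by a predicted uncertainty — can be sketched as follows. This is a minimal illustration, not the paper's actual formulation: the function name, the per-key uncertainty `sigma`, and the log-sigma logit penalty are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def uncertainty_weighted_attention(q, k, v, sigma):
    """Cross-attention where each source (key) token's logit is penalized
    by its predicted uncertainty sigma, so unreliable cross-view
    correspondences receive less attention mass. Hypothetical sketch."""
    d = q.shape[-1]
    logits = q @ k.T / np.sqrt(d)             # (n_q, n_k) similarity scores
    logits = logits - np.log(sigma)[None, :]  # down-weight uncertain keys
    w = softmax(logits, axis=-1)              # rows sum to 1
    return w @ v, w
```

With equal query–key similarities, a key whose `sigma` is ten times larger receives proportionally less attention weight, which is the qualitative behavior the abstract describes for unreliable cross-view matches.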
Problem

Research questions and friction points this paper is trying to address.

Addresses fisheye camera distortion in BEV segmentation
Solves ambiguous multi-view correspondences in autonomous driving
Ensures temporal coherence with adaptive attention mechanisms
Innovation

Methods, ideas, or system contributions that make the work stand out.

Distortion-Resilient Multi-scale Extraction backbone
Uncertainty-aware Spatial Cross-Attention mechanism
Distance-aware Temporal Self-Attention module
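The D-TSA module above is described only as balancing near-field details against far-field context over time. One plausible reading — purely a sketch under assumed parameters, not the paper's method — is a per-BEV-cell temporal weighting whose decay rate depends on distance: near cells (fast-changing details) favor recent frames, far cells (slow-changing context) pool over a longer history.

```python
import numpy as np

def distance_aware_temporal_weights(distances, n_frames,
                                    tau_near=1.0, tau_far=4.0):
    """Per-cell weights over past BEV frames (age 0 = current frame).
    Near cells get a small time constant (fast decay, recent frames
    dominate); far cells get a large one (longer temporal context).
    tau_near/tau_far are illustrative assumptions."""
    d = distances / distances.max()            # normalize distances to [0, 1]
    tau = tau_near + d * (tau_far - tau_near)  # per-cell time constant
    ages = np.arange(n_frames)                 # frame ages: 0, 1, 2, ...
    logits = -ages[None, :] / tau[:, None]     # exponential decay in age
    w = np.exp(logits)
    return w / w.sum(axis=1, keepdims=True)    # rows sum to 1
```

For example, a cell 1 m away places most of its weight on the current frame, while a cell 50 m away spreads weight more evenly across the history — the adaptive near/far balance the bullet describes.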
Hang Li
College of Computer Science, Nankai University, Tianjin, 300350, China
Dianmo Sheng
School of Cyber Security, University of Science and Technology of China, Hefei, Anhui, 230026, P.R.China
Qiankun Dong
College of Computer Science, Nankai University, Tianjin, 300350, China
Zichun Wang
Student, West Virginia State University, U.S.A.
Zhiwei Xu
Haihe Lab of ITAI, Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
Tao Li
College of Computer Science, Nankai University, Tianjin, 300350, China