NRSeg: Noise-Resilient Learning for BEV Semantic Segmentation via Driving World Models

2025-07-05
AI Summary
BEV semantic segmentation is critical for end-to-end autonomous driving, yet unsupervised and semi-supervised approaches underperform because the labeled data distribution is homogeneous, while synthetic data from driving world models, which could add diversity, introduces generation noise that hampers model learning. To address this, we propose NRSeg, a noise-resilient BEV segmentation framework featuring: (i) a Perspective-Geometry Consistency Metric (PGCM) that scores synthetic data by how well its perspective road mask aligns with the mask projected from the BEV labels; (ii) Bi-Distribution Parallel Prediction (BiDPP), which predicts multinomial and Dirichlet distributions in parallel, the latter using evidential deep learning for uncertainty quantification; and (iii) a Hierarchical Local Semantic Exclusion (HLSE) module that addresses the non-mutual exclusivity of BEV semantic classes. On unsupervised and semi-supervised BEV segmentation tasks, NRSeg achieves state-of-the-art performance, with highest mIoU improvements of 13.8% and 11.4%, respectively.

Abstract
Bird's-Eye-View (BEV) semantic segmentation is an indispensable perception task in end-to-end autonomous driving systems. Unsupervised and semi-supervised learning for BEV tasks, though pivotal for real-world applications, underperforms due to the homogeneous distribution of the labeled data. In this work, we explore the potential of synthetic data from driving world models to enhance the diversity of labeled data and make BEV segmentation more robust. Yet, our preliminary findings reveal that generation noise in synthetic data compromises efficient BEV model learning. To fully harness the potential of synthetic data from world models, this paper proposes NRSeg, a noise-resilient learning framework for BEV semantic segmentation. Specifically, a Perspective-Geometry Consistency Metric (PGCM) is proposed to quantitatively evaluate the guidance capability of generated data for model learning. This metric is derived from an alignment measure between the perspective road mask of the generated data and the mask projected from the BEV labels. Moreover, a Bi-Distribution Parallel Prediction (BiDPP) scheme is designed to enhance the inherent robustness of the model, where the learning process is constrained through parallel prediction of multinomial and Dirichlet distributions. The former efficiently predicts semantic probabilities, whereas the latter adopts evidential deep learning to realize uncertainty quantification. Furthermore, a Hierarchical Local Semantic Exclusion (HLSE) module is designed to address the non-mutual exclusivity inherent in BEV semantic segmentation tasks. Experimental results demonstrate that NRSeg achieves state-of-the-art performance, with highest mIoU improvements of 13.8% and 11.4% in unsupervised and semi-supervised BEV segmentation tasks, respectively. The source code will be made publicly available at https://github.com/lynn-yu/NRSeg.
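
The Dirichlet branch mentioned in the abstract can be illustrated with the standard evidential deep learning recipe. This is a minimal sketch and not the NRSeg implementation: the function name `dirichlet_uncertainty`, the softplus evidence mapping, and the toy logits are all assumptions chosen for illustration. Non-negative evidence is turned into Dirichlet parameters, whose total strength gives both expected class probabilities and a scalar uncertainty.

```python
import numpy as np

def dirichlet_uncertainty(logits):
    """Map raw per-class outputs to Dirichlet parameters, expected class
    probabilities, and one uncertainty value per sample.

    Hypothetical helper for illustration; the paper's exact formulation
    is not given in the abstract.
    """
    logits = np.asarray(logits, dtype=np.float64)
    # Softplus keeps the evidence non-negative (assumed activation).
    evidence = np.logaddexp(0.0, logits)
    alpha = evidence + 1.0                        # Dirichlet parameters
    strength = alpha.sum(axis=-1, keepdims=True)  # total Dirichlet strength
    probs = alpha / strength                      # expected class probabilities
    num_classes = logits.shape[-1]
    # Uncertainty shrinks as the collected evidence grows.
    uncertainty = num_classes / strength[..., 0]
    return alpha, probs, uncertainty

# Toy inputs: strong evidence for class 0 versus almost no evidence at all.
confident = dirichlet_uncertainty([10.0, -10.0, -10.0])
ambiguous = dirichlet_uncertainty([-10.0, -10.0, -10.0])
```

In this sketch the uncertainty lies in (0, 1]: it approaches 1 when no class gathers evidence and falls as evidence accumulates, which is what allows noisy synthetic samples to be treated differently from clean ones.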
Problem

Research questions and friction points this paper is trying to address.

Enhances BEV segmentation using synthetic data diversity
Addresses generation noise in synthetic data for learning
Improves robustness in unsupervised and semi-supervised BEV tasks
Innovation

Methods, ideas, or system contributions that make the work stand out.

Perspective-Geometry Consistency Metric for synthetic data evaluation
Bi-Distribution Parallel Prediction for model robustness
Hierarchical Local Semantic Exclusion for non-mutual exclusivity
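
The first item above scores generated data by how well its perspective road mask agrees with the mask projected from the BEV labels. The abstract does not state which alignment measure is used, so the sketch below assumes a plain intersection-over-union between two binary masks; `mask_alignment_iou` and the toy masks are hypothetical names and data, not taken from the NRSeg code.

```python
import numpy as np

def mask_alignment_iou(generated_mask, projected_mask):
    """Intersection-over-union between two binary road masks.

    Assumed stand-in for an alignment measure; the metric actually used
    by PGCM may differ.
    """
    gen = np.asarray(generated_mask, dtype=bool)
    proj = np.asarray(projected_mask, dtype=bool)
    if gen.shape != proj.shape:
        raise ValueError("masks must have the same shape")
    union = np.logical_or(gen, proj).sum()
    if union == 0:
        # Both masks empty: treated as perfectly aligned (convention chosen here).
        return 1.0
    return float(np.logical_and(gen, proj).sum() / union)

# Toy 4x4 example: the projected road covers the bottom two rows,
# while the generated image places the road one row higher.
projected = np.zeros((4, 4), dtype=bool)
projected[2:, :] = True
generated = np.zeros((4, 4), dtype=bool)
generated[1:3, :] = True
score = mask_alignment_iou(generated, projected)
```

A low score would flag a generated image whose road layout disagrees with its BEV label, so that the sample can be down-weighted or filtered during training.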