🤖 AI Summary
In autonomous driving, single-modality anomaly segmentation models suffer from high false-positive rates due to excessive anomaly scores on non-anomalous regions. To address this, we propose MMRAS+, the first multimodal (image + text) uncertainty-aware anomaly segmentation framework tailored for road scenes. Methodologically, MMRAS+ introduces a novel multimodal uncertainty modeling mechanism that integrates the CLIP text encoder to enable cross-modal alignment between visual and semantic textual features while quantifying predictive uncertainty. A lightweight ensemble module further enhances robustness. Evaluated on RoadAnomaly, SMIYC, and Fishyscapes, MMRAS+ significantly outperforms state-of-the-art single-modality methods, effectively suppressing false-positive responses on non-anomalous categories. The source code is publicly available.
📝 Abstract
Semantic segmentation allows autonomous vehicles to comprehensively understand their surroundings. However, it is equally crucial that the model detect obstacles that may jeopardize the safety of autonomous driving systems. In our experiments, we find that current uni-modal anomaly segmentation frameworks tend to produce high anomaly scores for non-anomalous regions in images. Motivated by this empirical finding, we develop a multi-modal uncertainty-based anomaly segmentation framework, named MMRAS+, for autonomous driving systems. MMRAS+ effectively reduces the high anomaly scores assigned to non-anomalous classes by introducing the text modality through the CLIP text encoder; indeed, MMRAS+ is the first multi-modal anomaly segmentation solution for autonomous driving. Moreover, we develop an ensemble module to further boost anomaly segmentation performance. Experiments on the RoadAnomaly, SMIYC, and Fishyscapes validation datasets demonstrate the superior performance of our method. The code is available at https://github.com/HengGao12/MMRAS_plus.
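To make the core idea concrete, here is a minimal sketch of how text-modality similarity could suppress spurious anomaly scores on in-distribution pixels. This is not the paper's actual implementation: the function names, the max-logit baseline score, and the linear fusion with weight `alpha` are all illustrative assumptions; the real MMRAS+ pipeline uses the CLIP text encoder and an ensemble module as described above.

```python
import numpy as np

def max_logit_anomaly(logits):
    # Baseline uni-modal score: a pixel is "anomalous" when no known
    # class fires strongly, i.e. the max class logit is low.
    # logits: (num_pixels, num_classes)
    return -logits.max(axis=-1)

def text_refined_anomaly(logits, pixel_emb, text_emb, alpha=0.5):
    # Hypothetical image+text fusion (illustrative, not the paper's method):
    # cosine similarity between pixel embeddings and text embeddings of
    # known class names lowers the anomaly score wherever a pixel matches
    # some in-distribution class, reducing false positives.
    # pixel_emb: (num_pixels, dim), text_emb: (num_classes, dim)
    base = max_logit_anomaly(logits)
    p = pixel_emb / np.linalg.norm(pixel_emb, axis=-1, keepdims=True)
    t = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    sim = (p @ t.T).max(axis=-1)  # best match over known-class prompts
    return base - alpha * sim
```

A pixel whose embedding closely matches a known-class text prompt (e.g. "a photo of a road") thus receives a lower anomaly score than one with no textual match, mirroring the false-positive suppression the abstract describes.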