Leveraging Text-Driven Semantic Variation for Robust OOD Segmentation

📅 2025-11-10
📈 Citations: 0
Influential: 0

🤖 AI Summary
To address the insufficient robustness of out-of-distribution (OOD) road anomaly detection in autonomous driving, this paper proposes a text-driven semantic enhancement framework. It introduces a semantic-distance-based OOD prompting strategy and pairs a vision-language model encoder with a Transformer decoder, so that image and text representations are aligned and learned jointly. By exploiting text-guided semantic diversity, the method improves generalization to unseen anomaly categories. Experiments on three standard benchmarks (Fishyscapes, Segment-Me-If-You-Can (SMIYC), and Road Anomaly) show state-of-the-art performance on both pixel-level and object-level metrics. The authors present the approach as a way to improve open-world perception reliability and decision-making safety in complex driving scenarios.

📝 Abstract
In autonomous driving and robotics, ensuring road safety and reliable decision-making critically depends on out-of-distribution (OOD) segmentation. While numerous methods have been proposed to detect anomalous objects on the road, leveraging the vision-language space, which provides rich linguistic knowledge, remains underexplored. We hypothesize that incorporating these linguistic cues can be especially beneficial in the complex contexts found in real-world autonomous driving scenarios. To this end, we present a novel approach that trains a Text-Driven OOD Segmentation model to learn a semantically diverse set of objects in the vision-language space. Concretely, our approach combines a vision-language model's encoder with a transformer decoder, employs Distance-Based OOD prompts located at varying semantic distances from in-distribution (ID) classes, and utilizes OOD Semantic Augmentation for OOD representations. By aligning visual and textual information, our approach effectively generalizes to unseen objects and provides robust OOD segmentation in diverse driving environments. We conduct extensive experiments on publicly available OOD segmentation datasets, including Fishyscapes, Segment-Me-If-You-Can, and Road Anomaly, demonstrating that our approach achieves state-of-the-art performance across both pixel-level and object-level evaluations. This result underscores the potential of vision-language-based OOD segmentation to bolster the safety and reliability of future autonomous driving systems.
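
The idea of OOD prompts placed at varying semantic distances from the ID classes can be illustrated with a small sketch. The abstract does not state which distance measure, thresholds, or vocabulary the authors use, so everything below is an assumption: the function names, the cosine distance, the bin edges, and the toy 2-D vectors (which stand in for text embeddings from a vision-language encoder) are all hypothetical.

```python
import math

def cosine_distance(u, v):
    """Return 1 - cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)

def bin_prompts_by_distance(id_embeddings, candidate_embeddings, edges=(0.3, 0.7)):
    """Group candidate words by distance to their nearest ID class.

    edges are hypothetical thresholds: below edges[0] is "near",
    below edges[1] is "mid", everything else is "far".
    """
    bins = {"near": [], "mid": [], "far": []}
    for word, emb in candidate_embeddings.items():
        # Distance to the closest in-distribution class embedding.
        nearest = min(cosine_distance(emb, id_emb) for id_emb in id_embeddings.values())
        if nearest < edges[0]:
            bins["near"].append(word)
        elif nearest < edges[1]:
            bins["mid"].append(word)
        else:
            bins["far"].append(word)
    return bins

# Toy embeddings (assumed values, not real encoder outputs).
id_classes = {"road": (1.0, 0.0), "car": (0.8, 0.6)}
candidates = {"truck": (0.9, 0.5), "deer": (0.0, 1.0), "sofa": (-1.0, 0.2)}

bins = bin_prompts_by_distance(id_classes, candidates)
print(bins)
```

With real text embeddings, each bin would supply prompt words at a different semantic distance from the ID classes; how the paper actually selects and uses them is not specified in this summary.
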
Problem

Research questions and friction points this paper is trying to address.

Detecting anomalous objects in autonomous driving using vision-language models
Improving out-of-distribution segmentation through text-driven semantic variation
Enhancing road safety by generalizing to unseen objects in diverse environments
Innovation

Methods, ideas, or system contributions that make the work stand out.

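Trains a Text-Driven OOD Segmentation model to learn a semantically diverse set of objects in the vision-language space
Combines a vision-language model's encoder with a transformer decoder
Uses Distance-Based OOD prompts at varying semantic distances from ID classes, plus OOD Semantic Augmentation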
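
One way to picture how aligned image and text embeddings could yield a per-pixel anomaly score is sketched below. The page does not describe the paper's scoring rule, so this is a generic similarity-plus-softmax pattern rather than the authors' method; `ood_score`, the `temperature` value, and the toy 2-D embeddings are hypothetical.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def ood_score(pixel_emb, id_text_embs, ood_text_embs, temperature=0.1):
    """Probability mass a pixel embedding assigns to OOD prompts.

    Cosine similarities to all prompts are turned into a softmax
    distribution; the score is the total probability on the OOD prompts.
    """
    p = normalize(pixel_emb)
    texts = [normalize(t) for t in list(id_text_embs) + list(ood_text_embs)]
    logits = [sum(a * b for a, b in zip(p, t)) / temperature for t in texts]
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(l - peak) for l in logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return sum(probs[len(id_text_embs):])

# Toy embeddings (assumed values): one ID prompt, one OOD prompt.
id_prompts = [[1.0, 0.0]]
ood_prompts = [[0.0, 1.0]]

road_like_pixel = ood_score([1.0, 0.1], id_prompts, ood_prompts)
anomaly_like_pixel = ood_score([0.1, 1.0], id_prompts, ood_prompts)
print(road_like_pixel, anomaly_like_pixel)
```

Applying such a score to every pixel embedding produced by the decoder would give an anomaly map; thresholding that map is one simple route to pixel-level predictions.
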