Language-Guided Open-World Anomaly Segmentation

📅 2025-12-01
📈 Citations: 0
Influential: 0
🤖 AI Summary
Existing open-world anomaly segmentation methods struggle to semantically identify and name unknown categories, relying on fixed vocabularies that preclude dynamic expansion during inference. To address this, we introduce CLIP into open-world anomaly segmentation for autonomous driving, the first such application, and propose a zero-shot framework that requires no anomaly training data. Our method leverages CLIP's joint image-text embedding space to enable cross-modal alignment between image regions and arbitrary textual labels, supporting vocabulary expansion and interpretable semantic labeling of previously unseen anomalies at inference time. By integrating open-vocabulary segmentation, image-text embedding alignment, and dynamic category expansion, the approach achieves state-of-the-art performance on standard anomaly segmentation benchmarks while offering strong generalization, interpretability, and practical deployability.

📝 Abstract
Open-world and anomaly segmentation methods seek to enable autonomous driving systems to detect and segment both known and unknown objects in real-world scenes. However, existing methods do not assign semantically meaningful labels to unknown regions, and distinguishing and learning representations for unknown classes remains difficult. While open-vocabulary segmentation methods show promise in generalizing to novel classes, they require a fixed inference vocabulary and thus cannot be directly applied to anomaly segmentation where unknown classes are unconstrained. We propose Clipomaly, the first CLIP-based open-world and anomaly segmentation method for autonomous driving. Our zero-shot approach requires no anomaly-specific training data and leverages CLIP's shared image-text embedding space to both segment unknown objects and assign human-interpretable names to them. Unlike open-vocabulary methods, our model dynamically extends its vocabulary at inference time without retraining, enabling robust detection and naming of anomalies beyond common class definitions such as those in Cityscapes. Clipomaly achieves state-of-the-art performance on established anomaly segmentation benchmarks while providing interpretability and flexibility essential for practical deployment.
Problem

Research questions and friction points this paper is trying to address.

Segment unknown objects in autonomous driving scenes
Assign meaningful labels to anomalies without retraining
Detect and name unconstrained classes beyond fixed vocabularies
Innovation

Methods, ideas, or system contributions that make the work stand out.

CLIP-based zero-shot open-world anomaly segmentation
Dynamic vocabulary extension without retraining at inference
Leverages image-text embedding for interpretable unknown object naming
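The naming mechanism the bullets above describe can be sketched in a few lines: segment region embeddings are compared against text-label embeddings in a shared space by cosine similarity, regions below a match threshold are flagged as anomalies, and new labels can be appended to the vocabulary at inference time without retraining. This is a minimal illustrative sketch, not the paper's implementation: the 4-d toy vectors, the threshold `tau`, and the `name_regions` helper are all assumptions standing in for CLIP's 512-d image and text encoders.

```python
import numpy as np

def l2_normalize(x):
    """Row-wise L2 normalization, as CLIP applies before computing similarities."""
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

def name_regions(region_emb, text_emb, vocab, tau=0.5):
    """Assign each region its best-matching label, or 'unknown' below threshold tau.

    tau is an assumed hyperparameter: cosine similarity below it means no known
    label explains the region, so it is treated as an anomaly.
    """
    sims = l2_normalize(region_emb) @ l2_normalize(text_emb).T
    labels = []
    for row in sims:
        j = int(row.argmax())
        labels.append(vocab[j] if row[j] >= tau else "unknown")
    return labels

# Toy 4-d embeddings standing in for CLIP's shared image-text space.
vocab = ["road", "car", "pedestrian"]
text_emb = np.eye(4)[:3]           # one axis per known label
regions = np.array([
    [0.9, 0.1, 0.0, 0.0],          # close to "road"
    [0.0, 0.0, 0.1, 0.9],          # matches no known label -> anomaly
])
labels_before = name_regions(regions, text_emb, vocab)
print(labels_before)               # ['road', 'unknown']

# Dynamic vocabulary extension at inference: embed a new name, re-query only.
vocab.append("deer")
text_emb = np.vstack([text_emb, np.eye(4)[3:4]])
labels_after = name_regions(regions, text_emb, vocab)
print(labels_after)                # ['road', 'deer']
```

The key property, as in the paper's setting, is that extending the vocabulary touches only the text side: the segmentation model and region embeddings are untouched, so no retraining is needed to name a new anomaly class.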